Closed DerZc closed 1 year ago
Hi @DerZc. Since your shift depends on the value of the row IDs, the result can indeed differ in the presence of an index. Without the index, the implicit cast to integer during the UPDATE
, which does not change the values in c0
, does not perform an actual update, so the row IDs stay the same during that statement. In the presence of the index, we do not notice that the update does not cause any changes, so we perform an INSERT
of the new values (increasing the row IDs) and then a DELETE
of the old values internally. So when shifting, we shift differently depending on the scenario.
I hope I understood your issue correctly and that this helps. Let me know if I missed something. :)
Without index.
CREATE TABLE t0(c0 INTEGER);
INSERT INTO t0(c0) VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1);
SELECT t0.rowid FROM t0;
┌───────┐
│ rowid │
│ int64 │
├───────┤
│ 0 │
│ 1 │
...
D UPDATE t0 SET c0=(0.1);
D SELECT t0.rowid FROM t0;
┌───────┐
│ rowid │
│ int64 │
├───────┤
│ 0 │
│ 1 │
...
With index.
CREATE TABLE t0(c0 INTEGER);
INSERT INTO t0(c0) VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1);
CREATE INDEX i3 ON t0(c0);
SELECT t0.rowid FROM t0;
┌───────┐
│ rowid │
│ int64 │
├───────┤
│ 0 │
│ 1 │
...
D UPDATE t0 SET c0=(0.1);
D SELECT t0.rowid FROM t0;
┌───────┐
│ rowid │
│ int64 │
├───────┤
│ 16 │
│ 17 │
...
Hi @taniabogatsch , thank you for your reply!
There are two SELECT
queries in this program, and the t0.rowid
should be unchanged in these two queries.
The first query return 32
, so the inner query in the second should also return 32. So the second query is equal to SELECT (MIN(t0.rowid)) AS c0 FROM t0 WHERE ((t0.c0)>>(32));
as there are only 0
and -1
in t0.c0
and both of 0>>32
and -1>>0
equal to 0
. because for this query SELECT t0.c0, t0.c0>>32 FROM t0;
I get this results:
┌───────┬───────────────┐
│ c0 │ (t0.c0 >> 32) │
│ int32 │ int32 │
├───────┼───────────────┤
│ 0 │ 0 │
│ 0 │ 0 │
│ 0 │ 0 │
│ 0 │ 0 │
│ 0 │ 0 │
│ 0 │ 0 │
│ 0 │ 0 │
│ 0 │ 0 │
│ 0 │ 0 │
│ 0 │ 0 │
│ 0 │ 0 │
│ 0 │ 0 │
│ 0 │ 0 │
│ 0 │ 0 │
│ 0 │ 0 │
│ 0 │ 0 │
│ -1 │ 0 │
├───────┴───────────────┤
│ 17 rows 2 columns │
└───────────────────────┘
so the condition of the second query always return 0
, which equal to FALSE
. So the second query should has empty results.
I run this query SELECT (MIN(t0.rowid)) AS c0 FROM t0 WHERE ((t0.c0)>>(32));
and get the result:
┌───────┐
│ c0 │
│ int64 │
├───────┤
│ │
└───────┘
But the current second query return 32
.
Ah, I got you now. I will look into this.
CREATE TABLE t0(c0 INTEGER);
INSERT INTO t0(c0) VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1);
CREATE INDEX i3 ON t0(c0);
UPDATE t0 SET c0=(0.1);
INSERT INTO t0(c0) VALUES (-1);
SELECT (MIN(t0.rowid)) AS c0 FROM t0 WHERE t0.c0;
┌───────┐
│ c0 │
│ int64 │
├───────┤
│ 32 │
└───────┘
SELECT (MIN(t0.rowid)) AS c0 FROM t0 WHERE ((t0.c0)>>32);
┌───────┐
│ c0 │
│ int64 │
├───────┤
│ │
└───────┘
# The expected output should be' NULL' when copy-pasting the above query into this query (as a subquery).
SELECT (MIN(t0.rowid)) AS c0 FROM t0 WHERE ((t0.c0)>>(SELECT (MIN(t0.rowid)) AS c0 FROM t0 WHERE t0.c0));
┌───────┐
│ c0 │
│ int64 │
├───────┤
│ 32 │
└───────┘
Just putting some progress here, and providing a smaller reproducible example. This seems unrelated to the existence of the index. Instead, implicit conversions from INTEGER
to BIGINT
'invalidate' the hard-coded 32-bit shift result.
┌─────────────┴─────────────┐
│ FILTER │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│CAST((CAST(col AS BIGINT) >│
│ > SUBQUERY) AS BOOLEAN) │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ EC: 0 │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 1 │
│ (0.00s) │
└─────────────┬─────────────┘
# here, our col is of type INTEGER
statement ok
CREATE TABLE tbl(col INTEGER);
statement ok
INSERT INTO tbl(col) VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0);
statement ok
INSERT INTO tbl(col) VALUES (-1);
query I
SELECT tbl.rowid FROM tbl WHERE tbl.col;
----
32
query I
SELECT col FROM tbl WHERE tbl.col >> 32;
----
# rowid is of type BIGINT, so we cast tbl.col to BIGINT, too (we cast to the 'bigger' type)
# this 'incorrectly' outputs -1 instead of zero rows, because we now shift a BIGINT by 32 bits
query I
SELECT col FROM tbl WHERE tbl.col >> (SELECT tbl2.rowid FROM tbl AS tbl2 WHERE tbl2.col);
----
# an explicit cast fixes this
query I
SELECT col FROM tbl WHERE tbl.col >> (SELECT tbl2.rowid FROM tbl AS tbl2 WHERE tbl2.col)::INTEGER;
----
Accordingly, we can add a typecast to the initial example.
statement ok
CREATE TABLE t0(c0 INTEGER);
statement ok
INSERT INTO t0(c0) VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1);
statement ok
CREATE INDEX i3 ON t0(c0);
statement ok
UPDATE t0 SET c0=(0.1);
statement ok
INSERT INTO t0(c0) VALUES (-1);
query I
SELECT (MIN(t0.rowid)) AS c0 FROM t0 WHERE t0.c0;
----
32
# casting the subquery to INTEGER avoids implicitly converting t0.c0 to BIGINT, resulting in the expected shift result
query I
SELECT (MIN(t0.rowid)) AS c0 FROM t0 WHERE ((t0.c0)>>(SELECT (MIN(t0.rowid)) AS c0 FROM t0 WHERE t0.c0)::INTEGER);
----
NULL
I got it, thank you for your patience in explaining!
What happens?
Consider the following program
there are only two values in
t0.c0
:0
and-1
, both of0>>32
and-1>>32
equal to 0, so the secondSELECT
should has empty results. if I remove theCREATE INDEX
statement, this query can have correct result.To Reproduce
build the last version of source code, save this program in
test.sql
, then runduckdb < test.sql
OS:
ubuntu 22.04
DuckDB Version:
commit version b8cf6a98
DuckDB Client:
CLI
Full Name:
Chi Zhang
Affiliation:
Nanjing University, National University of Singapore
Have you tried this on the latest
master
branch?Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?