cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
29.8k stars 3.76k forks source link

sql/opt: prove partial index implication with in-exact atoms containing virtual computed columns #122352

Closed cjireland closed 4 months ago

cjireland commented 4 months ago

Describe the problem

Please describe the issue you observed, and any steps we can take to reproduce it:

--! The following works as expected
CREATE TABLE test_index_stored (
    a int primary key,
    b jsonb
);

ALTER TABLE test_index_stored ADD COLUMN c STRING AS (b->>'c') STORED;
ALTER TABLE test_index_stored ADD COLUMN d STRING AS (b->>'d') STORED;
ALTER TABLE test_index_stored ADD COLUMN e INT AS ((b->>'e')::INT) STORED;
CREATE INDEX ON test_index_stored(c, e) WHERE c IS NOT NULL AND e IS NOT NULL;

INSERT INTO test_index_stored VALUES (1, '{"c": "foo", "d": "bar", "e": 5}');

SELECT c, e FROM test_index_stored@test_index_stored_c_e_idx WHERE c = 'foo' AND e > 4;  --! works
--! The following fails:
CREATE TABLE test_index_virtual (
    a int primary key,
    b jsonb
);

ALTER TABLE test_index_virtual ADD COLUMN c STRING AS (b->>'c') VIRTUAL;
ALTER TABLE test_index_virtual ADD COLUMN d STRING AS (b->>'d') VIRTUAL;
ALTER TABLE test_index_virtual ADD COLUMN e INT AS ((b->>'e')::INT) VIRTUAL;
CREATE INDEX ON test_index_virtual(c, e) WHERE c IS NOT NULL AND e IS NOT NULL;

INSERT INTO test_index_virtual VALUES (1, '{"c": "foo", "d": "bar", "e": 5}');

SELECT c, e FROM test_index_virtual@test_index_virtual_c_e_idx WHERE c = 'foo' AND e > 4; --! fails
SELECT c, e FROM test_index_virtual@test_index_virtual_c_e_idx WHERE c is not null and e is not null; --! works
SELECT c, e FROM test_index_virtual@test_index_virtual_c_e_idx WHERE c is not null and e is not null and c = 'foo' and e > 4; --! fails

Spent quite some time investigating the problem, even tried to force the index scan. The exception was thrown here: https://github.com/cockroachdb/cockroach/blob/67547d7724f8a52646e2e8ecb3ca48b923957d90/pkg/sql/opt/exec/execbuilder/relational.go#L598, when the index that is actually needed mismatches the index in the flag.

Changed the code to force the scan the flags.index, the query worked, but empty result was returned, likely due to predicate matching. It seemed to me that the predict added here can be further pruned: https://github.com/cockroachdb/cockroach/blob/67547d7724f8a52646e2e8ecb3ca48b923957d90/pkg/sql/opt/optbuilder/select.go#L643 and it might also have something to do with virtual columns, because they are always applied a projection

Expected behavior The query should not fail.

Environment:

Jira issue: CRDB-37825

michae2 commented 4 months ago

I wonder if this is at all related to https://github.com/cockroachdb/cockroach/issues/112978?

mgartner commented 4 months ago

Here's a smaller reproduction, which I believe has the same underlying cause:

CREATE TABLE t (
  a JSON,
  b INT AS ((a->>'c')::INT) VIRTUAL,
  INDEX i (b) WHERE b IS NOT NULL
);

SELECT * FROM t@i WHERE b > 4;
-- ERROR: index "i" is a partial index that does not contain all the rows needed to execute this query
-- SQLSTATE: 42809

The problem is that the partial index implicator cannot prove that (a->>'c')::INT > 4 implies (a->>'c')::INT IS NOT NULL. We use constraints to prove implication of expressions that are not identical. A constraint can only constrain columns, and it's impossible to create a constraint for a given (a->>'c')::INT > 4.

I think to fix this we'll have to make the partial index implicator aware of virtual computed columns so that it can build constraints on those columns, instead of trying to bulid constraints on the columns referenced in the computed column expression. So in this case we'd build the constraint /b: [/4 - ] for (a->>'c')::INT > 4 and /b: [/NULL - ] for (a->>'c')::INT IS NOT NULL, making it trivial for the implicator to see that the former is contained by the latter, and thus the former implies the latter.

cty123 commented 4 months ago

During the opt phase, I checked that b.addPartialIndexPredicatesForTable(tabMeta, outScope.expr) was able to correctly add the partial index predicate and it's aware of the fact that the a is-not null. So my expectation was that, during the opt phase, the engine would simplify the query further to narrow down the constraints, but that wasn't the case.

mgartner commented 4 months ago

... it's aware of the fact that the a is-not null.

The a being the non-virtual column, correct? a being NOT NULL does not ensure that the expression a->>'c' is NOT NULL.

I'm fairly confident that b.addPartialIndexPredicatesForTable is not at fault, and improving the implicator logic here will be the correct solution.

cty123 commented 4 months ago

Sorry, I mixed up the column name here. I was referring to the column b in your example table t.

During the execution, I indeed see the predicate being added correctly, and it's aware that the column b in the index can't be null. It looks like the later optimization or the transformation should incorporate that information.

mgartner commented 4 months ago

When attempting the produce the partial index scan, the query plan being operated on looks like:

project
 ├── columns: a:1 b:2
 ├── immutable
 ├── select
 │    ├── columns: a:1
 │    ├── immutable
 │    ├── scan t
 │    │    ├── columns: a:1
 │    │    └── partial index predicates
 │    │         └── i: filters
 │    │              └── (a:1->>'c')::INT8 IS NOT NULL [outer=(1), immutable]
 │    └── filters
 │         └── (a:1->>'c')::INT8 > 4 [outer=(1), immutable]
 └── projections
      └── (a:1->>'c')::INT8 [as=b:2, outer=(1), immutable]

Notice that the filter b > 4 has been converted to (a:1->>'c')::INT8 > 4 and pushed below the projection of b. This is necessary in order to generate a constrained scan on any type of secondary index with a virtual computed column.

However, no filter constraint is generated for (a:1->>'c')::INT8 > 4, nor for (a:1->>'c')::INT8 IS NOT NULL because it is impossible to do so with respect to a. To prove that (a:1->>'c')::INT8 > 4 implies (a:1->>'c')::INT8 IS NOT NULL, we need the implicator to understand that the expression (a:1->>'c')::INT8 represents the virtual column b and we can temporarily generate a filter constraint w.r.t. that column to prove implication.