cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
29.87k stars 3.77k forks source link

opt: plan lookup join when prefix columns are computed and held constant #127910

Open marcrasi opened 1 month ago

marcrasi commented 1 month ago

Describe the problem

I'm having a problem where I have an expression index and I want to do a lookup join using that index, but Cockroach doesn't seem able to use the expression component of the index in the lookup join equality.

I was able to work around this by putting the expression in a stored column, so this isn't super urgent for me, but I thought it would be helpful for me to file an issue anyways.

I searched around for similar issues and I thought maybe this might be related to https://github.com/cockroachdb/cockroach/issues/123320, because that's also a case where Cockroach isn't using an expression component of an index that it should be able to use.

To Reproduce

Here's a reduced case that reproduces the problem. I did some queries with stored column indices for comparison -- those work as expected. I wrote "UNEXPECTED" over the queries with expression indices that don't do what I'd expect.

CREATE TABLE example_workflow_eval (
    run_id UUID NOT NULL,
    PRIMARY KEY (run_id)
);

CREATE TABLE example_run_log_entry (
    run_id UUID NOT NULL,
    run_log_entry_id UUID NOT NULL,
    event JSONB NOT NULL,
    event_type STRING AS (event->>'event_type') STORED,
    PRIMARY KEY (run_id, run_log_entry_id)
);

CREATE INDEX example_index_event_type ON example_run_log_entry (event_type);
CREATE INDEX example_index_event_type_jsonb ON example_run_log_entry ((event->>'event_type'));

-- Using the stored column index, I can do a lookup join.
EXPLAIN
SELECT example_workflow_eval.run_id
FROM example_workflow_eval
LEFT LOOKUP JOIN example_run_log_entry@example_index_event_type ON (
    example_run_log_entry.run_id = example_workflow_eval.run_id AND
    example_run_log_entry.event_type = 'done'
);

--   • lookup join (left outer)
--   │ estimated row count: 1
--   │ table: example_run_log_entry@example_index_event_type
--   │ equality: (lookup_join_const_col_@7, run_id) = (event_type,run_id)
--   │
--   └── • render
--       │
--       └── • scan
--             estimated row count: 1 (100% of the table; stats collected 8 minutes ago)
--             table: example_workflow_eval@example_workflow_eval_pkey
--             spans: FULL SCAN

-- UNEXPECTED: When everything is the same, except using an expression instead of a stored column, I can't do a lookup join.
EXPLAIN
SELECT example_workflow_eval.run_id
FROM example_workflow_eval
LEFT LOOKUP JOIN example_run_log_entry@example_index_event_type_jsonb ON (
    example_run_log_entry.run_id = example_workflow_eval.run_id AND
    example_run_log_entry.event->>'event_type' = 'done'
);

-- ERROR: could not produce a query plan conforming to the LOOKUP JOIN hint

-- A similar problem happens with a slightly different index.

CREATE INDEX example_index_run_event_type ON example_run_log_entry (run_id, event_type);
CREATE INDEX example_index_run_event_type_jsonb ON example_run_log_entry (run_id, (event->>'event_type'));

-- Using the stored column index, I can do a lookup join.
EXPLAIN
SELECT example_workflow_eval.run_id
FROM example_workflow_eval
LEFT LOOKUP JOIN example_run_log_entry@example_index_run_event_type ON (
    example_run_log_entry.run_id = example_workflow_eval.run_id AND
    example_run_log_entry.event_type = 'done'
);

--   • lookup join (left outer)
--   │ estimated row count: 1
--   │ table: example_run_log_entry@example_index_run_event_type
--   │ equality: (run_id, lookup_join_const_col_@7) = (run_id,event_type)
--   │
--   └── • render
--       │
--       └── • scan
--             estimated row count: 1 (100% of the table; stats collected 11 minutes ago)
--             table: example_workflow_eval@example_workflow_eval_pkey
--             spans: FULL SCAN

-- UNEXPECTED: When everything is the same, except using an expression instead of a stored column, I can't do a lookup join.
EXPLAIN
SELECT example_workflow_eval.run_id
FROM example_workflow_eval
LEFT LOOKUP JOIN example_run_log_entry@example_index_run_event_type_jsonb ON (
    example_run_log_entry.run_id = example_workflow_eval.run_id AND
    example_run_log_entry.event->>'event_type' = 'done'
);

--   • lookup join (left outer)
--   │ estimated row count: 1
--   │ table: example_run_log_entry@example_run_log_entry_pkey
--   │ equality: (run_id, run_log_entry_id) = (run_id,run_log_entry_id)
--   │ equality cols are key
--   │ pred: (event->>'event_type') = 'done'
--   │
--   └── • lookup join (left outer)
--       │ estimated row count: 1
--       │ table: example_run_log_entry@example_index_run_event_type_jsonb
--       │ equality: (run_id) = (run_id)
--       │
--       └── • scan
--             estimated row count: 1 (100% of the table; stats collected 11 minutes ago)
--             table: example_workflow_eval@example_workflow_eval_pkey
--             spans: FULL SCAN

-- It looks like it succeeded, but it didn't respect my intention to do a single lookup join with
-- the index I specified.
-- Instead, it does a lookup join with the part of the index I specified, and then a "filter" using
-- the predicate on a second lookup join.

Expected behavior The queries using the expression index should have worked the same as the queries using the stored column index.

Environment: I'm running cockroach start-single-node on my macbook, and connecting to it with cockroach sql.

marcrasi@Marcs-MacBook-Pro alda % cockroach --version
cockroach version details:
Build Tag:        v23.1.2
Build Time:       2023/05/25 16:20:05
Distribution:     CCL
Platform:         darwin arm64 (aarch64-apple-darwin21.2)
Go Version:       go1.19.4
C Compiler:       Clang 10.0.0
Build Commit ID:  810d4f27a7f02b9cc2750cab654ed1c62ac3e75a
Build Type:       release
(use 'cockroach version --build-tag' to display only the build tag)

Jira issue: CRDB-40667

blathers-crl[bot] commented 1 month ago

Hi @marcrasi, please add branch-* labels to identify which branch(es) this C-bug affects.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl[bot] commented 1 month ago

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

If we have not gotten back to your issue within a few business days, you can try the following:

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

mgartner commented 1 month ago

Thanks for the report! I have a tentative draft to address this in #128329.