cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.16k stars 3.82k forks source link

opt: not able to plan lookup join under filter with subquery #126776

Open michae2 opened 4 months ago

michae2 commented 4 months ago

In the example below, we avoid a full scan of cd by using a lookup join into cd@primary:

CREATE TABLE abc (
  a INT PRIMARY KEY,
  b STRING,
  c INT,
  INDEX (b)
);
CREATE TABLE cd (
  c INT PRIMARY KEY,
  d INT[]
);
EXPLAIN SELECT * FROM abc JOIN cd USING (c) WHERE b LIKE 'foo%' AND d @> ARRAY[123];
demo@127.0.0.1:26257/demoapp/defaultdb> EXPLAIN SELECT * FROM abc JOIN cd USING (c) WHERE b LIKE 'foo%' AND d @> ARRAY[123];
                 info
--------------------------------------
  distribution: local
  vectorized: true

  • lookup join
  │ table: cd@cd_pkey
  │ equality: (c) = (c)
  │ equality cols are key
  │ pred: d @> ARRAY[123]
  │
  └── • index join
      │ table: abc@abc_pkey
      │
      └── • scan
            missing stats
            table: abc@abc_b_idx
            spans: [/'foo' - /'fop')
(16 rows)

But if we change the d @> ARRAY[123] predicate to use a correlated subquery over a set-generating function such as 123 = ANY (SELECT unnest(d)) then it's no longer possible to plan a lookup join, and instead we get a full scan of cd:

demo@127.0.0.1:26257/demoapp/defaultdb> EXPLAIN SELECT * FROM abc INNER LOOKUP JOIN cd USING (c) WHERE b LIKE 'foo%' AND 123 = ANY (SELECT unnest(d));
ERROR: could not produce a query plan conforming to the LOOKUP JOIN hint
demo@127.0.0.1:26257/demoapp/defaultdb> EXPLAIN SELECT * FROM abc JOIN cd USING (c) WHERE b LIKE 'foo%' AND 123 = ANY (SELECT unnest(d));
                                              info
------------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • hash join
  │ estimated row count: 1
  │ equality: (c) = (c)
  │ right cols are key
  │
  ├── • index join
  │   │ estimated row count: 1
  │   │ table: abc@abc_pkey
  │   │
  │   └── • scan
  │         estimated row count: 1 (100% of the table; stats collected 54 seconds ago)
  │         table: abc@abc_b_idx
  │         spans: [/'foo' - /'fop')
  │
  └── • group (hash)
      │ estimated row count: 1
      │ group by: c
      │
      └── • filter
          │ estimated row count: 10
          │ filter: unnest = 123
          │
          └── • project set
              │ estimated row count: 10
              │
              └── • scan
                    estimated row count: 1 (100% of the table; stats collected 55 seconds ago)
                    table: cd@cd_pkey
                    spans: FULL SCAN
(32 rows)

Jira issue: CRDB-40119

michae2 commented 4 months ago

Here's an opttest that shows the problem:

exec-ddl
CREATE TABLE abc (
  a INT PRIMARY KEY,
  b STRING,
  c INT,
  INDEX (b)
)
----

exec-ddl
CREATE TABLE cd (
  c INT PRIMARY KEY,
  d INT[]
)
----

optstepsweb
SELECT * FROM abc JOIN cd USING (c) WHERE b LIKE 'foo%' AND d @> ARRAY[123]
----

optstepsweb
SELECT * FROM abc JOIN cd USING (c) WHERE b LIKE 'foo%' AND 123 = ANY (SELECT unnest(d))
----

Looking at the optstepsweb output, I wonder if we should be less aggressive about pushing predicates below joins if those predicates contain subqueries? Maybe predicates containing subqueries should only be pushed down during exploration?