cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
29.9k stars 3.78k forks source link

opt: add rule to decorrelate EXISTS (SELECT ... FOR UPDATE) subquery into semi join under read committed #114712

Open michae2 opened 10 months ago

michae2 commented 10 months ago

When using read committed isolation (or when optimizer_use_lock_op_for_serializable is enabled) we use the new Lock operator implementation of SELECT FOR UPDATE. This Lock operator acts as an optimization barrier, which we need to prevent the locking side effect from being optimized away.

But there are some basic optimizations that we should bring back for the Lock operator. One such optimization is decorrelation of a correlated EXISTS subquery into a semi join. Consider the following query:

CREATE TABLE ab (a INT PRIMARY KEY, b INT);
CREATE TABLE c (c INT PRIMARY KEY);
SELECT * FROM ab WHERE a = 1 AND EXISTS (SELECT * FROM c WHERE c = b);

For a normal SELECT this query becomes a simple semi-join:

demo@127.0.0.1:26257/demoapp/defaultdb> EXPLAIN SELECT * FROM ab WHERE a = 1 AND EXISTS (SELECT * FROM c WHERE c = b);
           info
---------------------------
  distribution: local
  vectorized: true

  • lookup join (semi)
  │ table: c@c_pkey
  │ equality: (b) = (c)
  │ equality cols are key
  │
  └── • scan
        missing stats
        table: ab@ab_pkey
        spans: [/1 - /1]
(12 rows)

Time: 2ms total (execution 2ms / network 0ms)

And same for the old implementation of SELECT FOR UPDATE, but for the new implementation it stays an apply-join:

demo@127.0.0.1:26257/demoapp/defaultdb> EXPLAIN SELECT * FROM ab WHERE a = 1 AND EXISTS (SELECT * FROM c WHERE c = b FOR UPDATE);
               info
----------------------------------
  distribution: local
  vectorized: true

  • lookup join (semi)
  │ table: c@c_pkey
  │ equality: (b) = (c)
  │ equality cols are key
  │ locking strength: for update
  │
  └── • scan
        missing stats
        table: ab@ab_pkey
        spans: [/1 - /1]
(13 rows)

Time: 2ms total (execution 2ms / network 0ms)

demo@127.0.0.1:26257/demoapp/defaultdb> SET optimizer_use_lock_op_for_serializable = true;
SET

Time: 1ms total (execution 0ms / network 0ms)

demo@127.0.0.1:26257/demoapp/defaultdb> EXPLAIN SELECT * FROM ab WHERE a = 1 AND EXISTS (SELECT * FROM c WHERE c = b FOR UPDATE);
           info
---------------------------
  distribution: local
  vectorized: true

  • apply join (semi)
  │
  └── • scan
        missing stats
        table: ab@ab_pkey
        spans: [/1 - /1]
(9 rows)

Time: 1ms total (execution 1ms / network 0ms)

Jira issue: CRDB-33647 Epic CRDB-38938

mgartner commented 6 months ago

@michae2 Is this still planned for 24.1?