cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
29.87k stars 3.77k forks source link

sql, kv: FOR UPDATE NOWAIT can starve or block under serializable #129145

Closed michae2 closed 3 days ago

michae2 commented 4 weeks ago

Consider a SFU query like the following, using v24.2.0-rc.1:

CREATE TABLE xy (x INT PRIMARY KEY, y INT);
INSERT INTO xy VALUES (1, 1), (2, 2);

-- lock the x=1 row in the first session
BEGIN;
SELECT * FROM xy WHERE x = 1 FOR UPDATE;

-- try to lock the y=2 row in the second session
SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;

I think in this case, ideally the second SFU query would be able to lock the second row without blocking. But under serializable isolation, this currently does not happen. With default behavior (locking below the filter) the plan tries to lock all rows in xy, thus failing the NOWAIT check because of the lock on x=1:

demo@localhost:26257/demoapp/defaultdb> EXPLAIN SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;
                            info
------------------------------------------------------------
  distribution: local
  vectorized: true

  • filter
  │ filter: y = 2
  │
  └── • scan
        missing stats
        table: xy@xy_pkey
        spans: FULL SCAN
        locking strength: for update
        locking wait policy: nowait

  index recommendations: 1
  1. type: index creation
     SQL command: CREATE INDEX ON defaultdb.public.xy (y);
(16 rows)

Time: 5ms total (execution 4ms / network 1ms)

demo@localhost:26257/demoapp/defaultdb> SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;
ERROR: could not obtain lock on row (x)=(1) in xy@xy_pkey
SQLSTATE: 55P03

With the new optimizer_use_lock_op_for_serializable behavior (locking above the filter) the plan only tries to lock matching rows, but blocks when trying to read the x=1 row due to serializable isolation:

demo@localhost:26257/demoapp/defaultdb> SET optimizer_use_lock_op_for_serializable = on;
SET

Time: 1ms total (execution 1ms / network 0ms)

demo@localhost:26257/demoapp/defaultdb> EXPLAIN SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;
                            info
------------------------------------------------------------
  distribution: local
  vectorized: true

  • lookup join (semi)
  │ table: xy@xy_pkey
  │ equality: (x) = (x)
  │ equality cols are key
  │ locking strength: for update
  │ locking wait policy: nowait
  │
  └── • filter
      │ filter: y = 2
      │
      └── • scan
            missing stats
            table: xy@xy_pkey
            spans: FULL SCAN

  index recommendations: 1
  1. type: index creation
     SQL command: CREATE INDEX ON defaultdb.public.xy (y);
(21 rows)

Time: 3ms total (execution 3ms / network 0ms)

demo@localhost:26257/demoapp/defaultdb> SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;
(blocks)

It seems like neither behavior is desirable for NOWAIT.

Jira issue: CRDB-41396

Epic CRDB-40199

blathers-crl[bot] commented 4 weeks ago

Hi @michae2, please add branch-* labels to identify which branch(es) this C-bug affects.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

michae2 commented 4 weeks ago

Under read committed isolation this works fine, because the initial scan of xy can read underneath the lock:

demo@localhost:26257/demoapp/defaultdb> BEGIN ISOLATION LEVEL READ COMMITTED;
BEGIN

Time: 0ms total (execution 1ms / network 0ms)

demo@localhost:26257/demoapp/defaultdb  OPEN> EXPLAIN SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;
                            info
------------------------------------------------------------
  distribution: local
  vectorized: true

  • lookup join (semi)
  │ table: xy@xy_pkey
  │ equality: (x) = (x)
  │ equality cols are key
  │ locking strength: for update
  │ locking wait policy: nowait
  │ locking durability: guaranteed
  │
  └── • filter
      │ filter: y = 2
      │
      └── • scan
            missing stats
            table: xy@xy_pkey
            spans: FULL SCAN

  index recommendations: 1
  1. type: index creation
     SQL command: CREATE INDEX ON defaultdb.public.xy (y);
(22 rows)

Time: 5ms total (execution 4ms / network 1ms)

demo@localhost:26257/demoapp/defaultdb  OPEN> SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;
  x | y
----+----
  2 | 2
(1 row)

Time: 4ms total (execution 4ms / network 1ms)
michae2 commented 3 weeks ago

For SKIP LOCKED we now flag the initial unlocked scans as skipped locked, as of e74202ce65f5c454ee9bcfb0971b38f637955c3a. But this strategy won't work for NOWAIT, because we might skip over a lock that should cause the "could not obtain lock" error.

I think the only fix is to read underneath the lock in the initial scan, like we do for read committed. That's why I've tagged this as T-KV.

michae2 commented 1 week ago

After talking with @nvanbenschoten and @mw5h we think the initial unlocked read should be flagged with NOWAIT under serializable.