opt: the optimizer cost model should consider contention #79683

Open rytaft opened 2 years ago

rytaft commented 2 years ago

Currently, CockroachDB's optimizer tries to find the best plan for a query in isolation and doesn't concern itself with the possibility that reading more rows than needed could cause write conflicts. Consider the following example:

CREATE TABLE tab (
  a INT,
  b INT,
  INDEX (b)
);
INSERT INTO tab SELECT generate_series(1,500), 1;
INSERT INTO tab SELECT generate_series(501,1000), 2;
ANALYZE tab;
EXPLAIN UPDATE tab SET a = 2000 WHERE b = 1;
                                             info
-----------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • update
  │ table: tab
  │ set: a
  │ auto commit
  │
  └── • render
      │ estimated row count: 500
      │
      └── • filter
          │ estimated row count: 500
          │ filter: b = 1
          │
          └── • scan
                estimated row count: 1,000 (100% of the table; stats collected 2 seconds ago)
                table: tab@tab_pkey
                spans: FULL SCAN

The optimizer chooses a full scan of the primary index instead of using the index on b, since in isolation, this is less expensive than performing an index join with half the rows in the table. However, it means that any other query that wants to update rows where b = 2 will block, even though technically there is no conflict.
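
To make the blocking concrete, here is a minimal sketch of the interleaving (the explicit transactions, session labels, and the second UPDATE statement are illustrative, not part of the original report). Because session 2's UPDATE uses the same full-scan plan, it has to read through session 1's uncommitted write intents on the b = 1 rows, even though the two statements modify disjoint sets of rows:

-- Session 1: writes intents on the b = 1 rows and keeps the transaction open.
BEGIN;
UPDATE tab SET a = 2000 WHERE b = 1;

-- Session 2: logically touches only the b = 2 rows, but its own full scan of
-- tab@tab_pkey encounters session 1's uncommitted intents and blocks.
BEGIN;
UPDATE tab SET a = 3000 WHERE b = 2;   -- blocks here

-- Session 1: committing (or rolling back) unblocks session 2.
COMMIT;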

Ideally, because of the impact on contention, the optimizer would almost never choose a full scan for a mutation query like this one when a suitable index is available.

Epic CRDB-37836

Jira issue: CRDB-14977

gz#11932

gz#18109

kenliu-crl commented 2 years ago

manually reviewed and updated

nvanbenschoten commented 2 years ago

One suggestion for users who run into this issue: index hints are supported for mutation statements. This makes it possible to control the index used for the initial row scan of a mutation, which can be important for minimizing false contention in some cases.

For example, without a hint:

demo@127.0.0.1:26257/movr> EXPLAIN UPDATE tab SET a = 2000 WHERE b = 1;
                                              info
------------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • update
  │ table: tab
  │ set: a
  │ auto commit
  │
  └── • render
      │
      └── • filter
          │ estimated row count: 500
          │ filter: b = 1
          │
          └── • scan
                estimated row count: 1,000 (100% of the table; stats collected 36 seconds ago)
                table: tab@tab_pkey
                spans: FULL SCAN

With a hint:

demo@127.0.0.1:26257/movr> EXPLAIN UPDATE tab@tab_b_idx SET a = 2000 WHERE b = 1;
                                            info
---------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • update
  │ table: tab
  │ set: a
  │ auto commit
  │
  └── • render
      │
      └── • index join
          │ estimated row count: 500
          │ table: tab@tab_pkey
          │ locking strength: for update
          │
          └── • scan
                estimated row count: 500 (50% of the table; stats collected 39 seconds ago)
                table: tab@tab_b_idx
                spans: [/1 - /1]
                locking strength: for update

mgartner commented 2 years ago

@nvanbenschoten Do you know the reason for not propagating locking strength: for update to the full table scan in your first example above? We're considering using the locking strength of scans to penalize the cost of full table scans and scans with uncertain cardinality.

nvanbenschoten commented 2 years ago

We don't push down the implicit row-level locking mode if the initial row scan of an UPDATE statement is not precise (e.g. contains a filter). This is meant to avoid false contention, though I think you could argue that it's an unprincipled rule that falls out of the fact that we don't support shared locks.
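
For contrast, here is a hedged sketch of the "precise" case (the table, column names, and values are hypothetical, not from this thread): when the filter is fully absorbed into the scan's spans, leaving no residual filter above the scan, the implicit FOR UPDATE locking mode is expected to be pushed down to the scan itself.

CREATE TABLE kv (k INT PRIMARY KEY, v INT);

-- The point lookup leaves no filter above the scan, so the scan is expected
-- to show "locking strength: for update" (plan shape shown only roughly,
-- with intermediate nodes omitted):
EXPLAIN UPDATE kv SET v = 1 WHERE k = 10;
--   • update
--   └── • scan
--         table: kv@kv_pkey
--         spans: [/10 - /10]
--         locking strength: for update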