Open rytaft opened 2 years ago
manually reviewed and updated
One suggestion for users that run into this issue is that index hints are supported for mutation statements. This makes it possible to control the index used during the initial row scan of a mutation, which can be important to minimize false contention in some cases.
For example, without a hint:
demo@127.0.0.1:26257/movr> EXPLAIN UPDATE tab SET a = 2000 WHERE b = 1;
info
------------------------------------------------------------------------------------------------
distribution: local
vectorized: true
• update
│ table: tab
│ set: a
│ auto commit
│
└── • render
│
└── • filter
│ estimated row count: 500
│ filter: b = 1
│
└── • scan
estimated row count: 1,000 (100% of the table; stats collected 36 seconds ago)
table: tab@tab_pkey
spans: FULL SCAN
With a hint:
demo@127.0.0.1:26257/movr> EXPLAIN UPDATE tab@tab_b_idx SET a = 2000 WHERE b = 1;
info
---------------------------------------------------------------------------------------------
distribution: local
vectorized: true
• update
│ table: tab
│ set: a
│ auto commit
│
└── • render
│
└── • index join
│ estimated row count: 500
│ table: tab@tab_pkey
│ locking strength: for update
│
└── • scan
estimated row count: 500 (50% of the table; stats collected 39 seconds ago)
table: tab@tab_b_idx
spans: [/1 - /1]
locking strength: for update
@nvanbenschoten Do you know the reason for not propagating locking strength: for update
to the full table scan in your first example above? We're considering using the locking strength of scans to penalize the cost of full table scans and scans with uncertain cardinality.
We don't push down the implicit row-level locking mode if the initial row scan of an UPDATE statement is not precise (e.g. contains a filter). This is meant to avoid false contention, though I think you could argue that it's an unprincipled rule that falls out of the fact that we don't support shared locks.
Currently, CockroachDB's optimizer tries to find the best plan for a query in isolation and doesn't concern itself with the possibility that reading more rows than needed could cause write conflicts. Consider the following example:
The optimizer chooses a full scan of the primary index instead of using the index on
b
, since in isolation, this is less expensive than performing an index join with half the rows in the table. However, it means that any other query that wants to update rows whereb = 2
will block, even though technically there is no conflict.Ideally, the optimizer would almost never choose a full scan for mutation queries such as this one where a suitable index is available, due to the impact on contention.
Epic CRDB-37836
Jira issue: CRDB-14977
gz#11932
gz#18109