Open knz opened 4 years ago
@knz or @andreimatei, can either of you please provide a realistic example of when this deadlock would occur and a recommended workaround?
Actually, I think I can use the example from one of the linked PRs. Does this look ok?
ROLLBACK TO SAVEPOINT
in high-priority transactions containing DDLTransactions with priority HIGH
that contain DDL and ROLLBACK TO SAVEPOINT
are not supported, as they could result in a deadlock. For example:
> BEGIN PRIORITY HIGH; SAVEPOINT s; CREATE TABLE t(x INT); ROLLBACK TO SAVEPOINT s;
ERROR: unimplemented: cannot use ROLLBACK TO SAVEPOINT in a HIGH PRIORITY transaction containing DDL
SQLSTATE: 0A000
HINT: You have attempted to use a feature that is not yet implemented.
See: https://github.com/cockroachdb/cockroach/issues/46414
@jseldess I changed the behavior in PR #46415 - the deadlock is now avoided and reported as "unsupported feature". The relevant release note is this:
Release note (sql change): ROLLBACK TO SAVEPOINT (for either regular savepoint or "restart savepoints" defined with
cockroach_restart
) causes a "feature not supported" error after a DDL statement in a HIGH PRIORITY transaction, in order to avoid a transaction deadlock. See issue #46414 for details.
Your text in https://github.com/cockroachdb/cockroach/issues/46414#issuecomment-612729267 is thus correct and adequate.
Thanks, @knz.
@knz as I understand it, this mostly requires adding a new priority level for transactions for lease acquisition to use, so I'm moving it from SQL Schema to KV; let me know if you disagree.
mostly requires adding a new priority level for transactions for lease acquisition to use
IT's both adding the new priority and teaching SQL how to use it. It's cross-team if you will. The question is "who should drive this". The current behavior is only visible to SQL users from the fact that certain operations will now be impossible (rejected) when issued within SQL transactions. Who's going to see that pain firsthand? I suppose it's going to be the SQL PM, not the KV PM.
Maybe you can assign this to the PM nearest to you to see what they want to do with it.
For the sake of the recording, one thing we've floated around is for KV to recognize the notion of dependent transactions, and incorporate that knowledge in the pushing protocol: If txn A depends on B and now B is trying to wait on A, don't actually wait. This seemed like a useful concept in general because we have other examples in the system of dependent transactions, and we've hit similar deadlocking issues with them too (e.g. resolving ranges, or any sort of "semi-sync" work).
summary: ROLLBACK TO SAVEPOINT can cause a deadlock in a high-priority txn that contains DDL, so CockroachDB is set to disallow it altogether for the time being.
Ideally, a better solution would enable lease acquisition to operate at an ever higher priority.
Details
Prior to #46170, we had a deadlock when rolling back a txn for a restart / for a savepoint if the txn contained DDL, because subsequent lease acquisitions would wait on the main SQL txn which encloses it.
(This is because lease acquisition uses a separate kv txn.)
With #46170 committed, the lease acquisition occurs at a higher priority, which guarnatees deadlock avoidance in the common case where DDL is ran at normal priority.
However the deadlock still exists when the DDL is executed at high priority too.
Solution
We need two "high priority" levels, where lease acquisition is at a level higher than anything a SQL txn can be set to.
Jira issue: CRDB-5089