Unfortunately, that assumption is not quite copacetic with what the layers above it do. There are currently two problems:
If the txn ever encountered an error, the Txn moves to the txnError state, at which point it forgets whether it ever performed any writes (i.e. whether it came from state txnReadonly or txnWriting, etc.), and so it will never elide future EndTransactions (rollbacks).
If the BeginTxn batch is rejected below the Txn layer, in the TxnCoordSender, then the client.Txn considers the transaction to be writing (because it sent a BeginTxn), but the intent collector considers it read-only (because it never saw said BeginTxn). This happens, for example, if the Stopper is stopped when the BeginTransaction is sent, and the TCS fails to start the heartbeat loop and rejects the batch.
The first problem goes away in #28185 because that PR breaks apart the different txn states and correctly tracks whether a BeginTxn was ever sent.
The second problem is more fundamental, caused by the separate tracking of the BeginTxn done in both the TCS and the intent collector.
It's not very clear to me what to do about it. We could try to share the "did we send a BeginTxn" state. This is a bit complicated by the fact that, in #28185, the BeginTxn tracking is not done by the TCS directly, but by the heartbeat interceptor. So we'd need to create a communication channel between two interceptors. I think what we'd want is for the txnLockGatekeeper to keep track of whether a BeginTxn is really sent to the server, and for the heartbeat interceptor to use that to dictate whether EndTransaction can be elided. Then any interceptor anywhere in the stack can retain the right to reject batches (as they tend to do already) and the intent collector can be left alone with its current assumption - that if it sees an EndTxn it must have accumulated some intents.
Alternatively, the intent collector could get its own logic for eliding the end transaction, duplicating the existing one.
Yet alternatively, the intent collector could stop assuming anything and conservatively forward unnecessary EndTxns.
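To make the first option (sharing the "did we send a BeginTxn" state) a bit more concrete, here's a rough sketch of the shape I have in mind. Everything in it is made up for illustration: batch, sender, beginTracker, rejecter and the toy interceptor types are hypothetical stand-ins, much simpler than the real interceptors, and the ordering of the stack is an assumption. The only point is that the bottom of the stack records whether a BeginTxn really went out, and the top consults that record before deciding to elide an EndTxn.

```go
package main

import (
	"errors"
	"fmt"
)

// NOTE: all names here are hypothetical; this is not the real CockroachDB code.

// batch is a toy request batch that only records which requests it carries.
type batch struct {
	hasBegin bool
	hasEnd   bool
}

// sender is a toy stand-in for one interceptor in the stack.
type sender interface {
	send(b batch) error
}

// beginTracker is the state shared between the bottom and the top of the stack.
type beginTracker struct {
	sent bool // true once a BeginTxn has really been sent to the server
}

var errRejected = errors.New("batch rejected")

// heartbeater sits at the top and elides an EndTxn if no BeginTxn went out.
type heartbeater struct {
	tracker *beginTracker
	next    sender
}

func (h *heartbeater) send(b batch) error {
	if b.hasEnd && !b.hasBegin && !h.tracker.sent {
		return nil // elided: nothing below ever sees this EndTxn
	}
	return h.next.send(b)
}

// rejecter models any interceptor that may refuse a batch
// (e.g. because the heartbeat loop could not be started).
type rejecter struct {
	reject bool
	next   sender
}

func (r *rejecter) send(b batch) error {
	if r.reject {
		return errRejected
	}
	return r.next.send(b)
}

// intentCollector keeps its current assumption: an EndTxn implies a prior BeginTxn.
type intentCollector struct {
	sawBegin bool
	next     sender
}

func (c *intentCollector) send(b batch) error {
	if b.hasBegin {
		c.sawBegin = true
	}
	if b.hasEnd && !c.sawBegin {
		return errors.New("readonly txn: EndTxn should have been elided")
	}
	return c.next.send(b)
}

// gatekeeper is the bottom of the stack; it records that a BeginTxn was sent.
type gatekeeper struct {
	tracker     *beginTracker
	serverCalls int
}

func (g *gatekeeper) send(b batch) error {
	g.serverCalls++ // pretend the batch goes to the server here
	if b.hasBegin {
		g.tracker.sent = true
	}
	return nil
}

// newStack wires heartbeater -> rejecter -> intentCollector -> gatekeeper.
func newStack() (sender, *rejecter, *gatekeeper) {
	t := &beginTracker{}
	gk := &gatekeeper{tracker: t}
	ic := &intentCollector{next: gk}
	rej := &rejecter{next: ic}
	return &heartbeater{tracker: t, next: rej}, rej, gk
}

func main() {
	top, rej, gk := newStack()
	rej.reject = true
	fmt.Println("begin:", top.send(batch{hasBegin: true}))
	rej.reject = false
	fmt.Println("end:", top.send(batch{hasEnd: true}))
	fmt.Println("server calls:", gk.serverCalls)
}
```

In this toy version, a BeginTxn that gets rejected mid-stack never flips the shared flag, so the later rollback is elided at the top and the intent collector's assertion is never triggered. The real thing would presumably also need to care about locking and about batches that are sent but fail, which this sketch ignores.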
@nvanbenschoten for thoughts, if any.
Thanks @jordanlewis for seeing something and saying something.
The intent collector interceptor complains if it receives an EndTransaction without it having collected any intents prior. It returns an error saying that it's a "readonly txn" and the EndTxn should have been elided above it. That error is supposed to signal bugs. https://github.com/cockroachdb/cockroach/blob/8044dea9c5c210c3ddb238a7edd446a93013e2e2/pkg/kv/txn_interceptor_intent_collector.go#L102
The error can be demonstrated with