kv: ignore pushed intent without Raft consensus

To address this, we will need to adjust the lockTableWaiter so that it does not immediately issue a ResolveIntent(PENDING) for intents that non-locking reads encounter and are able to push to a higher timestamp using a PushTxn(PUSH_TIMESTAMP).

The original thinking here was that we would instead retain some information on the concurrency.Guard about pushed transactions and plumb this information down into pebbleMVCCScanner. The pebbleMVCCScanner, upon seeing an intent whose transaction is known to have been pushed to a higher timestamp, would ignore the intent and present the key's next version to the reader. This does not seem terribly difficult, but it does involve plumbing some state around.
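As a rough illustration of that original idea, the sketch below models a scanner that skips an intent when its transaction is known to have been pushed above the read timestamp. Everything in it (`Version`, `PushedTxns`, `Get`, `ErrConflictingIntent`) is a hypothetical, simplified stand-in and not the actual pebbleMVCCScanner or concurrency.Guard API.

```go
package main

import "errors"

// Hypothetical, simplified types for illustration only; they do not
// mirror CockroachDB's real MVCC structures.

// Version is one MVCC version of a single key.
type Version struct {
	Ts     int64
	Value  string
	Intent bool   // true for a provisional (uncommitted) write
	Txn    string // writer's transaction ID; set only for intents
}

// PushedTxns records, per transaction ID, the timestamp the transaction
// is known to have been pushed to.
type PushedTxns map[string]int64

var ErrConflictingIntent = errors.New("conflicting intent")

// Get returns the value visible at readTs. versions must be sorted
// newest first. The bool result reports whether a value was found.
func Get(versions []Version, readTs int64, pushed PushedTxns) (string, bool, error) {
	for _, v := range versions {
		if v.Ts > readTs {
			continue // written above the read timestamp; not visible
		}
		if v.Intent {
			if pushed[v.Txn] > readTs {
				// The writer can now only commit above readTs, so the
				// reader may skip the intent and look at the next version.
				continue
			}
			return "", false, ErrConflictingIntent
		}
		return v.Value, true, nil
	}
	return "", false, nil
}
```

With an intent at timestamp 10 over a committed value at timestamp 5, a read at timestamp 12 returns `ErrConflictingIntent` unless `pushed` records that the writer was pushed above 12, in which case the read returns the committed value.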

However, we arrived at a more elegant design that generalizes to other forms of intent resolution and enables fused "resolve-and-replace" consensus proposals. The idea is that we first begin deferring all intent resolution in the lockTableWaiter, similar to how we handle ResolveBeforeScanning today. A request's deferred resolution set is taken into account when determining which locks it conflicts with. Eventually, the only locks that still stand in the request's way are ones with a corresponding deferred "resolution instruction".
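The conflict check described above can be sketched as follows. This is a minimal model under assumed names (`Lock`, `Instruction`, `DeferredSet`, `FirstConflict`, `Sequence`, and the `push` callback are all hypothetical); the real lock table and lockTableWaiter are considerably more involved.

```go
package main

// Hypothetical types for illustration only.

// Lock is a lock (intent) held on a key by some transaction.
type Lock struct {
	Key    string
	Holder string // ID of the transaction holding the lock
}

// Instruction is a deferred "resolution instruction": the request has
// decided how the lock will be resolved but has not resolved it yet.
type Instruction struct {
	Key string
	Txn string
}

// DeferredSet is a request's set of deferred resolution instructions.
type DeferredSet map[Instruction]bool

// FirstConflict returns the first lock that still blocks req, skipping
// the request's own locks and locks covered by a deferred instruction.
func FirstConflict(locks []Lock, req string, deferred DeferredSet) (Lock, bool) {
	for _, l := range locks {
		if l.Holder == req {
			continue
		}
		if deferred[Instruction{Key: l.Key, Txn: l.Holder}] {
			continue
		}
		return l, true
	}
	return Lock{}, false
}

// Sequence pushes each conflicting lock holder in turn. On a successful
// push it records an instruction instead of resolving immediately. It
// reports false if some holder could not be pushed (the request would
// have to wait).
func Sequence(locks []Lock, req string, push func(Lock) bool) (DeferredSet, bool) {
	deferred := DeferredSet{}
	for {
		l, ok := FirstConflict(locks, req, deferred)
		if !ok {
			return deferred, true
		}
		if !push(l) {
			return deferred, false
		}
		deferred[Instruction{Key: l.Key, Txn: l.Holder}] = true
	}
}
```

In this model, a request that successfully pushes every conflicting holder ends up with no remaining conflicts and a `DeferredSet` describing the resolutions it still owes.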

We then give requests the choice of whether they want to realize the deferred resolution requests immediately, before latching and evaluation, or whether they want to virtualize/fuse them during evaluation. To realize them immediately, the request simply issues the ResolveIntent requests and pushes them through Raft, as it does today. This is a useful fallback option.

However, requests can also handle the deferred resolution during evaluation. Read-only requests have the option to virtualize the resolution and read-write requests have the option to fuse with the resolution. Doing so starts with the storage.Engine constructed during evaluation. Read-write requests continue to create a storage.Batch. For the first time, read-only requests also create a storage.Batch. Then, regardless of request path (read-only vs. read-write), command evaluation is run using the Batch and the deferred ResolveIntent requests. The result is a write batch with all conflicting intents resolved such that they no longer conflict with the rest of the BatchRequest. The BatchRequest can then evaluate its original requests on top of the Batch, knowing that it is observing the post-resolution state.

The final trick here is that read-write requests can proceed to propose the entire write batch to Raft. This allows them to propose a single Raft entry that contains both the intent resolution and the subsequent intent replacement.
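A toy model of the virtualize/fuse step is sketched below. It is not the storage.Batch API: `Engine`, `Batch`, `Resolution`, `ApplyResolutions`, `Read`, and `WriteIntent` are all hypothetical, and each key holds a single committed value plus at most one intent for brevity. The point is only that resolutions are evaluated into the batch, so later evaluation sees the post-resolution state while the engine itself is untouched.

```go
package main

import "errors"

// All types below are hypothetical stand-ins for illustration.

type Intent struct {
	Txn   string
	Ts    int64
	Value string
}

type KeyState struct {
	Committed string  // latest committed value (one version, for brevity)
	Intent    *Intent // provisional write, if any
}

type Engine map[string]KeyState

// Batch buffers writes on top of an Engine; reads see buffered writes first.
type Batch struct {
	base   Engine
	Writes map[string]KeyState
}

func NewBatch(e Engine) *Batch {
	return &Batch{base: e, Writes: map[string]KeyState{}}
}

func (b *Batch) get(k string) KeyState {
	if s, ok := b.Writes[k]; ok {
		return s
	}
	return b.base[k]
}

type Status int

const (
	Pending Status = iota // txn still running, pushed to PushedTs
	Aborted
)

// Resolution is a deferred resolution instruction for one intent.
type Resolution struct {
	Key, Txn string
	Status   Status
	PushedTs int64
}

// ApplyResolutions evaluates deferred resolutions into the batch only;
// the underlying engine is not modified.
func ApplyResolutions(b *Batch, rs []Resolution) {
	for _, r := range rs {
		s := b.get(r.Key)
		if s.Intent == nil || s.Intent.Txn != r.Txn {
			continue
		}
		switch r.Status {
		case Pending:
			if r.PushedTs > s.Intent.Ts {
				moved := *s.Intent // copy, so the engine's intent is not mutated
				moved.Ts = r.PushedTs
				s.Intent = &moved
			}
		case Aborted:
			s.Intent = nil
		}
		b.Writes[r.Key] = s
	}
}

var ErrConflict = errors.New("conflicting intent")

// Read returns the committed value unless an intent at or below readTs is visible.
func Read(b *Batch, k string, readTs int64) (string, error) {
	s := b.get(k)
	if s.Intent != nil && s.Intent.Ts <= readTs {
		return "", ErrConflict
	}
	return s.Committed, nil
}

// WriteIntent lays down an intent unless another transaction's intent is present.
func WriteIntent(b *Batch, k string, in Intent) error {
	s := b.get(k)
	if s.Intent != nil && s.Intent.Txn != in.Txn {
		return ErrConflict
	}
	s.Intent = &in
	b.Writes[k] = s
	return nil
}
```

In this model a read-only request would call `ApplyResolutions` and `Read` on a fresh batch and then discard it, while a read-write request would call `ApplyResolutions` followed by `WriteIntent` and propose `Writes` as a single unit, which then carries both the resolution and the replacement intent.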

The benefits of this approach are:

- Request evaluation remains ignorant of the virtualized intent resolution. We don't need to teach pebbleMVCCScanner how to ignore certain intents, it just won't see them in its view of the storage engine.
- Read-only requests can ignore pushed intents without Raft consensus.
- Read-write requests can replace one intent with another in a single round of Raft consensus (which will often be pipelined and async).
- In both cases, a round of synchronous (i.e. blocking) Raft consensus is avoided.

I created a prototype of this in nvanbenschoten/virtualResolve, but it still needs a lot of work.

cockroachdb / cockroach

kv: ignore pushed intent without Raft consensus #94730