Open nvanbenschoten opened 1 year ago
cc @cockroachdb/replication
Given that state machine application very rarely syncs, we could have it happen in the same goroutine that appends to the range's raft log. The benefit of unifying this scheduling is that if quorum is achieved with a lag of L log entries, state machine application will also lag by only L log entries relative to the raft log append. @irfansharif and I have been discussing that this would simplify admission control tracking (since we want to minimize forecasting of the consequences of state machine application).
> @irfansharif and I have been discussing that this would simplify admission control tracking (since we want to minimize forecasting of the consequences of state machine application).
How does this relate to the pacing of state machine application during recovery, when entries have already been committed by a quorum of replicas but a straggler replica is catching up using the log? Is the plan to pace log replication and state machine application together, or independently?
We’d want to start off with just pacing log replication. To replicate already committed entries to the straggling replica, the leader+leaseholder replica would wait for flow tokens. Flow tokens would only be returned by the node being caught up once the replicated entries have been admitted on that remote node. Admission of work on that node would take the apply-time effect into account, using the same linear model we use today to map the size of the log entries that admission control (AC) is informed of to the L0 growth it actually observes.
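To make the token flow concrete, here is a minimal sketch of the idea, not the actual CockroachDB implementation. The names `flowTokens`, `linearModel`, `acquire`, `release`, and `estimate`, as well as the token limit and the model coefficients, are all hypothetical and chosen only for illustration. The sender blocks until it holds enough tokens for an entry, and the receiver hands the tokens back once the entry has been admitted, charging admission by an estimated L0 growth rather than by the raw entry size.

```go
package main

import (
	"fmt"
	"sync"
)

// flowTokens is a hypothetical byte budget the leader holds for one follower.
type flowTokens struct {
	mu        sync.Mutex
	cond      *sync.Cond
	available int64
}

func newFlowTokens(limit int64) *flowTokens {
	ft := &flowTokens{available: limit}
	ft.cond = sync.NewCond(&ft.mu)
	return ft
}

// acquire blocks until n tokens are available, then deducts them.
// Note: a request larger than the total limit would block forever in this sketch.
func (ft *flowTokens) acquire(n int64) {
	ft.mu.Lock()
	defer ft.mu.Unlock()
	for ft.available < n {
		ft.cond.Wait()
	}
	ft.available -= n
}

// release returns n tokens, as the follower would after admitting an entry.
func (ft *flowTokens) release(n int64) {
	ft.mu.Lock()
	ft.available += n
	ft.mu.Unlock()
	ft.cond.Broadcast()
}

// linearModel is an assumed stand-in for the linear model mentioned above:
// it maps log entry bytes to an estimate of the resulting L0 growth.
type linearModel struct {
	multiplier float64
	constant   int64
}

func (m linearModel) estimate(entryBytes int64) int64 {
	return int64(m.multiplier*float64(entryBytes)) + m.constant
}

func main() {
	ft := newFlowTokens(1000) // illustrative limit
	model := linearModel{multiplier: 1.5, constant: 10}

	entries := make(chan int64)
	done := make(chan struct{})
	var l0Estimate int64

	// Follower side: admit each entry, then return its tokens.
	go func() {
		defer close(done)
		for size := range entries {
			l0Estimate += model.estimate(size)
			ft.release(size)
		}
	}()

	// Leader side: wait for tokens before replicating each entry.
	for _, size := range []int64{400, 400, 400} {
		ft.acquire(size)
		entries <- size
	}
	close(entries)
	<-done

	fmt.Println("tokens available:", ft.available, "estimated L0 bytes:", l0Estimate)
}
```

The point of the sketch is only the ordering: tokens are deducted before the send and returned after admission on the receiver, so a slow receiver throttles the sender without any forecasting on the sender's side.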
BTW, the flow control discussions were mostly motivated by the recovery case where it’s likelier to build up a large amount of unapplied state.
Extracted from #17500.
After #94165, raft log entry disk writes are asynchronous with respect to the raft state machine loop. However, the (non-durable) engine access for state machine application is still performed inline. The async storage writes interface permits us to extract all of this work onto a separate goroutine.
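As a rough illustration of what moving application onto a separate goroutine could look like, here is a minimal sketch. It does not use the real raft or storage interfaces: `entry`, `applier`, `startApplier`, and the channel sizes are hypothetical. The raft loop hands committed batches to the applier over a channel and learns about progress through acknowledgements carrying the last applied index.

```go
package main

import "fmt"

// entry is a simplified committed log entry.
type entry struct {
	index uint64
	data  string
}

// applier is a hypothetical worker that applies committed entries off the
// raft state machine loop and acknowledges the last applied index.
type applier struct {
	in      chan []entry
	applied chan uint64
	state   []string // stand-in for the (non-durable) engine writes
}

func startApplier() *applier {
	a := &applier{
		in:      make(chan []entry, 16),
		applied: make(chan uint64, 16),
	}
	go a.run()
	return a
}

// run applies batches in order and reports progress after each batch.
func (a *applier) run() {
	defer close(a.applied)
	for batch := range a.in {
		if len(batch) == 0 {
			continue
		}
		for _, e := range batch {
			a.state = append(a.state, e.data)
		}
		a.applied <- batch[len(batch)-1].index
	}
}

func main() {
	a := startApplier()

	// The raft loop enqueues committed entries without waiting for them to apply.
	a.in <- []entry{{index: 1, data: "put a"}, {index: 2, data: "put b"}}
	a.in <- []entry{{index: 3, data: "del a"}}
	close(a.in)

	// Acknowledgements arrive asynchronously, in log order.
	for idx := range a.applied {
		fmt.Println("applied through index", idx)
	}
}
```

In a real system the raft loop would have to keep draining acknowledgements while it continues to enqueue work; the buffered channels here only keep the example short.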
This would provide three benefits:
Jira issue: CRDB-23189