The force-flush mechanism bypasses token waiting and eagerly replicates the log to a peer, with no pacing or limiting. If the peer doesn't send any MsgAppResps for a while, the leader can accumulate a large in-flight window. If many ranges do this simultaneously, this can lead to an OOM.
The previous raft behaviour is eager too, but it enforces max-inflight limits per leader->peer flow, and it has mostly worked well, with only rare issues.
We should consider adopting this previous raft behaviour in RACv2 for force-flushing.
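
For illustration, a per-peer max-inflight limit could look roughly like the sketch below. This is not the RACv2 or raft code: `inflightWindow`, `tryAdd`, `ack`, and both limits are hypothetical names chosen for this example. It assumes acknowledgements arrive as a log index and that messages are sent in index order.

```go
package main

// inflightMsg describes one sent-but-unacknowledged append message.
type inflightMsg struct {
	index uint64 // last log index carried by the message
	bytes uint64 // payload size of the message
}

// inflightWindow is a hypothetical per leader->peer limiter. It caps both the
// number and the total size of unacknowledged messages.
type inflightWindow struct {
	maxMsgs  int
	maxBytes uint64
	entries  []inflightMsg // FIFO, ordered by index
	bytes    uint64        // sum of entries[i].bytes
}

func newInflightWindow(maxMsgs int, maxBytes uint64) *inflightWindow {
	return &inflightWindow{maxMsgs: maxMsgs, maxBytes: maxBytes}
}

// full reports whether the leader should stop sending to this peer. The byte
// limit is checked before adding, so the window can overshoot maxBytes by at
// most one message.
func (w *inflightWindow) full() bool {
	return len(w.entries) >= w.maxMsgs || w.bytes >= w.maxBytes
}

// tryAdd records a sent message, or returns false if the window is full and
// the send must wait for an acknowledgement.
func (w *inflightWindow) tryAdd(index, bytes uint64) bool {
	if w.full() {
		return false
	}
	w.entries = append(w.entries, inflightMsg{index: index, bytes: bytes})
	w.bytes += bytes
	return true
}

// ack frees every message whose index is <= acked, as an MsgAppResp would.
func (w *inflightWindow) ack(acked uint64) {
	i := 0
	for i < len(w.entries) && w.entries[i].index <= acked {
		w.bytes -= w.entries[i].bytes
		i++
	}
	w.entries = w.entries[i:]
}
```

With a window like this, a force-flush would call `tryAdd` before each send and pause when it returns false, so a peer that stops responding holds at most roughly `maxBytes` plus one message in flight per range.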
Jira issue: CRDB-44733
Epic CRDB-42900