cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

rac2: protect from unlimited force-flush #135814

Open pav-kv opened 5 days ago

The force-flush mechanism bypasses token waiting and eagerly (optimistically) replicates the log to a peer, with no pacing or limiting. If the peer stops sending MsgAppResp messages for a while, we can accumulate a large in-flight window. If many ranges do this simultaneously, this can lead to OOM.

The previous raft behaviour is eager too, but it enforces max-inflight limits per leader->peer flow. It works well for the most part and rarely causes issues.
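
To make the idea concrete, here is a rough sketch of what a per-peer in-flight window could look like. It is only an illustration: `inflightLimiter`, `TrySend`, `Ack` and the two caps are hypothetical names, not the raft library's or RACv2's actual types, and the real limits may be tracked differently.

```go
package main

// sentEntry records one unacknowledged append message (hypothetical type).
type sentEntry struct {
	index uint64 // last log index carried by the message
	size  uint64 // payload size in bytes
}

// inflightLimiter is a hypothetical per leader->peer window. It caps both the
// number of unacknowledged messages and their total size in bytes.
type inflightLimiter struct {
	maxMsgs  int
	maxBytes uint64
	bytes    uint64      // bytes currently in flight
	queue    []sentEntry // FIFO, ordered by log index
}

func newInflightLimiter(maxMsgs int, maxBytes uint64) *inflightLimiter {
	return &inflightLimiter{maxMsgs: maxMsgs, maxBytes: maxBytes}
}

// Full reports whether sending to this peer should pause until an ack arrives.
func (l *inflightLimiter) Full() bool {
	return len(l.queue) >= l.maxMsgs || l.bytes >= l.maxBytes
}

// TrySend records a message if the window has room and returns true.
// It returns false, recording nothing, when the window is already full.
func (l *inflightLimiter) TrySend(index, size uint64) bool {
	if l.Full() {
		return false
	}
	l.queue = append(l.queue, sentEntry{index: index, size: size})
	l.bytes += size
	return true
}

// Ack releases every message whose index is at or below acked, as a
// MsgAppResp from the peer would.
func (l *inflightLimiter) Ack(acked uint64) {
	i := 0
	for i < len(l.queue) && l.queue[i].index <= acked {
		l.bytes -= l.queue[i].size
		i++
	}
	l.queue = l.queue[i:]
}
```

With a window like this, a force-flushing stream would stop at the cap when the peer goes quiet, so the memory held per peer is bounded by roughly `maxBytes` plus one message instead of growing with the log.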

We should consider adopting this previous raft behaviour in RACv2 for force-flushing.

Jira issue: CRDB-44733

Epic CRDB-42900