cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

rac2: pull mode can cause overflow of raftReceiveQueue #135851

Open sumeerbhola opened 1 day ago

sumeerbhola commented 1 day ago

The RaftMaxInflightMsgs configuration parameter is also used on the receiving side of Raft messages: Store.HandleRaftUncoalescedRequest drops a message if the raftReceiveQueue for a range exceeds RaftMaxInflightMsgs+replicaQueueExtraSize (128 + 10).
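To make the receiver-side limit concrete, here is a minimal sketch of a bounded per-range queue. It is not the actual CockroachDB code: the `receiveQueue` type, its fields, and the `enqueue` method are hypothetical, and the exact boundary condition (dropping once the queue already holds 138 messages) is an assumption. Only the 128 + 10 figure comes from the description above.

```go
package main

import "fmt"

// Values quoted in the issue; the constant names here are illustrative.
const (
	raftMaxInflightMsgs   = 128
	replicaQueueExtraSize = 10
)

// receiveQueue is a hypothetical, simplified model of a per-range
// raftReceiveQueue. It is not the real CockroachDB type.
type receiveQueue struct {
	msgs    []string
	maxLen  int
	dropped int
}

func newReceiveQueue() *receiveQueue {
	return &receiveQueue{maxLen: raftMaxInflightMsgs + replicaQueueExtraSize}
}

// enqueue appends msg unless the queue is already at its limit, in which
// case the message is counted as dropped. The ">=" boundary is an assumption.
func (q *receiveQueue) enqueue(msg string) bool {
	if len(q.msgs) >= q.maxLen {
		q.dropped++
		return false
	}
	q.msgs = append(q.msgs, msg)
	return true
}

func main() {
	q := newReceiveQueue()
	// A sender that ignores the limit pushes 200 messages before the
	// receiver drains anything.
	for i := 0; i < 200; i++ {
		q.enqueue(fmt.Sprintf("msg-%d", i))
	}
	fmt.Printf("queued=%d dropped=%d\n", len(q.msgs), q.dropped)
}
```

With 200 messages and nothing draining the queue, this model holds 138 and drops the remaining 62.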

In LazyReplication mode, the sender does not respect RaftMaxInflightMsgs, so it can send more entries than the receiver's queue will accept. The excess simply overflows the receiver queue and is dropped, which is wasteful.

We should respect RaftMaxInflightMsgs inside replicaSendStream when in lazy-replication (pull) mode.
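One way the proposed fix could look, as a rough sketch only: the sender tracks how many messages are in flight and stops sending once it reaches the limit, resuming as acknowledgements arrive. The `sendStream` type, its fields, and the `sendReady` and `ack` methods below are hypothetical and do not reflect the real replicaSendStream; counting whole messages rather than entries or bytes is also an assumption.

```go
package main

import "fmt"

// maxInflightMsgs mirrors the 128 quoted in the issue for RaftMaxInflightMsgs.
const maxInflightMsgs = 128

// sendStream is a hypothetical, simplified stand-in for a per-replica send
// stream in pull mode. It only counts messages.
type sendStream struct {
	maxInflight int // cap on unacknowledged messages
	inflight    int // messages sent but not yet acknowledged
	pending     int // messages waiting to be sent
}

// sendReady sends pending messages until the in-flight cap is reached and
// returns how many were sent in this call.
func (s *sendStream) sendReady() int {
	sent := 0
	for s.pending > 0 && s.inflight < s.maxInflight {
		s.pending--
		s.inflight++
		sent++
	}
	return sent
}

// ack records that n in-flight messages were acknowledged, freeing capacity.
func (s *sendStream) ack(n int) {
	if n > s.inflight {
		n = s.inflight
	}
	s.inflight -= n
}

func main() {
	s := &sendStream{maxInflight: maxInflightMsgs, pending: 200}
	first := s.sendReady() // stops at the cap
	s.ack(50)              // receiver acknowledges 50 messages
	second := s.sendReady()
	fmt.Printf("first=%d second=%d pending=%d inflight=%d\n",
		first, second, s.pending, s.inflight)
}
```

In this model the first call sends 128 messages and then stalls; after 50 acknowledgements the second call sends 50 more, leaving 22 pending. Because the sender never has more than 128 messages outstanding, it stays under the 138-message receiver limit described above.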

Jira issue: CRDB-44748