cockroachdb / cockroach

CockroachDB - the open source, cloud-native distributed SQL database.
https://www.cockroachlabs.com
Other
29.52k stars 3.7k forks source link

kvserver: clean up proposals pipeline #116020

Open pav-kv opened 7 months ago

pav-kv commented 7 months ago

Forking off https://github.com/cockroachdb/cockroach/issues/115020#issuecomment-1849918456.

I'm leaning towards a clearer design, like this:

  1. Have a FIFO queue of in-flight proposals. They are added sequentially, and have consecutive "lease indices", so no fancy data structures are needed (except maybe some MPSC support for concurrent writes, like in the proposal buffer).

    • In fact, this possibly can be integrated with the proposals buffer which is a similar queue. The only difference is that we would keep proposals in the queue until they are applied or got behind LAI, not just until they are flushed to raft.
    • Integrating with the proposal buffer though can be hardened by the fact that refreshProposalsLocked wants to push duplicates to it. So we may still need 2 separate queues.
  2. When LAI state changes, wipe a prefix of this queue up to the new LAI (after reporting all the successfully applied commands). Decide on the unapplied / unknown to be applied commands:

    • For proposals that can be reproposed with a newer LAI, do it; push back to the queue correspondingly.
    • For proposals that can't be reproposed (and won't be), report a permanent failure.
    • For proposals that were jumped over (like when there was a snapshot), report an ambiguous failure and don't repropose.
  3. When a replica is destroyed, or lease moves (figure out exactly what those conditions are), wipe any outstanding proposals in the queue and prevent inserts to it. Report ambiguous errors for them.

  4. Not sure if the proposals map is needed in the first place. The above data structure is searchable by LAI in O(1) since LAI is strictly +1 incremental. The map would only be needed if we additionally need to search by the proposal ID, but I don't think we need to. We will use the command ID only for sanity checks to make sure that search in the queue by LAI finds the correct command.

  5. We do need to track multi-proposal proposals though (those that update LAI and repropose). From the above, we have the invariant that an in-flight proposal is always in the queue and always exactly once, so we can store all this information right in the queue. Or we could factor out some bits that are shared across multiple reproposals.

  6. Do not touch the proposals state (and proposals queue) from refreshProposalsLocked at all. All it needs to do is scan the queue, and insert slow proposals to the proposal buffer. This nicely separates concerns. We may replace the queue scanning with a better heap-based approach, so that we don't have to scan the entire queue all the time (as we do now).

Jira issue: CRDB-34385

Epic CRDB-39898

blathers-crl[bot] commented 7 months ago

cc @cockroachdb/replication

pav-kv commented 7 months ago

TLDR: