cockroachdb / cockroach

CockroachDB - the open source, cloud-native distributed SQL database.
https://www.cockroachlabs.com

streaming: update the stream ingestion processor to use a flushing buffer #59176

Open pbardea opened 3 years ago

pbardea commented 3 years ago

Currently the stream ingestion processor uses a slice as its buffer before it flushes. We should have some sort of BufferingAdder that supports inserting keys at a particular timestamp. Concrete actions here:

Jira issue: CRDB-3325

Epic CRDB-19048
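For illustration, here is a minimal Go sketch of such a flushing buffer. It accumulates key/value pairs at a caller-supplied timestamp and tracks their approximate size. Once a threshold is reached, it sorts them (key ascending, newest version first) and hands the batch to a flush callback. All names here (`timestampedKV`, `flushingBuffer`, `Add`, `Flush`, `flushFn`) are hypothetical and are not the API of CockroachDB's `BufferingAdder`. The `int64` timestamp is a simplifying assumption standing in for `hlc.Timestamp`.

```go
package main

import (
	"bytes"
	"sort"
)

// timestampedKV is a hypothetical stand-in for an MVCC key/value pair.
// An int64 wall time replaces hlc.Timestamp to keep the sketch self-contained.
type timestampedKV struct {
	Key   []byte
	TS    int64
	Value []byte
}

// flushingBuffer accumulates KVs and flushes them, sorted, once the
// approximate buffered size reaches flushThreshold bytes.
type flushingBuffer struct {
	flushThreshold int
	curSize        int
	kvs            []timestampedKV
	flushFn        func([]timestampedKV) error // hypothetical: e.g. build and ingest an SST
}

// Add buffers one key at the given timestamp and flushes if the
// threshold has been reached.
func (b *flushingBuffer) Add(key []byte, ts int64, value []byte) error {
	b.kvs = append(b.kvs, timestampedKV{Key: key, TS: ts, Value: value})
	b.curSize += len(key) + len(value) + 8 // 8 bytes for the timestamp
	if b.curSize >= b.flushThreshold {
		return b.Flush()
	}
	return nil
}

// Flush sorts the buffered KVs in MVCC order (key ascending, newer
// timestamps first), passes them to flushFn, and resets the buffer.
func (b *flushingBuffer) Flush() error {
	if len(b.kvs) == 0 {
		return nil
	}
	sort.Slice(b.kvs, func(i, j int) bool {
		if c := bytes.Compare(b.kvs[i].Key, b.kvs[j].Key); c != 0 {
			return c < 0
		}
		return b.kvs[i].TS > b.kvs[j].TS
	})
	if err := b.flushFn(b.kvs); err != nil {
		return err
	}
	// Allocate a fresh slice so flushFn may safely retain the old one.
	b.kvs = nil
	b.curSize = 0
	return nil
}
```

In a real processor `flushFn` would presumably build and ingest an SST. Here it is just a callback, which keeps the buffering logic testable on its own.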

blathers-crl[bot] commented 2 years ago

cc @cockroachdb/cdc

stevendanna commented 2 years ago

I think we still want to do this. At the very least, we need memory monitoring around the buffer in the ingestion processor.
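Here is a minimal sketch of memory monitoring around the buffer, assuming a simple byte budget. Each entry is reserved against the budget before it is buffered. A failed reservation triggers a flush to free memory, and the reservation is released after a successful flush. `memBudget` and `monitoredBuffer` are hypothetical stand-ins, not CockroachDB's memory-monitoring API.

```go
package main

import "errors"

// errBudgetExceeded is returned when a reservation would exceed the limit.
var errBudgetExceeded = errors.New("memory budget exceeded")

// memBudget is a hypothetical stand-in for a real memory monitor: it
// tracks reserved bytes against a fixed limit.
type memBudget struct {
	limit int64
	used  int64
}

// Grow reserves n bytes, or fails without reserving anything if the
// limit would be exceeded.
func (m *memBudget) Grow(n int64) error {
	if m.used+n > m.limit {
		return errBudgetExceeded
	}
	m.used += n
	return nil
}

// Shrink releases n previously reserved bytes.
func (m *memBudget) Shrink(n int64) { m.used -= n }

// monitoredBuffer accounts for every buffered entry against mem and
// releases the reservation after a successful flush.
type monitoredBuffer struct {
	mem      *memBudget
	reserved int64
	entries  [][]byte
	flushFn  func([][]byte) error
}

// Add reserves memory for entry before buffering it. If the reservation
// fails, it flushes to free memory and retries once. An entry larger than
// the whole budget still returns errBudgetExceeded.
func (b *monitoredBuffer) Add(entry []byte) error {
	n := int64(len(entry))
	if err := b.mem.Grow(n); err != nil {
		if ferr := b.Flush(); ferr != nil {
			return ferr
		}
		if err := b.mem.Grow(n); err != nil {
			return err
		}
	}
	b.entries = append(b.entries, entry)
	b.reserved += n
	return nil
}

// Flush hands the buffered entries to flushFn and, on success, releases
// their memory reservation.
func (b *monitoredBuffer) Flush() error {
	if len(b.entries) == 0 {
		return nil
	}
	if err := b.flushFn(b.entries); err != nil {
		return err
	}
	b.entries = nil
	b.mem.Shrink(b.reserved)
	b.reserved = 0
	return nil
}
```

Flushing on a failed reservation keeps the processor making progress under memory pressure, where returning an error immediately would stall it. Only an entry larger than the entire budget is rejected.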

gh-casper commented 1 year ago

I spent some time trying to reuse the buffer_adder in stream_ingestion_processor, but it can end up as very messy code. After some research, I think stream ingestion should have its own buffer_adder, SST batcher, and stats. This would give us:

  1. Better observability of flushes and latency, plus memory monitoring.
  2. A single batcher, consolidating the current range key batcher and the MVCC key batcher.
  3. More control over how we flush and split the SSTs we are about to ingest.

However, we can still reuse kvBuf for the way it encodes keys and values. It needs some modifications, mostly to how we sort its kvBufEntry entries, if we want to encode the key and timestamp into one fat key.