TimelyDataflow / differential-dataflow

An implementation of differential dataflow using timely dataflow on Rust.
MIT License

Revisit the stashing logic in MergeBatcherColumnation #426

Open antiguru opened 9 months ago

antiguru commented 9 months ago

As surfaced recently, the stashing logic that recycles empty buffers in the merge batcher is subtle: it should ensure that at most two empty buffers are retained, which is all it will ever need. The vector-based merge batcher implements this, but the logic is not easy to follow. When implementing the columnation-based merge batcher, I got it wrong: the seal function temporarily retained all empty buffers, which caused an out-of-memory (OOM) situation.

We should verify that the implementation maintains the invariant that there are at most two empty buffers, and reason about why this is sufficient.
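To make the invariant concrete, here is a minimal sketch of a bounded stash. It is not the code in `MergeBatcherColumnation`: the names `Stash`, `take`, `recycle`, and `simulate_seal` are hypothetical, and plain `Vec<T>` buffers stand in for the columnation-based containers. The only point it illustrates is that recycling drops a buffer instead of retaining it once the limit (two, per the above) is reached.

```rust
/// Hypothetical bounded stash of empty buffers (illustration only,
/// not the type used by the merge batcher).
struct Stash<T> {
    buffers: Vec<Vec<T>>,
    limit: usize,
}

impl<T> Stash<T> {
    fn new(limit: usize) -> Self {
        Self { buffers: Vec::with_capacity(limit), limit }
    }

    /// Hand out an empty buffer, reusing a stashed allocation if one exists.
    fn take(&mut self, capacity: usize) -> Vec<T> {
        self.buffers
            .pop()
            .unwrap_or_else(|| Vec::with_capacity(capacity))
    }

    /// Clear the buffer and keep its allocation only while below the limit.
    /// Otherwise the buffer is dropped here, which frees its memory.
    fn recycle(&mut self, mut buffer: Vec<T>) {
        buffer.clear();
        if self.buffers.len() < self.limit && buffer.capacity() > 0 {
            self.buffers.push(buffer);
        }
    }

    /// Number of empty buffers currently retained.
    fn len(&self) -> usize {
        self.buffers.len()
    }
}

/// Recycle `count` used buffers, as a seal-like step might, and report
/// how many empty buffers the stash retains afterwards.
fn simulate_seal(count: usize) -> usize {
    let mut stash: Stash<u64> = Stash::new(2);
    for i in 0..count {
        let mut buffer = stash.take(1024);
        buffer.push(i as u64);
        // Also recycle a freshly allocated buffer so that more buffers
        // are returned than were taken.
        stash.recycle(Vec::with_capacity(1024));
        stash.recycle(buffer);
    }
    stash.len()
}
```

With the limit enforced inside `recycle`, the stash cannot grow with the number of buffers a caller returns, which is the failure mode described for the seal function. The sketch does not show why two buffers are sufficient; that reasoning still needs to be written down.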