TimelyDataflow / differential-dataflow

An implementation of differential dataflow using timely dataflow on Rust.
MIT License
2.54k stars 183 forks source link

Efficient merging with empty batches #407

Open teskje opened 1 year ago

teskje commented 1 year ago

This PR makes merging of batches with empty batches efficient. Previously this operation would be handled like a regular merge, by allocating a new batch and merging the input batches into it. However, if one of the batches is empty, there is no need to allocate a new batch, we can simply extend the bounds of the existing non-empty batch accordingly.

The implementation of this optimization is complicated by the fact that there are Batch implementations for Rc<Batch> and Abomonated<Batch>. Both of these only provide an immutable reference to the underlying batch, making it impossible to adjust the batch description to extend the batch's bounds. This commit solves this by introducing new RcBatch and AbomonatedBatch wrappers that, in addition to the Rc/Abomonated batch, hold their own Descriptions that override the inner batch's description.

Some adjustments to logging of merge events is also necessary, to account for the new merge path.

Fixes https://github.com/MaterializeInc/materialize/issues/21165.

teskje commented 1 year ago

Materialize CI is happy with this change.