TimelyDataflow / differential-dataflow

An implementation of differential dataflow using timely dataflow on Rust.
MIT License

Pass data from batcher to builder by chunk #491

Closed antiguru closed 1 month ago

antiguru commented 1 month ago

Currently, the batcher hands data to the builder one tuple at a time, either moved or by reference. This constrains what a builder can accept: the input has to be a tuple, either owned or a reference to a fully formed one. That works fine for vector-like structures, but not for containers that lay out their data differently.

This change alters the contract between the batcher and the builder so that the batcher provides chunks instead of individual items (it does not require chains). The data within each chunk must be sorted, and subsequent calls must maintain that order across chunks, too. The input containers need to implement BuilderInput, a trait that describes how a container's items can be broken into key, value, time, and diff. Key and value can be references or owned data, as long as they can be pushed into the underlying key and value containers.
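To make the contract concrete, here is a minimal sketch of the idea. It is not the code in this PR: `ChunkInput`, `Columns`, and `ToyBuilder` are hypothetical names, time and diff are fixed to `u64` and `i64`, and "can be pushed into the underlying containers" is approximated by an `AsRef<str>` bound because the toy builder stores `String`s.

```rust
/// Hypothetical stand-in for the `BuilderInput` contract; names and bounds
/// are illustrative assumptions, not the crate's actual API.
trait ChunkInput {
    /// Key and value as the chunk exposes them: owned or borrowed.
    type Key<'a>: AsRef<str> where Self: 'a;
    type Val<'a>: AsRef<str> where Self: 'a;

    /// Number of items in the chunk.
    fn item_count(&self) -> usize;
    /// Break item `index` into (key, value, time, diff).
    fn parts(&self, index: usize) -> (Self::Key<'_>, Self::Val<'_>, u64, i64);
}

/// A vector-like chunk of fully formed tuples, exposed by reference.
impl ChunkInput for Vec<((String, String), u64, i64)> {
    type Key<'a> = &'a str where Self: 'a;
    type Val<'a> = &'a str where Self: 'a;

    fn item_count(&self) -> usize { self.len() }
    fn parts(&self, index: usize) -> (Self::Key<'_>, Self::Val<'_>, u64, i64) {
        let ((key, val), time, diff) = &self[index];
        (key.as_str(), val.as_str(), *time, *diff)
    }
}

/// A chunk that never materializes tuples: each part lives in its own column.
struct Columns {
    keys: Vec<String>,
    vals: Vec<String>,
    times: Vec<u64>,
    diffs: Vec<i64>,
}

impl ChunkInput for Columns {
    type Key<'a> = &'a str where Self: 'a;
    type Val<'a> = &'a str where Self: 'a;

    fn item_count(&self) -> usize { self.keys.len() }
    fn parts(&self, index: usize) -> (Self::Key<'_>, Self::Val<'_>, u64, i64) {
        (self.keys[index].as_str(), self.vals[index].as_str(), self.times[index], self.diffs[index])
    }
}

/// Toy builder: distinct keys in order, plus one (value, time, diff) per item.
#[derive(Default)]
struct ToyBuilder {
    keys: Vec<String>,
    updates: Vec<(String, u64, i64)>,
    last: Option<(String, String)>, // last (key, value) seen, across chunks
}

impl ToyBuilder {
    /// Accept one sorted chunk; panics if order is violated within or across chunks.
    fn push_chunk<C: ChunkInput>(&mut self, chunk: &C) {
        for index in 0..chunk.item_count() {
            let (key, val, time, diff) = chunk.parts(index);
            let (k, v): (&str, &str) = (key.as_ref(), val.as_ref());
            if let Some((last_k, last_v)) = &self.last {
                assert!((last_k.as_str(), last_v.as_str()) <= (k, v), "chunks must arrive sorted");
            }
            // Only start a new key when it differs from the last stored key.
            if self.keys.last().map(String::as_str) != Some(k) {
                self.keys.push(k.to_owned());
            }
            self.updates.push((v.to_owned(), time, diff));
            self.last = Some((k.to_owned(), v.to_owned()));
        }
    }
}
```

In this sketch the same `push_chunk` accepts both a `Vec` of tuples and a `Columns` value, which is the flexibility the chunk-based contract is after.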

The change has some quirks around comparing incoming keys to keys already in the builder. The types can differ, and the best solution I could come up with was to add two explicit comparison functions to BuilderInput, one for keys and one for values. While it is not elegant, it allows us to move forward with this change without adding nightmare-inducing trait bounds all over.
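A rough illustration of why explicit comparison functions help, under assumptions of my own: `InputCompare`, `StringChunk`, and `push_key` are hypothetical names, and the builder is assumed to store keys as raw bytes while the input offers `String`s. The two types share no `PartialEq` impl, so the input type supplies the comparison itself.

```rust
/// Hypothetical trait; names and signatures are illustrative, not the crate's API.
/// `StoredKey` and `StoredVal` are the types the builder already holds.
trait InputCompare<StoredKey, StoredVal> {
    type Key;
    type Val;
    /// Does the incoming key equal a key already held by the builder?
    fn key_eq(this: &Self::Key, stored: &StoredKey) -> bool;
    /// Does the incoming value equal a value already held by the builder?
    fn val_eq(this: &Self::Val, stored: &StoredVal) -> bool;
}

/// Marker for an input whose keys and values are `String`s.
struct StringChunk;

impl InputCompare<Vec<u8>, Vec<u8>> for StringChunk {
    type Key = String;
    type Val = String;

    fn key_eq(this: &String, stored: &Vec<u8>) -> bool {
        this.as_bytes() == stored.as_slice()
    }
    fn val_eq(this: &String, stored: &Vec<u8>) -> bool {
        this.as_bytes() == stored.as_slice()
    }
}

/// Push `key` only if it differs from the last stored key; returns whether it was pushed.
fn push_key<C>(keys: &mut Vec<Vec<u8>>, key: &String) -> bool
where
    C: InputCompare<Vec<u8>, Vec<u8>, Key = String>,
{
    let is_new = keys.last().map_or(true, |last| !C::key_eq(key, last));
    if is_new {
        keys.push(key.as_bytes().to_vec());
    }
    is_new
}
```

The generic code only needs the `InputCompare` bound, rather than a cross-type `PartialEq` or conversion bound on every function that touches keys.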

frankmcsherry commented 1 month ago

This is looking good, modulo the conflicts to resolve!