TimelyDataflow / differential-dataflow

An implementation of differential dataflow using timely dataflow in Rust.
MIT License

Implement `OrdValBatch` without `retain_from` #419

Closed frankmcsherry closed 9 months ago

frankmcsherry commented 9 months ago

This PR provides a re-implementation of OrdValBatch with a few properties:

  1. it is more direct than the existing implementation, which is built from trie layers,
  2. it only writes advanced and consolidated updates, rather than rewriting them later,
  3. it is amenable to container implementations for Vec<(Time, Diff)>,
  4. it is amenable to compact Vec<(Time, Diff)> representations (RLE).

There might be other things to like about it. In the fullness of time it would mean we could remove the trace/layers module, because the direct implementations aren't much more complicated than the trie-layer versions, and in some ways are simpler on account of their directness.

This PR passes tests, but it probably wants a fair bit of exercise to see if it exactly tracks the existing implementations.

cc: @antiguru

frankmcsherry commented 9 months ago

The most recent commit reorganizes the two spine implementations, and re-exports ord.rs's spines (the old ones) as ValSpine and KeySpine from their shared module root. All direct uses of e.g. OrdValSpine are replaced with ValSpine. The notable exception is columnation.rs which is meant to exercise columnation and non-columnation spines, so I left it pointing at the specific ones.

frankmcsherry commented 9 months ago

Some light performance investigation suggests that this is not universally worse than the ord.rs spine. The bfs example with n=10^6, m=2×10^6, run in batches of size 1000 for 1000 rounds, completes in

ord    42.236413s
neu    39.531365375s

Running with n=10^8 and m=2×10^8, both take roughly the same time to load the data:

ord    38.825862291s
neu    37.227272667s

Anecdotally, watching Activity Monitor, ord went up to about 7GB while neu stayed around 3GB. That was very unscientific on my part, though.