Open frankmcsherry opened 4 months ago
In the common case where the arrangement is not a retain history index, will this consolidation still do something?
Also, will this increase the memory spikes of initial hydrations where the arrangement is not a retain history index?
In the common case where the arrangement is not a retain history index, will this consolidation still do something?
Probably not. For an alternate approach, https://github.com/TimelyDataflow/differential-dataflow/pull/490 attaches the logic to TraceFrontier
, so that folks who are not importing traces will not experience any overhead.
Also, will this increase the memory spikes of initial hydrations where the arrangement is not a retain history index?
I hope not. The buffer size is going to be proportional to the longest history of any one (key, val)
pair, and .. we could dial that down with a ChangeBatch
style structure, which consolidates as it goes (if the history is one that consolidates down, it would be even less consequential). Though, getting ChangeBatch
out of timely and into differential would be a bunch of typing.
The
Arranged::as_collection
method converts an arrangement to a stream of its underlying updates. It does this without any thought for consolidation as it does, under the historically accurate but no longer correct take that the full arrangement detail is required. When one uses aTraceFrontier
wrapper, which advances contained timestamps, many of these updates may now consolidate. We should do the work here, at the first moment, rather than downstream. The incremental cost means additional work for folks who don't need it, but the harm done to folks usingTraceFrontier
can be much larger.