TimelyDataflow / differential-dataflow

An implementation of differential dataflow using timely dataflow on Rust.
MIT License
2.53k stars 182 forks source link

DNM: Revisiting ownership of key and value types #429

Closed frankmcsherry closed 9 months ago

frankmcsherry commented 9 months ago

This PR works to decouple any relationship between the input data types D in collections, and the presented key and value types K and V surfaced by arrangements. There are otherwise a tight coupling of the two, often that D = (K, V), and certainly both K and V must be sized and ownable and all that. However, we would like to break that, to allow e.g. data of the form (Vec<u8>, String) and key and values respectively &[u8] and &str, which are less directly related.

We break these links whenever possible, ensuring that the Arrange trait expresses opinions only about the data types in input collections, and not at all with respect to the output types of formed traces (except that the timestamps must be comparable). The Join and Reduce trait implementations are now backed by methods join_trace and reduce_trace that are each driven by trace types T1 and T2, whose requirements are not to input types, but just that some may need to implement ToOwned in order to be captured and maintained for some duration. Elsewhere, we are forced to decorate previously naive types, traits, and methods with caveats like K: Sized or K: ToOwned<Owned = K> or similar implicit assumptions that are now made explicit.

frankmcsherry commented 9 months ago

The most recent commit adds support for Val being unsized, and only implementing ToOwned.

But also exciting, @antiguru, are the PreferredContainer trait and the Preferred<K,V,T,R> struct which implements Update and Layout and uses the containers preferred by the types (e.g. Vec<String> for String, and SliceContainer<u8> for [u8]). You can see an example of it in examples/spines.rs

frankmcsherry commented 9 months ago

The most recent commit relaxes Arrange to speak of input types, rather than potentially unsized arrangement types. This .. makes sense because the Arrange trait itself has no opinions about the trace types to be used, which show up only in the fn arrange<Tr> methods themselves. At that point, constraints on the traces are introduced, but only in the form of what inputs are required. All constraints on output types, e.g. [u8] or potentially GATs, are absent from this trait, its methods, and its implementations (except that the timestamps need to line up).

frankmcsherry commented 9 months ago

The most recent commit looks large, but it's almost all whitespace due to changing a tab indent level. They introduce functions join_trace and reduce_trace which are not part of a trait, and are generic in trace rather than key, val, etc. This is to present a minimally restrictive form of the functionality, partly to ensure we are able to write this only as a function of traces, rather than .. weird related types that might need owned and borrowed variants, etc. Nope! Or, mostly nope!