TimelyDataflow / differential-dataflow

An implementation of differential dataflow using timely dataflow on Rust.
MIT License
2.58k stars 184 forks

Present lower-level operator interfaces #326

Open frankmcsherry opened 3 years ago

frankmcsherry commented 3 years ago

Several operators are written with interfaces that are meant to be "foolproof", in that you should not be able to misuse them. Specifically, the join class of methods allows you to provide closures that act on data but are not given the time or diff, because that responsibility is easy to misuse. At the same time, there are useful operators that can benefit from that information: specifically, there are linear operators that manipulate time and diff (e.g. temporal filters, negate, etc.) that you might want to fuse onto the join output.

We could present "unsafe" versions of several operators that expose these details, for power users who are confident that they can write not-incorrect logic, or at least who are willing to absorb that responsibility. The upside for these users is the ability to avoid producing outputs that a later filter would discard (e.g. temporal filters), and to fuse in actions that move information from data to diff (e.g. explode-style operators).
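To make the upside concrete, here is a minimal sketch on a toy in-memory model, where a collection is just a list of `(data, time, diff)` updates. This is not the differential-dataflow API: `join_with_time_diff`, `revenue_before`, and the `cutoff` parameter are all hypothetical names, and the cutoff check is only a simplified stand-in for a real temporal filter. It shows a join whose closure sees time and diff, so it can drop filtered outputs before they are produced and fold a value from data into diff in the same step.

```rust
// Toy model only: NOT the differential-dataflow API. All names are hypothetical.
type Time = u64;
type Diff = i64;

/// Joins two keyed update lists. `logic` sees each matched record together
/// with the joined time and diff, and returns zero or more output updates.
fn join_with_time_diff<K, V1, V2, D, I, L>(
    left: &[((K, V1), Time, Diff)],
    right: &[((K, V2), Time, Diff)],
    mut logic: L,
) -> Vec<(D, Time, Diff)>
where
    K: Eq,
    I: IntoIterator<Item = (D, Time, Diff)>,
    L: FnMut(&K, &V1, &V2, Time, Diff) -> I,
{
    let mut output = Vec::new();
    for ((k1, v1), t1, r1) in left {
        for ((k2, v2), t2, r2) in right {
            if k1 == k2 {
                // Times combine by least upper bound (max for totally ordered
                // times); diffs multiply.
                let time = (*t1).max(*t2);
                let diff = r1 * r2;
                output.extend(logic(k1, v1, v2, time, diff));
            }
        }
    }
    output
}

/// Joins orders with prices, keeps only matches whose joined time is before
/// `cutoff` (the fused filter), and moves `quantity * price` from data into
/// diff (the fused explode). Filtered matches never reach the output.
fn revenue_before(cutoff: Time) -> Vec<(u32, Time, Diff)> {
    // (item, quantity) and (item, price), each with a time and a diff.
    let orders: Vec<((u32, i64), Time, Diff)> = vec![((1, 3), 0, 1), ((2, 5), 4, 1)];
    let prices: Vec<((u32, i64), Time, Diff)> = vec![((1, 10), 1, 1), ((2, 7), 2, 1)];
    join_with_time_diff(&orders, &prices, |item, qty, price, time, diff| {
        if time < cutoff {
            Some((*item, time, diff * qty * price))
        } else {
            None
        }
    })
}
```

With a "safe" closure that only sees data, the same result would need a join that materializes every match, followed by separate filter and explode passes over that output.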

Clear candidates seem to be the join variants, as well as dogsdogsdogs's lookup_map and half_join operators. It wouldn't be unreasonable to do reduce as well, though I don't have an immediate use case for it that would be easy to validate ("also fusing operators").

One straw man example would be an appropriately named join_idk_internal_unsafe that exposes the most generality we can manage, without concern for shepherding the user toward correct behavior. We could then retarget the existing join_ variants onto it, mapping their safer closures into the framework of the unsafe operator.
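The layering could be sketched roughly as follows, again on a toy in-memory model and not against the real crate. The name `join_idk_internal_unsafe` is borrowed from the straw man, but its signature here is an assumption, as are the `Update` alias and the simplified `join_map` wrapper. The point is only that the safe variant becomes a thin adapter: it wraps the data-only closure and passes time and diff through untouched.

```rust
// Toy in-memory model, NOT the differential-dataflow API: every signature
// here is an assumption made for illustration.
type Time = u64;
type Diff = i64;
type Update<D> = (D, Time, Diff);

/// The general form: `logic` sees time and diff and fully controls the output.
fn join_idk_internal_unsafe<K, V1, V2, D, I, L>(
    left: &[Update<(K, V1)>],
    right: &[Update<(K, V2)>],
    mut logic: L,
) -> Vec<Update<D>>
where
    K: Eq,
    I: IntoIterator<Item = Update<D>>,
    L: FnMut(&K, &V1, &V2, Time, Diff) -> I,
{
    let mut output = Vec::new();
    for ((k1, v1), t1, r1) in left {
        for ((k2, v2), t2, r2) in right {
            if k1 == k2 {
                // Joined time is the max of the two; joined diff is the product.
                output.extend(logic(k1, v1, v2, (*t1).max(*t2), r1 * r2));
            }
        }
    }
    output
}

/// The safe form: the closure only sees data. The wrapper forwards time and
/// diff unchanged, so the caller cannot get them wrong.
fn join_map<K, V1, V2, D, L>(
    left: &[Update<(K, V1)>],
    right: &[Update<(K, V2)>],
    mut logic: L,
) -> Vec<Update<D>>
where
    K: Eq,
    L: FnMut(&K, &V1, &V2) -> D,
{
    join_idk_internal_unsafe(left, right, |k, v1, v2, time, diff| {
        Some((logic(k, v1, v2), time, diff))
    })
}

/// Small usage example of the safe wrapper.
fn demo() -> Vec<Update<(u32, String)>> {
    let names: Vec<Update<(u32, &str)>> = vec![((1, "ada"), 0, 1), ((2, "bob"), 3, -1)];
    let langs: Vec<Update<(u32, &str)>> = vec![((1, "rust"), 2, 2)];
    join_map(&names, &langs, |id, name, lang| (*id, format!("{name}:{lang}")))
}
```

In this arrangement the existing variants keep their current contracts, and only callers who opt into the general form take on responsibility for producing correct times and diffs.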