TimelyDataflow / differential-dataflow

An implementation of differential dataflow using timely dataflow on Rust.
MIT License
2.56k stars 183 forks source link

Question: what can differential dataflow be used for? #154

Open zxti opened 5 years ago

zxti commented 5 years ago

This is a fascinating project. I've been studying differential dataflow and specifically the core operators like iterate and arrange. I've also been trying to map out how it juxtaposes with reactive programming (a la MobX).

Can differential dataflow be used to implement, say:

Most of the examples I can find on differential dataflow show how to perform some generic data crunching with very simple input and output structures, but I'm asking more about transforming some rich data structures into other rich data structures.

frankmcsherry commented 5 years ago

Hi. Thanks for the nice words.

I think the tl;dr is that to use differential dataflow you may need to "rethink" your program a bit, so that it is comprised of many smaller and simpler actions. We have built some more complicated things on top of it, but you need to opt in to the data-parallel idioms for things to work out (vs taking a more vanilla ML or Python program and having it "just work").

  1. One of the examples is doop, an implementation of a minimal set of program analysis rules from Yannis Smaragdakis from their Doop work. Rust itself used differential dataflow for a while as part of their prototype new lifetime checker. Generally, I think there are some program analysis tasks that fit great, and you could imagine realtime updates in your IDE as fast as you type (the doop update latencies average 28ms). An entire compiler would be a lot of effort, and while intellectually plausible it would be irresponsible to say "yes, totally".

  2. For React, cc @comnik who is super interested in this area, I think. It sounds like a good fit (React is a lot like "view maintenance" over the DOM, and differential does a fine job doing view maintenance over tree-like data-structures.

zxti commented 5 years ago

That's awesome, examples like doop and Rust seem like perfect subjects to dig into (BTW why didn't Rust stick with it?). I'm exactly interested in just how deeply "rethought" DD programs must be, so this is great!

For (2), do you have similar pointers to examples of tree view maintenance?

frankmcsherry commented 5 years ago

Rust switched to a custom Datalog engine I wrote for them (https://github.com/rust-lang/datafrog) and didn't really need the incremental aspect (they started using it because it was faster than the other solutions they had around). The compile times were much higher than the custom version.

I'm hoping Niko (@comnik) will show up with some thoughts about React stuff. Generally a bunch of the graph processing examples do similar things, where the input are a graph (potentially directed, tree-like) and you write queries that traverse the graph and produce output state at each graph node. We usually use this for "bfs" or "shortest paths" but it seems likely that "transform DOM into web stuffs" (sorry) also fits.

comnik commented 5 years ago

Re React / VDOM reconciliation: Differential shines at constructing and incrementally maintaining views on application state, such as derived from declarative component queries. We use Datalog or GraphQL for describing what state a component is interested in, but as long as it translates into joins, filters, and aggregations, Differential is more than happy to handle it.

Once you have your UI's data needs transformed into a hierarchical dataflow, the outputs should be directly useable for reconciliation (without diffing the full VDOM snapshots). When experimenting with this I wanted to leave the reconciler as it is and instead used state hooks to control reconciliation. This gives you the granularity (w.r.t reconciliation) of component-local state, even though your state might be derived by a dataflow running on a cluster somewhere.

It's still early days, but I've started putting together some demos over at comnik/functional-differential-programming. I'm somewhat embarrassed to point people at them at this stage and they require a WebAssembly build of Differential, but I will be updating them as I go.

The plan is also to build a frontend for https://github.com/comnik/timely-observations that makes use of these ideas and can serve as a non-toy example.

There is a slight impedance mismatch between the snapshot-oriented way of writing components, and the diff-oriented way of describing state updates. For example, for large lists, if you attach every item to its own differential dataflow, then that will make for blazing fast reconciliation, but probably cause all kinds of other problems (e.g. constant teardown and re-creation of dataflows as a user scrolls the list). Therefore I think that a hybrid approach might be best, where the heavy lifting is handled by Differential, and smaller batches are done via reconciliation.

Anyways, that is my current knowledge on the matter!