framed-data / overseer

Overseer is a library for building and running data pipelines in Clojure.
Eclipse Public License 1.0

Experimental: pluggable job/graph store #82

Closed elliot42 closed 7 years ago

elliot42 commented 7 years ago

Experimental.

This PR switches overseer to interacting with jobs and graphs through a pluggable Store protocol. This lets us treat Datomic as just one (particularly awesome) way to store job graph data, but not the only one. In particular, the Store protocol should be safely implementable on top of any transactional store, e.g. SQL (it currently still expects writes to multiple attributes of one job, and/or to multiple jobs at once, to be atomic).
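To make the idea concrete, here is a minimal sketch of what a pluggable store could look like. It is not the protocol from this PR: the method names (`put-jobs!`, `get-job`, `update-jobs!`), the `MemoryStore` record, and the `:job/id` / `:job/status` keys are all hypothetical, chosen only for illustration. The in-memory version leans on a single `swap!` per operation to get the "multiple jobs at once" atomicity described above.

```clojure
;; Hypothetical protocol: method names are illustrative, not Overseer's actual API.
(defprotocol Store
  (put-jobs! [this jobs]
    "Atomically insert a collection of plain job maps.")
  (get-job [this job-id]
    "Return the job map with the given id, or nil.")
  (update-jobs! [this job-ids f]
    "Atomically apply f to every job whose id is in job-ids."))

;; In-memory implementation backed by an atom holding {job-id job-map}.
;; Each operation is one swap!, so all affected jobs change together.
(defrecord MemoryStore [state]
  Store
  (put-jobs! [_ jobs]
    (swap! state into (map (juxt :job/id identity)) jobs)
    nil)
  (get-job [_ job-id]
    (get @state job-id))
  (update-jobs! [_ job-ids f]
    (swap! state
           (fn [jobs-by-id]
             (reduce (fn [acc id] (update acc id f)) jobs-by-id job-ids)))
    nil))

(defn memory-store
  "Create an empty in-memory store."
  []
  (->MemoryStore (atom {})))

;; Usage: callers only ever see plain maps and the protocol functions.
(def store (memory-store))
(put-jobs! store [{:job/id 1 :job/type :ingest :job/status :unstarted}
                  {:job/id 2 :job/type :report :job/status :unstarted}])
(update-jobs! store [1 2] #(assoc % :job/status :aborted))
```

A Datomic- or SQL-backed record would implement the same protocol, wrapping each method body in that store's own transaction mechanism.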

This by necessity requires leaking fewer Datomic-specific implementation details to the caller, so:

  1. Jobs are treated as plain maps, not Datomic-specific maps. The specific Store implementation takes plain maps and translates them into its own representation (with Datomic this is trivial; with other stores it'll require more translation work).

  2. Graphs are no longer in a Datomic-specific map format, but use the general-purpose Loom format to represent graphs between job maps. It turns out the graph adjacency list format we were using before is essentially the same as the format that Loom uses, so we can bring in Loom and get a bunch of functionality for free rather than reinventing our own graph stuff.
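As a rough illustration of the two points above, the sketch below represents a small job graph as an adjacency map whose nodes are plain job maps. The job keys, the edge direction (each job points at the jobs that depend on it), and the helper names `dependents`, `dependencies`, and `roots` are assumptions made for this example, not taken from the PR. The helpers are written by hand only to keep the snippet dependency-free.

```clojure
;; Plain job maps; the keys are illustrative.
(def ingest {:job/id 1 :job/type :ingest})
(def clean  {:job/id 2 :job/type :clean})
(def report {:job/id 3 :job/type :report})

;; Adjacency map: each job maps to the jobs that depend on it
;; (assumed edge direction for this sketch).
(def job-graph
  {ingest [clean]
   clean  [report]
   report []})

(defn dependents
  "Jobs that `job` points at."
  [graph job]
  (get graph job []))

(defn dependencies
  "Jobs that have an edge pointing at `job`."
  [graph job]
  (for [[parent children] graph
        :when (some #{job} children)]
    parent))

(defn roots
  "Jobs that no other job points at, i.e. jobs with no dependencies."
  [graph]
  (let [targets (set (mapcat val graph))]
    (remove targets (keys graph))))

;; Usage
(dependents job-graph ingest)
(dependencies job-graph report)
(roots job-graph)
```

Loom's `loom.graph/digraph` constructor accepts adjacency maps of this general shape, so with Loom on the classpath the hand-written helpers would presumably give way to its own functions such as `loom.graph/successors`, `loom.graph/predecessors`, and `loom.alg/topsort`.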

General cleanup and refactoring are also included. Again, most of the work is making sure that only the Datomic-specific store knows about Datomic, and everything else goes through the abstract protocol. There were also some interesting findings along the way, such as that we were way under-utilizing Datomic's built-in capabilities for dealing with graphs anyway.

This paves the way for a cleaner codebase, and a direct-on-MySQL implementation.

andrewberls commented 7 years ago

I should probably stop myself before I dive too deep into nits just yet. In general this looks extremely promising and moves in a direction I very much agree with. It's a little hard for me to follow the new graph queries at a glance; I'm going to study up on Loom and (re-)study the intricacies of Datomic before revisiting!

andrewberls commented 7 years ago

Should be safe to close this one out given https://github.com/framed-data/overseer/pull/88