framed-data / overseer

Overseer is a library for building and running data pipelines in Clojure.
Eclipse Public License 1.0

Experimental: Add JDBC Store implementation #88

Closed: andrewberls closed this 7 years ago

andrewberls commented 7 years ago

This adds an implementation of overseer.core/Store for JDBC-capable backends such as MySQL. It uses two system tables: overseer_jobs, which tracks each job's id, type, and status, and overseer_dependencies, which stores the job_id/dep_id edges of the dependency graph. Internally, HoneySQL [0] is used together with clojure.java.jdbc for DB interactions.

A key aspect of the implementation is optimistic concurrency control [1], implemented via a lock_version column, much like Rails [2]. Each job state transition therefore follows a read-then-update pattern: we expect no conflict in most cases (job reservation being the exception) and abort the transaction if the row has changed since it was read. There is a single bespoke integration test for the concurrent update case, and it will be interesting to think about generalized model checking in the future.
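
To make the read-then-update pattern concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It is not the code from this PR (which is Clojure on HoneySQL and clojure.java.jdbc); the column types, the status strings, and the helper names make_store, read_job, and update_status are all assumptions made for illustration. Only the lock_version idea comes from the description above.

```python
import sqlite3

def make_store():
    """Create an in-memory DB with a simplified, assumed overseer_jobs table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE overseer_jobs ("
        " id TEXT PRIMARY KEY,"
        " type TEXT NOT NULL,"
        " status TEXT NOT NULL,"
        " lock_version INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn

def read_job(conn, job_id):
    """Read step: return (status, lock_version), or None if the job is missing."""
    return conn.execute(
        "SELECT status, lock_version FROM overseer_jobs WHERE id = ?",
        (job_id,),
    ).fetchone()

def update_status(conn, job_id, new_status, seen_version):
    """Update step: succeeds only if lock_version still equals what we read."""
    cur = conn.execute(
        "UPDATE overseer_jobs"
        " SET status = ?, lock_version = lock_version + 1"
        " WHERE id = ? AND lock_version = ?",
        (new_status, job_id, seen_version),
    )
    if cur.rowcount == 1:
        conn.commit()
        return True
    # Zero rows matched: someone else changed the row first, so abort.
    conn.rollback()
    return False

conn = make_store()
conn.execute(
    "INSERT INTO overseer_jobs (id, type, status)"
    " VALUES ('job-1', 'ingest', 'unstarted')"
)
conn.commit()

# Two hypothetical workers read the same row before either one writes.
_, version_a = read_job(conn, "job-1")
_, version_b = read_job(conn, "job-1")

a_won = update_status(conn, "job-1", "started", version_a)
b_won = update_status(conn, "job-1", "started", version_b)
print(a_won, b_won, read_job(conn, "job-1"))  # True False ('started', 1)
```

The second writer loses because its WHERE clause no longer matches the incremented lock_version. This is the conflict expected during job reservation, where several workers may try to claim the same job.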

The JDBC dependents function is a slight modification of the old transitive-dependents function, but it now traverses the graph one level at a time rather than issuing a separate query for each job.
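
A level-at-a-time traversal can be sketched as follows, again in Python with sqlite3 rather than the PR's Clojure. The sketch assumes that a row (job_id, dep_id) means "job_id depends on dep_id" and that the graph is acyclic; the names build_demo_graph and transitive_dependents are hypothetical. Each loop iteration issues one query for the whole frontier instead of one query per job.

```python
import sqlite3

def build_demo_graph():
    """Create an assumed overseer_dependencies table with a small diamond graph."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE overseer_dependencies ("
        " job_id TEXT NOT NULL,"
        " dep_id TEXT NOT NULL)"
    )
    # b and c depend on a; d depends on b and c; f depends on e (unrelated).
    edges = [("b", "a"), ("c", "a"), ("d", "b"), ("d", "c"), ("f", "e")]
    conn.executemany(
        "INSERT INTO overseer_dependencies (job_id, dep_id) VALUES (?, ?)",
        edges,
    )
    conn.commit()
    return conn

def transitive_dependents(conn, job_id):
    """Return every job that directly or indirectly depends on job_id."""
    seen = set()
    frontier = {job_id}
    while frontier:
        # One query per level: fetch the dependents of the whole frontier.
        placeholders = ",".join("?" * len(frontier))
        rows = conn.execute(
            "SELECT DISTINCT job_id FROM overseer_dependencies"
            f" WHERE dep_id IN ({placeholders})",
            sorted(frontier),
        ).fetchall()
        # Skip jobs already visited so diamonds are not re-expanded.
        next_level = {row[0] for row in rows} - seen
        seen |= next_level
        frontier = next_level
    return seen

graph = build_demo_graph()
print(sorted(transitive_dependents(graph, "a")))  # ['b', 'c', 'd']
```

For the diamond above this runs three queries (one per level, including the final empty one). A very wide level might need to be split into chunks to stay within the backend's limit on bound parameters.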

While all store protocol tests pass on the new implementation, it will be useful to do thorough workload testing before marking this implementation as production-ready.

0: https://github.com/jkk/honeysql
1: https://en.wikipedia.org/wiki/Optimistic_concurrency_control
2: http://api.rubyonrails.org/classes/ActiveRecord/Locking/Optimistic.html

andrewberls commented 7 years ago

@elliot42 This has been updated with the optimistic concurrency control we discussed, which was much easier than expected. The JDBC code is still a bit verbose, but that's not a huge issue to me for now.

andrewberls commented 7 years ago

At this point, I think the major outstanding items are concurrent-process testing and possibly some kind of Component/stop-from-the-outside capability. I'm going to follow up on those items and move forward with this so we can start getting some production testing in.