framed-data / overseer

Overseer is a library for building and running data pipelines in Clojure.
Eclipse Public License 1.0

0.7.0: Add job completion supervisor #53

Closed: andrewberls closed this 8 years ago

andrewberls commented 8 years ago

Because of our chosen redundancy/resiliency strategy of treating already-started jobs as still eligible for other nodes to pick up, it's possible for very long-running jobs to effectively paralyze the cluster: even if one node successfully completes the long task, the other nodes that also started it will wastefully continue working on it.
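To make the strategy concrete, here is a minimal sketch of an eligibility check that treats started jobs as eligible. The job map shape and the status keywords (:unstarted, :started, :finished) are hypothetical and not Overseer's actual schema; job dependencies are ignored for brevity.

```clojure
(ns eligibility-sketch)

;; Hypothetical statuses; Overseer's real job schema may differ.
(def eligible-statuses #{:unstarted :started})

(defn eligible?
  "A job may be picked up by any node unless it has finished. Counting
  :started jobs as eligible means a job whose node crashed is retried
  elsewhere, but it also lets several healthy nodes run the same long job."
  [{:keys [status]}]
  (contains? eligible-statuses status))

(defn ready-jobs
  "Returns the jobs a node would consider picking up."
  [jobs]
  (filter eligible? jobs))
```

The trade-off is the one described above: a job only stops being eligible once some node marks it finished, so nodes that started it earlier keep running it unless something tells them to stop.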

This adds a new in-process 'supervisor' that continually checks the status of the current job and terminates the entire process if another node has already completed it. A first draft attempted to kill the executor in-process (using future-cancel), but this caused exception handlers to spew noise. Killing the whole JVM process instead relies on an external process supervisor such as Upstart to restart it, so the job completion supervisor can be enabled or disabled in config and is disabled by default.
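A minimal sketch of such a supervisor loop, assuming a hypothetical config key (:job-completion-supervisor), a caller-supplied predicate finished-elsewhere? that reports only completion by *another* node, and an injectable exit-fn so the JVM-killing step can be swapped out in tests. Stopping quietly once the local executor future has finished is an assumption of this sketch; the PR does not say how that case is handled.

```clojure
(ns supervisor-sketch)

;; Hypothetical config shape: the PR says the supervisor is toggled in
;; config and disabled by default, but does not show the real key names.
(def default-config
  {:job-completion-supervisor {:enabled false
                               :poll-ms 1000}})

(defn supervisor-enabled?
  "True only when the supervisor has been explicitly enabled."
  [config]
  (true? (get-in config [:job-completion-supervisor :enabled])))

(defn start-supervisor
  "Runs in its own future alongside the executor. Every poll-ms it checks:
   - if the executor has finished, this node completed the job: stop quietly;
   - if finished-elsewhere? says another node completed the job: call
     exit-fn (in production, a function that calls System/exit and relies
     on an external process supervisor to restart the JVM);
   - otherwise sleep and check again."
  [{:keys [poll-ms]} executor finished-elsewhere? job-id exit-fn]
  (future
    (loop []
      (cond
        (realized? executor) :executor-finished
        (finished-elsewhere? job-id) (do (exit-fn) :exited)
        :else (do (Thread/sleep (long poll-ms))
                  (recur))))))
```

Injecting exit-fn rather than calling System/exit directly keeps the polling logic testable; a real worker would only start this future when supervisor-enabled? returns true.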

The worker code is now restructured to act as a parent for several concurrently running processes (futures): the ready job detector, the job completion supervisor, and the process that actually runs a job (the "executor").
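The parent/children shape can be sketched as a worker that starts each process in its own future and keeps the futures in a map. Everything here is hypothetical (function names, the LinkedBlockingQueue hand-off between detector and executor); it illustrates the structure described above, not Overseer's actual code. A completion supervisor would simply be a third entry in the same map.

```clojure
(ns worker-sketch
  (:import (java.util.concurrent LinkedBlockingQueue)))

(defn ready-job-detector
  "Returns a thunk that repeatedly asks find-ready-jobs for job ids and
  enqueues them, sleeping poll-ms between scans."
  [find-ready-jobs ^LinkedBlockingQueue queue poll-ms]
  (fn []
    (loop []
      (doseq [job-id (find-ready-jobs)]
        (.put queue job-id))
      (Thread/sleep (long poll-ms))
      (recur))))

(defn executor
  "Returns a thunk that blocks on the queue and calls run-job! on each id."
  [^LinkedBlockingQueue queue run-job!]
  (fn []
    (loop []
      (run-job! (.take queue))
      (recur))))

(defn start-worker
  "Starts each named thunk in its own future. The returned map of
  name -> future is the worker's handle on its children."
  [thunks]
  (into {} (map (fn [[k f]] [k (future (f))])) thunks))

(defn stop-worker
  "Cancels every child future; returns the set of names actually cancelled."
  [children]
  (reduce-kv (fn [acc k fut] (if (future-cancel fut) (conj acc k) acc))
             #{}
             children))
```

Keeping the children in one map lets the worker start, inspect, and cancel them uniformly, and adding another concurrent process is just another entry.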

This also extracts all config-related code into overseer.config.

andrewberls commented 8 years ago

Realized that heartbeats are actually a full superset of this functionality; closing this out.