framed-data / overseer

Overseer is a library for building and running data pipelines in Clojure.
Eclipse Public License 1.0

Instrument framework-level errors #98

Closed: andrewberls closed this 7 years ago

andrewberls commented 7 years ago

We've observed cases where workers appear to stall and stop picking up new work, even though eligible jobs are present (with "eligibility" determined using the same functions the workers use). Our current working hypothesis is that one or more internal Overseer components are hitting errors that bring the system to a halt, and we have no visibility into what those errors are.

This PR instruments the executor and the ready-job-detector with a new exception handler that logs any error locally, reports it to Sentry, and then shuts down the entire process (errors in these stages are unrecoverable framework errors). Note that the heartbeat process already has its own error detection and (configurable) shutdown logic; this PR makes no attempt to unify the two.
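For illustration, here is a minimal sketch of the "log locally, report, then exit" handler described above, plus a wrapper that runs a component loop in a future under that handler. This is not the code from this PR: the namespace, `make-fatal-handler`, `instrumented-future`, and the `log-fn`/`report-fn`/`exit-fn` parameters are all hypothetical. `report-fn` stands in for a Sentry client call; no particular Sentry library API is assumed.

```clojure
;; Hypothetical sketch; names are illustrative, not Overseer's actual API.
(ns overseer-example.fatal-handler)

(defn make-fatal-handler
  "Returns a (fn [component throwable]) that logs the error locally,
  reports it via report-fn (a stand-in for a Sentry client), and then
  always calls exit-fn, even if logging or reporting throws."
  [{:keys [log-fn report-fn exit-fn]
    :or {log-fn    (fn [component ^Throwable t]
                     ;; Local logging: print to stderr
                     (binding [*out* *err*]
                       (println "Fatal error in" component "-" (.getMessage t))))
         report-fn (fn [_component _t] nil) ; hypothetical Sentry hook, no-op by default
         exit-fn   (fn [code] (System/exit code))}}]
  (fn [component ^Throwable t]
    (try
      (log-fn component t)
      (report-fn component t)
      (finally
        ;; Framework errors are treated as unrecoverable: always exit
        (exit-fn 1)))))

(defn instrumented-future
  "Runs f in a future; any Throwable it raises is passed to handler
  along with the component name instead of being held inside the future."
  [handler component f]
  (future
    (try
      (f)
      (catch Throwable t
        (handler component t)))))
```

The motivation for wrapping the future: a plain Clojure `future` holds any exception it throws until something dereferences it, so a crashed background loop that nobody derefs fails silently, which matches the stalled-worker symptom. Under these assumptions, one would wrap both the executor and the ready-job-detector loops with `instrumented-future`, passing a component keyword such as `:executor`. Putting `exit-fn` in a `finally` ensures the process still shuts down if the Sentry report itself fails.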

andrewberls commented 7 years ago

There are interesting considerations regarding making the workers consistent with the instrumented-future programming model. I'll run the experiment for a while, then revisit whether we want to pursue this route or revert the change.