framed-data / overseer

Overseer is a library for building and running data pipelines in Clojure.
Eclipse Public License 1.0

Find handleable jobs concurrently #50

Closed (elliot42 closed 9 years ago)

elliot42 commented 9 years ago

Prior to this commit, the Overseer worker would, before running every job, synchronously query the database and compute which jobs were ready to run. According to the logs, computing this result set can take three seconds each time. That caps the number of jobs a worker can run per minute, even if the jobs themselves take 0 seconds: 60 sec / 3 sec DB lookup time = max 20 jobs per minute.
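To make the arithmetic concrete, here is a minimal sketch of the bound. `max-jobs-per-minute` is a hypothetical helper written for this illustration, not part of Overseer, and it assumes one synchronous lookup before every job.

```clojure
;; Hypothetical helper, not part of Overseer.
;; Assumes each job is preceded by one synchronous DB lookup.
(defn max-jobs-per-minute
  "Upper bound on jobs per minute for one worker, given the lookup
  time and the job's own run time, both in seconds."
  [lookup-secs job-secs]
  (/ 60 (+ lookup-secs job-secs)))

(max-jobs-per-minute 3 0) ; => 20, the cap described above
(max-jobs-per-minute 3 1) ; => 15, one-second jobs fare even worse
```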

Because we now have large numbers of subsecond jobs, it's worth trying to keep Overseer's own latency as low as possible, so that worker throughput is bottlenecked primarily by how long the jobs themselves take.

This commit makes it so that the worker doesn't pay the DB lookup time on every job handling loop. The job finder and the job executor run concurrently and asynchronously, with the job finder continually updating a list of available jobs into a shared atom, and the job executor attempting to burn down that list as fast as it can, until a few seconds later when the list is updated by the finder.
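The shape of that loop can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this commit: `start-finder`, `take-job!`, `start-executor`, and the `find-ready-jobs` and `handle-job` callbacks are all hypothetical names, and the shared atom is assumed to hold a plain vector of jobs.

```clojure
;; Illustrative sketch only; every name here is hypothetical.

(defn start-finder
  "Background loop: overwrite the shared `ready` atom with the latest
  result of the (slow) `find-ready-jobs` query until `running?` is false."
  [ready running? find-ready-jobs ^long interval-ms]
  (future
    (while @running?
      (reset! ready (vec (find-ready-jobs)))
      (Thread/sleep interval-ms))))

(defn take-job!
  "Atomically remove and return the first job in `ready`, or nil if the
  list is empty. Retries if the finder swapped the list in the meantime."
  [ready]
  (loop []
    (let [jobs @ready]
      (when (seq jobs)
        (if (compare-and-set! ready jobs (vec (rest jobs)))
          (first jobs)
          (recur))))))

(defn start-executor
  "Background loop: burn down `ready` as fast as possible, sleeping
  briefly only when there is nothing to do."
  [ready running? handle-job ^long idle-ms]
  (future
    (while @running?
      (if-let [job (take-job! ready)]
        (handle-job job)
        (Thread/sleep idle-ms)))))
```

Because the finder replaces the whole list on each pass, a job the executor has already taken can reappear in the next refresh. In this sketch nothing guards against that; the reservation step is what would keep such a job from running twice.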

The worker's executor still needs to reserve each job before running it, so the transactor remains the central arbiter between worker processes and generally keeps them from stepping on each other's toes.
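As a rough picture of why reservation keeps workers apart, the sketch below uses an in-memory atom as a stand-in for the transactor. The `reserve!` function, the `store` map, and the `:unstarted` / `:started` status keywords are assumptions made for this example; the real reservation goes through the database.

```clojure
;; Stand-in for the transactor: an atom mapping job id -> status.
;; All names and status keywords here are hypothetical.
(defn reserve!
  "Try to move `job-id` from :unstarted to :started. Returns true for
  the single caller that wins the transition, false for everyone else."
  [store job-id]
  (loop []
    (let [state @store]
      (if (= :unstarted (get state job-id))
        (if (compare-and-set! store state (assoc state job-id :started))
          true
          (recur)) ; another write landed first; re-read and retry
        false))))

(def store (atom {42 :unstarted}))
(reserve! store 42) ; => true, this caller may run the job
(reserve! store 42) ; => false, already reserved
```

A worker acting on a stale list of available jobs simply loses the reservation and moves on to the next candidate.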

This should move the bottleneck from "how fast can I query the entire job DB?" (at best once every 3 sec.) to "how fast can I reserve and handle jobs?" (at best several per second). This should help us burn through many small tasks (such as intake no-ops) efficiently.

elliot42 commented 9 years ago

Happy to talk about this one. I'm going to try it out and see if it can alleviate some of the job clogging we have going on.

andrewberls commented 9 years ago

This looks good in general. I've been idly meaning to go look at the actual queries, since I think they're, like, super unoptimized. I wonder if an index, clause reordering, or something similar would also make things a lot easier in a simpler way.

elliot42 commented 9 years ago

(This appears to be working properly in production. I'm curious what happens all the way down when the job pool is empty, but it should be fine logically, and I tested it at the REPL, so hopefully we'll find out soon if/when we can get the handling rate faster than the injection rate.)