framed-data / overseer

Overseer is a library for building and running data pipelines in Clojure.
Eclipse Public License 1.0

Improve perf of `jobs-ready` #31

Closed by elliot42 9 years ago

elliot42 commented 9 years ago

Prior to this commit, `status/jobs-ready` took 20s to find the jobs that are ready to run. This commit cuts the same query to 3s while returning the same data.

The general algorithm does the following:

  1. Find all jobs that are not finished/aborted/failed yet, i.e. could be run if their dependencies were all satisfied
  2. Subtract the jobs whose dependencies are not satisfied

However, prior to this commit, step #2 searched the entire database for jobs whose dependencies are not satisfied; it did not constrain the search to the jobs found in step #1. This commit makes step #2 consider only the jobs from step #1 from the start, which greatly cuts down the search space.
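As a rough illustration of the difference, here is a minimal in-memory sketch. It is not the actual query in this commit: the `jobs` map, the `:unstarted` status, and the function names `jobs-ready-slow` and `jobs-ready-fast` are all hypothetical stand-ins, and the real implementation queries the database rather than a Clojure map.

```clojure
(require '[clojure.set :as set])

;; Hypothetical stand-in for the job store: job id -> status and dependency ids.
;; The :unstarted status is an assumption made for this sketch.
(def jobs
  {:a {:status :finished  :deps #{}}
   :b {:status :unstarted :deps #{:a}}
   :c {:status :unstarted :deps #{:b}}
   :d {:status :failed    :deps #{}}
   :e {:status :finished  :deps #{:d}}})

(def terminal-statuses #{:finished :aborted :failed})

(defn unsatisfied?
  "True if any dependency of job-id has not finished."
  [jobs job-id]
  (boolean (some #(not= :finished (get-in jobs [% :status]))
                 (get-in jobs [job-id :deps]))))

(defn candidates
  "Step 1: jobs that are not finished/aborted/failed yet."
  [jobs]
  (set (for [[id {:keys [status]}] jobs
             :when (not (terminal-statuses status))]
         id)))

(defn jobs-ready-slow
  "Before: step 2 checks dependencies for every job in the store."
  [jobs]
  (let [blocked (set (filter #(unsatisfied? jobs %) (keys jobs)))]
    (set/difference (candidates jobs) blocked)))

(defn jobs-ready-fast
  "After: step 2 only checks dependencies for the step 1 candidates."
  [jobs]
  (let [cands   (candidates jobs)
        blocked (set (filter #(unsatisfied? jobs %) cands))]
    (set/difference cands blocked)))
```

Both functions return the same set of ready jobs; the second one just never looks at jobs that step 1 already ruled out, such as `:e` here.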

This reduces the delays we're seeing between jobs. The cost is particularly high when workers wait 20 seconds between 1-second jobs, which this commit should at least partially ameliorate.
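To put rough numbers on that, the following back-of-the-envelope sketch computes the fraction of time a worker spends running jobs. It assumes, as a simplification, that a worker waits for the full query time before every job; `busy-fraction` is a hypothetical helper, not part of Overseer.

```clojure
(defn busy-fraction
  "Fraction of wall-clock time spent running jobs, given the job
  duration and the wait between jobs (both in seconds)."
  [job-secs wait-secs]
  (/ job-secs (+ job-secs wait-secs)))

(busy-fraction 1 20) ;; => 1/21, with the old 20s query
(busy-fraction 1 3)  ;; => 1/4, with the new 3s query
```

Under that assumption, a worker running 1-second jobs goes from being busy about 5% of the time to 25% of the time.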

andrewberls commented 9 years ago

Version bump?

:+1: woo!