dchackett / taxi

Lightweight portable workflow management system for MCMC applications
MIT License

Conditional dependencies and anti-dependencies #18

Open dchackett opened 7 years ago

dchackett commented 7 years ago

When running many ensembles that branch out through parameter space, one issue comes up frequently: some ensemble fails to run and is just not worth running (e.g., it sits right on top of a transition at very small quark mass). However, the streams at other points in parameter space are supposed to fork off of this failed stream. They now cannot run without tedious editing of task dependencies and command-line arguments (to change the gauge file loaded at the start of the stream). Frequently, the solution is simply to make a second dispatch DB after the fact to run these unreachable streams.

A solution to this (and many other problems) is to add conditional dependencies. For instance, "to run job 4, at least one of job 1, job 2, or job 3 must be complete".

In the forking-streams example, imagine job 7 is the second task in a new stream. In order for job 7 to run, any of jobs 4, 5, or 6 must have run. Jobs 4, 5, and 6 are different versions of the first task in the new ("forked") stream, preceding job 7. Each of 4/5/6 loads a gauge file produced by job 1 or 2 or 3, respectively, which are from other ("forked-off-of") streams.

Issue: In the example, imagine job 1 finishes, so job 4 starts, and then after job 4 is done the new ("forked") stream continues happily from job 7 onwards. If job 2 finishes later, this allows job 5 to run, which might clobber the gauge configuration originally written by job 4 (or at least leave a mess in the dispatch DB). So, in order to prevent gauge files from being overwritten and/or work being replicated, we also need "anti-dependencies". For instance: job 4 would anti-depend on jobs 5 and 6; if either of these is not pending, then job 4 is blocked. Similarly for jobs 5 and 6.

Anti-dependencies are easily implemented as a second list for each task in the dispatch DB.
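A minimal sketch of what that second list could look like, using a plain dict as a stand-in for the dispatch DB. The field names (`depends_on`, `anti_depends_on`), the status strings, and both helper functions are hypothetical and not taken from taxi's actual schema:

```python
# Hypothetical in-memory stand-in for the dispatch DB: task id -> record.
# Field names and status strings are assumptions made for illustration.
tasks = {
    1: {"status": "complete", "depends_on": [], "anti_depends_on": []},
    2: {"status": "complete", "depends_on": [], "anti_depends_on": []},
    3: {"status": "failed", "depends_on": [], "anti_depends_on": []},
    4: {"status": "complete", "depends_on": [1], "anti_depends_on": [5, 6]},
    5: {"status": "pending", "depends_on": [2], "anti_depends_on": [4, 6]},
    6: {"status": "pending", "depends_on": [3], "anti_depends_on": [4, 5]},
}


def is_blocked_by_anti_deps(task_id, tasks):
    """Return True if any anti-dependency is no longer pending."""
    return any(
        tasks[other]["status"] != "pending"
        for other in tasks[task_id]["anti_depends_on"]
    )


def is_ready(task_id, tasks):
    """A pending task is ready if all dependencies are complete
    and no anti-dependency has left the 'pending' state."""
    task = tasks[task_id]
    if task["status"] != "pending":
        return False
    deps_met = all(tasks[d]["status"] == "complete" for d in task["depends_on"])
    return deps_met and not is_blocked_by_anti_deps(task_id, tasks)


# Job 2 has finished, so job 5's ordinary dependency is met, but job 4
# already ran, so the anti-dependency keeps job 5 from clobbering its output.
print(is_ready(5, tasks))  # False
```

Here job 5 stays blocked even though job 2 is complete, which is the behavior the forking-streams example asks for.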

I am not sure what the most graceful way to implement conditional dependencies is.

Idea: Nested lists alternate between "and" and "or", like:

[4,5,6] -> 4 && 5 && 6

[[4,5,6]] -> 4 || 5 || 6

[[4,5],6] -> (4 || 5) && 6

[[[4,5],[5,6],[4,6]]] -> (4 && 5) || (5 && 6) || (4 && 6)
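One way the alternating and/or nesting could be evaluated is a short recursive function. This is only a sketch: the name `evaluate` and the `is_complete` callback are hypothetical, and job IDs are assumed to be plain integers:

```python
def evaluate(spec, is_complete, depth=0):
    """Evaluate a nested dependency spec.

    Lists at even depth combine their items with "and"; lists at odd
    depth combine them with "or". An integer leaf is a job ID, which is
    satisfied when is_complete(job_id) returns True.
    """
    if isinstance(spec, int):
        return is_complete(spec)
    combine = all if depth % 2 == 0 else any
    return combine(evaluate(item, is_complete, depth + 1) for item in spec)


# Assume only jobs 4 and 6 have completed.
done = {4, 6}
is_complete = lambda job_id: job_id in done

print(evaluate([4, 5, 6], is_complete))                      # False
print(evaluate([[4, 5, 6]], is_complete))                    # True
print(evaluate([[4, 5], 6], is_complete))                    # True
print(evaluate([[[4, 5], [5, 6], [4, 6]]], is_complete))     # True
```

With this convention an empty top-level list evaluates to true (no dependencies), while an empty "or" list evaluates to false.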

This could also be easily implemented for anti-dependencies, but they should probably work like 'or-and-or-and-...' instead of 'and-or-and-or-...'.

dchackett commented 6 years ago

For arbitrary logic, we also need "not". Restrict job IDs to positive integers; then -1 means "job 1 is neither complete nor running".

This allows jobs to block each other, which is useful, e.g., when different versions of the same stream try to run simultaneously on different clusters.
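A rough sketch of the signed-ID convention, with two jobs that block each other. The status strings, the `deps` layout, and the function names are all assumptions for illustration, and the dependency list is kept flat ("and" only) for brevity:

```python
def leaf_satisfied(signed_id, status):
    """Positive ID: that job must be complete.
    Negative ID: that job must be neither complete nor running."""
    if signed_id == 0:
        raise ValueError("job IDs must be nonzero so the sign is meaningful")
    state = status[abs(signed_id)]
    if signed_id > 0:
        return state == "complete"
    return state not in ("complete", "running")


def can_run(job_id, deps, status):
    """A pending job can run when every (signed) dependency is satisfied."""
    return status[job_id] == "pending" and all(
        leaf_satisfied(d, status) for d in deps[job_id]
    )


# Jobs 1 and 2 are two versions of the same task; each is blocked by the other.
deps = {1: [-2], 2: [-1]}
status = {1: "pending", 2: "pending"}

print(can_run(1, deps, status), can_run(2, deps, status))  # True True
status[1] = "running"
print(can_run(2, deps, status))  # False
```

Note that both jobs are runnable while both are still pending, so whichever scheduler picks one up would need to mark it as running before the other is considered.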