dchackett / taxi

Lightweight portable workflow management system for MCMC applications
MIT License

Resource allocation for tasks? (Long-term) #19

Open etneil opened 7 years ago

etneil commented 7 years ago

We already keep track of run time for tasks; it would be trivial to extend this to resources used (i.e., node-hours). It would be interesting to have the option of adjusting task priorities through an internal accounting system, based on total usage for a particular stream versus some target amount (either a percentage of the whole allocation, or a fixed number of node-hours).
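As a rough sketch of what that accounting could look like (everything here is hypothetical — `adjust_priorities` and the stream dict fields are illustrative stand-ins, not taxi's actual API):

```python
def adjust_priorities(streams, total_allocation):
    """Deprioritize streams that have used more than their target share.

    streams: list of dicts with keys 'base_priority', 'node_hours_used',
             and 'target_fraction' (fraction of total_allocation).
    Writes an adjusted 'priority' into each dict.
    """
    for s in streams:
        target = s['target_fraction'] * total_allocation
        # Usage ratio > 1 means the stream is over budget.
        usage_ratio = s['node_hours_used'] / target if target > 0 else float('inf')
        # Simple policy: scale priority down as usage approaches the target.
        s['priority'] = s['base_priority'] * max(0.0, 1.0 - usage_ratio)
    return streams
```

A stream that has burned most of its share ends up with a lower priority than one that is under budget; any monotone policy would do here, this linear scaling is just the simplest choice.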

The use case I'm envisioning is one where multiple competing projects run tasks on the same machine against the same allocation; this is more or less how leadership-class computing time is currently doled out in USQCD. It would definitely be useful to have a common central system controlling who gets what fraction of the total time; right now this is done on the honor system, and it's easy to overrun one's fraction unintentionally.

(We may have competition in this space, and it would also require solving the problem of having multiple users interacting with a single dispatch, so that's more of a long-term possible goal. But keeping track of resources used would be useful on its own.)

etneil commented 7 years ago

Here is some of our competition (written in Bash, with a follow-up in C++):

https://arxiv.org/abs/1702.06122

dchackett commented 7 years ago

taxi should generalize relatively straightforwardly to task bundling. We just need each "queue job" to run multiple taxis. The simplest possible case of this would be having taxi.sh launch N taxis, each of which knows about (total_cpus)/N cpus. We probably want to be smarter about this, however.
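The trivial case could look something like this (a sketch only — `run_taxi.py` and its `--name`/`--cpus` flags are hypothetical stand-ins for however taxi is actually invoked):

```python
import subprocess

def bundled_taxi_commands(n_taxis, total_cpus):
    """Build one command line per taxi, splitting the CPUs evenly."""
    cpus_each = total_cpus // n_taxis  # each taxi knows about its share only
    return [['python', 'run_taxi.py',
             '--name', 'taxi%d' % i,
             '--cpus', str(cpus_each)]
            for i in range(n_taxis)]

def launch_bundled(n_taxis, total_cpus):
    # The queue job launches all N taxis and ends when the last one exits.
    procs = [subprocess.Popen(cmd)
             for cmd in bundled_taxi_commands(n_taxis, total_cpus)]
    for p in procs:
        p.wait()
```

Note that this says nothing about *which* CPUs each taxi gets, which is exactly the node-placement question below.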

How can we run multiple taxis so that each uses cores from a single node, rather than cores scattered across all of the available nodes?

A smarter approach than launching N shell-level taxis would be some sort of "meta-taxi" written in Python, which runs taxis via something like the 'multiprocessing' package instead of launching them from the shell as completely separate processes. A meta-taxi could then easily maintain the correct number of running taxis, as well as mediate interaction with the dispatch DB (to reduce synchronization issues).
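A minimal sketch of that active meta-taxi idea, assuming each taxi's work can be wrapped in a callable (`run_one_taxi` here is a hypothetical stand-in; a real taxi would talk to the dispatch DB instead of just echoing its task):

```python
import multiprocessing

def run_one_taxi(task):
    # Stand-in for one taxi working through a task; a real taxi would
    # pull work from and report back to the dispatch DB here.
    return ('done', task)

def meta_taxi(tasks, n_taxis):
    # The Pool keeps exactly n_taxis workers alive and hands out tasks,
    # so the parent process can be the single point of contact with the
    # dispatch (fewer synchronization issues than N independent taxis).
    pool = multiprocessing.Pool(processes=n_taxis)
    try:
        return pool.map(run_one_taxi, tasks)
    finally:
        pool.close()
        pool.join()
```

`multiprocessing.Pool` already does the "maintain the correct number of running taxis" bookkeeping; the trade-off is that it makes the meta-taxi an active, long-lived controller.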

Alternatively, in keeping with the "all centralized control is passive" paradigm, we could have the meta-taxi act instead as a sort of local dispatcher: taxis can launch new taxis on their own node, or launch new (meta-)taxis to be run as separate queue jobs. I think I like this better than an active meta-taxi.

I bet this could be easily adapted to work with inhomogeneous grids.

It might be nice to make this something that can be "plugged in", so that the trivial (unbundled) case remains available if you don't want to do bundling.