dchackett / taxi

Lightweight portable workflow management system for MCMC applications

Adaptive req_time for safety #12

Open dchackett opened 7 years ago

dchackett commented 7 years ago

Currently, taxis are fed some "req_time" for each task and trust it completely. If the user drastically underestimates how long it takes to run a task, this can result in a taxi running over its time limit. This leaves a job abandoned and a missing-in-action taxi, both of which are annoying to fix.

Taxis could measure how long a given task takes, and if that time is greater than req_time, adjust req_time for that task upwards.
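A minimal sketch of what that measurement might look like, not taken from the taxi codebase. It assumes a task is a plain dict with a `req_time` key in seconds; `run_and_adjust`, the `run` callable, and the injectable `clock` are all hypothetical names used only for illustration.

```python
import time


def run_and_adjust(task, run, clock=time.monotonic):
    """Run a task, time it, and raise its req_time if it ran long.

    Assumptions (hypothetical, not taxi's real API):
      - `task` is a dict with a "req_time" entry in seconds
      - `run(task)` executes the task and blocks until it finishes
      - `clock` returns a monotonically increasing time in seconds
    """
    start = clock()
    run(task)
    elapsed = clock() - start

    # Only ever adjust upwards; a fast run never lowers the estimate.
    if elapsed > task["req_time"]:
        task["req_time"] = elapsed
    return task["req_time"]
```

In practice one would probably pad the measured time by some safety factor rather than use it directly, but that choice is left open here.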

Issue: we need some natural way to group tasks together into "req_time classes," where all tasks in a class are expected to have the same req_time.

Issue: req_time may be expected to decrease over the course of a stream, so increasing it for the entire stream could lead to inefficiency. Possible solution: taxis don't alter req_time in the dispatch, but keep their own local memory of req_time for each class; after respawning, a taxi forgets any upward adjustment. Possibly a better solution: put early and late jobs from the same stream in different req_time classes, and adjust the req_time of each class in the DB.
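The first option (a taxi-local memory of req_time per class) might look roughly like the sketch below. The class name `LocalReqTimes`, its methods, and the idea of labelling each class with a string are assumptions for illustration only; nothing here reflects how taxi actually stores tasks or dispatch entries.

```python
class LocalReqTimes:
    """In-memory req_time overrides held by a single taxi.

    Hypothetical sketch: the dispatch is never modified, and the state
    lives only as long as this object does, so a respawned taxi starts
    again from the req_time values in the dispatch.
    """

    def __init__(self):
        # req_time class label -> longest runtime observed so far (seconds)
        self._observed = {}

    def observe(self, req_class, elapsed):
        """Record a measured runtime for a task in the given class."""
        prev = self._observed.get(req_class, 0.0)
        self._observed[req_class] = max(prev, elapsed)

    def estimate(self, req_class, dispatch_req_time):
        """Return the larger of the dispatch value and what was observed."""
        return max(dispatch_req_time, self._observed.get(req_class, 0.0))

    def fits(self, req_class, dispatch_req_time, time_remaining):
        """Decide whether a task of this class fits in the time left."""
        return self.estimate(req_class, dispatch_req_time) <= time_remaining
```

For example, after `observe("hmc", 900)` a taxi with 800 seconds left would decline an `"hmc"` task whose dispatch req_time is 600, while a freshly spawned taxi with an empty `LocalReqTimes` would still accept it.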