ewels / clusterflow

A pipelining tool to automate and standardise bioinformatics analyses on cluster environments.
https://ewels.github.io/clusterflow/
GNU General Public License v3.0

Specify expected time for modules #45

Closed: ewels closed this 9 years ago

ewels commented 9 years ago

Most HPC systems have an option to specify expected job execution time, which affects queuing. At the moment, Cluster Flow basically ignores this (SLURM jobs default to two days, for example), which hurts our queue performance.

It would be nice if modules were able to specify how long they think they'll take to run. For SLURM jobs, if this is less than 15 minutes we get the bonus of being able to specify --qos=short, which will get them through the queue much faster.
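As a rough illustration of the SLURM side, here is a minimal Python sketch that turns a module's estimated runtime (in whole minutes) into `sbatch` options. The helper names and the idea of passing the estimate around as an integer number of minutes are hypothetical. `--time` is a standard `sbatch` option, but the `short` QoS and its 15-minute cut-off are the site-specific setup described above, not SLURM defaults.

```python
def format_slurm_time(minutes: int) -> str:
    """Format a duration in minutes as a SLURM D-HH:MM:SS time string."""
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)
    return f"{days}-{hours:02d}:{mins:02d}:00"


def slurm_time_args(estimated_minutes: int, short_qos_limit: int = 15) -> list[str]:
    """Build sbatch arguments from a module's estimated runtime.

    Hypothetical helper: the 'short' QoS name and the 15-minute limit
    are site-specific assumptions, not SLURM defaults.
    """
    if estimated_minutes <= 0:
        raise ValueError("estimated runtime must be positive")
    args = [f"--time={format_slurm_time(estimated_minutes)}"]
    # Jobs strictly under the short-QoS limit can use the faster queue.
    if estimated_minutes < short_qos_limit:
        args.append("--qos=short")
    return args


print(slurm_time_args(10))    # ['--time=0-00:10:00', '--qos=short']
print(slurm_time_args(2880))  # ['--time=2-00:00:00']
```

Keeping the QoS threshold as a parameter means a site with a different short-queue limit (or none at all) only needs to change one value.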

ewels commented 9 years ago

@s-andrews: I assume GridEngine has some kind of similar support? Would you be interested in adding this for SGE?

We have some nice resource usage tools which I might try to use to optimise module requests. I'll also try to write some sensible time estimation code for the modules that I use, rather than just relying on flat estimates. Obviously, setting these too low will be very annoying if jobs get cancelled, but the increased queue efficiency should be worth it.
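One possible shape for "sensible time estimation code" is to scale the estimate with the size of a module's input and then pad it, so that an optimistic guess is less likely to get the job killed. The Python sketch below assumes runtime grows roughly linearly with input size; the function name, the `minutes_per_gb` rate, the floor and the 1.5x safety factor are all hypothetical values that would need calibrating against real resource usage data.

```python
import math


def estimate_minutes(input_bytes: int,
                     minutes_per_gb: float,
                     floor_minutes: int = 5,
                     safety_factor: float = 1.5) -> int:
    """Estimate a module's runtime from the size of its input.

    All parameters are hypothetical and assume roughly linear scaling.
    The safety factor pads the estimate so jobs are less likely to be
    cancelled for hitting their time limit; the floor stops tiny inputs
    from producing unrealistically short requests.
    """
    gigabytes = input_bytes / 1024**3
    raw = gigabytes * minutes_per_gb * safety_factor
    return max(floor_minutes, math.ceil(raw))


print(estimate_minutes(4 * 1024**3, minutes_per_gb=10))  # 60
```

The trade-off discussed above shows up directly in `safety_factor`: raising it wastes some queue efficiency, lowering it risks cancelled jobs.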

s-andrews commented 9 years ago

GridEngine has qacct, which records some stats for each job run (mostly CPU, runtime and memory). We've been tweaking the existing limits to match what we see from real jobs, as a lot of the older limits turned out to be a little lower than they needed to be.

I'm sure GridEngine can be set up to take runtime into account when submitting jobs (normally I think people just set up short and long queues with different limits on them), but it's not something we use.

ewels commented 9 years ago

Ok, sounds good - if there are any new limit tweaks you've done recently, it'd be good to see them to give me a head start on fine tuning all of this. I'll implement the time estimation thing as a parameter for the qsub submission string, so it'll be totally optional and there if you want it in the future.
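A minimal Python sketch of what an optional runtime parameter in a GridEngine submission string could look like. `-N` and `-l h_rt=HH:MM:SS` are standard `qsub` options (`h_rt` is the hard wall-clock limit), but the function name, argument layout and example job name are hypothetical, not Cluster Flow's actual code. Whether a given SGE installation uses `h_rt` when scheduling depends on how its queues are configured.

```python
from typing import Optional


def qsub_command(job_name: str, script: str,
                 estimated_minutes: Optional[int] = None) -> str:
    """Build a GridEngine qsub command line (illustrative only).

    The runtime request is optional: with no estimate, no h_rt limit
    is added and the cluster's default applies.
    """
    parts = ["qsub", "-N", job_name]
    if estimated_minutes is not None:
        hours, mins = divmod(estimated_minutes, 60)
        # h_rt is GridEngine's hard wall-clock limit, given as HH:MM:SS.
        parts += ["-l", f"h_rt={hours:02d}:{mins:02d}:00"]
    parts.append(script)
    return " ".join(parts)


print(qsub_command("cf_example", "job.sh"))
# qsub -N cf_example job.sh
print(qsub_command("cf_example", "job.sh", estimated_minutes=150))
# qsub -N cf_example -l h_rt=02:30:00 job.sh
```

Defaulting `estimated_minutes` to `None` keeps the behaviour unchanged for sites that don't want runtime limits, which matches the "totally optional" intent.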

s-andrews commented 9 years ago

OK, will check if we have any unsubmitted tweaks.

I was having problems getting our branch back in line with your mainline version. I couldn't generate a pull request which seemed to bring them back together. If you have a second to see whether you can figure it out, that would help us out (I think you have commit privileges on our branch).

Simon.

ewels commented 9 years ago

From the PR, it looks like the only change you've made relative to the 0.4 devel branch is adding requirements to cf_download. There's a merge conflict though, so I've probably done something similar at my end.

Not sure it's worth faffing with the merge for now if that's the only change.

s-andrews commented 9 years ago

We'd made some changes to pipelines, but maybe those had already been pulled in. I've just done a git status on our working copy and it says there's nothing else to commit.

If the merge is too much of a pain we'll just delete and re-fork when you get the next release out.

Simon.

ewels commented 9 years ago

Sounds good.

ewels commented 9 years ago

Done as part of #47

ewels commented 9 years ago

See #56 for the code changes.