ewels / clusterflow

A pipelining tool to automate and standardise bioinformatics analyses on cluster environments.
https://ewels.github.io/clusterflow/
GNU General Public License v3.0

Build UPPMAX jobstats module #57

Open ewels opened 9 years ago

ewels commented 9 years ago

It would be nice to build a module that looks at the performance of other modules: whether they actually use the resources they request, especially time.

UPPMAX has some nice tools for this. I think a module could scrape the submission log text file to get the job IDs and the requested resources, then compare these to what was actually used.

For example:

finishedjobinfo -j 5046208

gives: (split onto multiple lines for readability)

2015-04-13
16:28:09
jobid=5046208
jobstate=COMPLETED
username=phil
account=b12345678
nodes=m163
procs=2
partition=core
qos=normal
jobname=cf_samtools_sort_index_1428934402_samtools_sort_index_452
maxmemory_in_GiB=5.4
maxmemory_node=m163
timelimit=04:00:00
submit_time=2015-04-13T16:13:48
start_time=2015-04-13T16:16:22
end_time=2015-04-13T16:28:09
runtime=00:11:47
margin=03:48:13
queuetime=00:02:34
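A minimal sketch of how a module might turn that output into something comparable, assuming `finishedjobinfo` prints whitespace-separated `key=value` fields as in the example above. The function names are hypothetical and not part of Cluster Flow; the sample line uses only a subset of the fields shown.

```python
def parse_finishedjobinfo(line):
    """Collect the key=value fields from one line of finishedjobinfo output.

    Tokens without '=' (the leading date and time) are skipped.
    """
    return dict(tok.split("=", 1) for tok in line.split() if "=" in tok)


def to_seconds(clock):
    """Convert a Slurm-style [D-]HH:MM:SS duration to seconds."""
    days, _, hms = clock.rpartition("-")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return (int(days) if days else 0) * 86400 + hours * 3600 + minutes * 60 + seconds


def time_used_fraction(job):
    """Fraction of the booked wall time that the job actually ran for."""
    return to_seconds(job["runtime"]) / to_seconds(job["timelimit"])


# Sample line built from the fields shown in the example output
line = (
    "2015-04-13 16:28:09 jobid=5046208 jobstate=COMPLETED procs=2 "
    "maxmemory_in_GiB=5.4 timelimit=04:00:00 runtime=00:11:47 margin=03:48:13"
)
job = parse_finishedjobinfo(line)
print(f"Job {job['jobid']} used {time_used_fraction(job):.1%} of its time limit")
```

For the job above this is 707 seconds out of a 14400 second limit, so the module could flag it as heavily over-booked on time.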

You can also run jobstats:

jobstats -p -v -d 5046208

Gives:

jobid   cluster jobstate    user    project jobname endtime runtime flag_list   booked  core_list   node_list   jobstats_list
5046208 milou   COMPLETED   phil    b2013064    cf_samtools_sort_index_1428934402_samtools_sort_index_452   2015-04-13T16:28:09 00:11:47    .   2   2   m163    /sw/share/slurm/milou/uppmax_jobstats/m163/5046208

It also produces a plot (image not included here): milou-b2013064-phil-5046208
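The tabular `jobstats` output could be read in a similar way. This is only a sketch: it assumes a header row followed by one row per job, with no whitespace inside any field, which holds for the example above but may not hold in general. `parse_jobstats_table` is a hypothetical helper name.

```python
def parse_jobstats_table(text):
    """Parse jobstats tabular output into a list of dicts keyed by column name.

    Assumes the first non-empty line is the header and that no field
    contains whitespace.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


# Sample text copied from the jobstats example output
sample = """\
jobid   cluster jobstate    user    project jobname endtime runtime flag_list   booked  core_list   node_list   jobstats_list
5046208 milou   COMPLETED   phil    b2013064    cf_samtools_sort_index_1428934402_samtools_sort_index_452   2015-04-13T16:28:09 00:11:47    .   2   2   m163    /sw/share/slurm/milou/uppmax_jobstats/m163/5046208
"""

for row in parse_jobstats_table(sample):
    print(row["jobid"], row["jobstate"], row["runtime"], row["booked"], row["node_list"])
```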

Ideally, the module could compute some custom stats and then build a summary HTML report, which could perhaps be e-mailed. It would also be cool if it could log some stats to a central location.
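As a rough idea of what the report step might look like, here is a sketch that renders per-job stats as a plain HTML table. The column selection and the `build_report` name are assumptions for illustration; the input is a list of dicts holding fields such as those in the `finishedjobinfo` output.

```python
from html import escape

# Hypothetical choice of columns; any finishedjobinfo fields could be used
COLUMNS = ["jobname", "timelimit", "runtime", "maxmemory_in_GiB"]


def build_report(jobs):
    """Render a list of job dicts as a minimal standalone HTML table."""
    head = "".join(f"<th>{escape(col)}</th>" for col in COLUMNS)
    body = []
    for job in jobs:
        # Missing fields are shown as empty cells rather than raising an error
        cells = "".join(f"<td>{escape(str(job.get(col, '')))}</td>" for col in COLUMNS)
        body.append(f"<tr>{cells}</tr>")
    return (
        "<html><body><h1>Job resource usage</h1>"
        f"<table><tr>{head}</tr>{''.join(body)}</table>"
        "</body></html>"
    )


jobs = [{
    "jobname": "cf_samtools_sort_index_1428934402_samtools_sort_index_452",
    "timelimit": "04:00:00",
    "runtime": "00:11:47",
    "maxmemory_in_GiB": "5.4",
}]
report = build_report(jobs)
print(report)
```

The resulting string could be written to a file next to the pipeline logs or attached to a notification e-mail.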

Finally, this may need an extra feature in CF: an option to always append or prepend modules to every pipeline. That is probably a good thing to have anyway, and should be a relatively easy config add-on.