ewels / clusterflow

A pipelining tool to automate and standardise bioinformatics analyses on cluster environments.
https://ewels.github.io/clusterflow/
GNU General Public License v3.0

Wall time #120

Open fjames003 opened 5 years ago

fjames003 commented 5 years ago

I currently have @max_time set to 23:50:00 in my clusterflow.config file, and I have changed cf_download.cfmod.pl to request num_files * 60 minutes because my cluster is not on a dial-up connection; I made the same change in sra_fqdump.cfmod.pl. However, every time I run the test command provided, cf --genome GRCh37 sra_bowtie2 ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/sralite/ByExp/litesra/SRX/SRX031/SRX031398/SRR1068378/SRR1068378.sra, the first job it creates asks for two full days of wall time, which forces the job onto a partition called long that has far fewer resources than the main partition. I can't figure out why, after changing the cf_download module, the requested time remained the same as if I had not changed the script. Also, two days is longer than the max time I set in my config, and I don't understand why that is allowed.
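One way to confirm exactly what the scheduler recorded for the submitted job is to ask SLURM directly. The helper below simply greps the TimeLimit field out of scontrol show job output; the sample line is illustrative, not real output from this run:

```shell
# Extract the TimeLimit field from `scontrol show job` output.
# On the cluster, pipe the real command through it:
#   scontrol show job <jobid> | extract_timelimit
extract_timelimit() {
    grep -o 'TimeLimit=[^ ]*' | head -n 1 | cut -d= -f2
}

# Illustrative sample line (not real output from this run):
sample='JobId=12345 JobName=cf_download TimeLimit=2-00:00:00 Partition=long'
printf '%s\n' "$sample" | extract_timelimit   # prints 2-00:00:00
```

`squeue -u "$USER" -o '%.10i %.20j %.12l %.12P'` shows the same information (%l is the time limit, %P the partition) for every queued job.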

ewels commented 5 years ago

Hi @fjames003,

That was a bit of a mouthful! 😀 Let me see if I can break this down a little to confirm that I've understood you:

1) You set @max_time to 23:50:00 in your config. This is the default: https://github.com/ewels/ClusterFlow/blob/c643e04cfa89b7cfb1f6f314f4ab22b92ade7b7b/clusterflow.config.example#L26

2) You modified these lines to make the requested time shorter: https://github.com/ewels/ClusterFlow/blob/c643e04cfa89b7cfb1f6f314f4ab22b92ade7b7b/modules/cf_download.cfmod.pl#L39-L40

3) You also modified these lines to make the requested time shorter: https://github.com/ewels/ClusterFlow/blob/c643e04cfa89b7cfb1f6f314f4ab22b92ade7b7b/modules/sra_fqdump.cfmod.pl#L38-L39

4) Despite this, when running the sra_bowtie2 pipeline, Cluster Flow submits a job with a time limit of two days.

So, there are two issues here - firstly, @max_time isn't capping the job time as expected, and secondly, the requested times are longer than expected.

I'd suggest debugging the first problem as follows:

1) Is your config file definitely being found and parsed? You can try putting a print statement here to check that the line is being picked up: https://github.com/ewels/ClusterFlow/blob/c643e04cfa89b7cfb1f6f314f4ab22b92ade7b7b/source/CF/Constants.pm#L148

2) @max_time should set the internal variable $JOB_TIMELIMIT, which is then used here to cap all job submission requests: https://github.com/ewels/ClusterFlow/blob/c643e04cfa89b7cfb1f6f314f4ab22b92ade7b7b/cf#L1025-L1030 You can try putting print statements in there too, to see what the evaluated variables turn out as and pin down why this request is being let through.
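To make that cap check concrete, here is a standalone sketch (plain shell, not ClusterFlow's actual Perl) of the comparison that block is expected to perform: convert both time strings to minutes, then clamp the request. Running your own values through it shows what the if statement should be seeing:

```shell
# Convert a SLURM-style time string to whole minutes (seconds are dropped).
# Handles "D-HH[:MM[:SS]]", "HH:MM:SS", "MM:SS" and bare minutes.
to_minutes() {
    echo "$1" | awk '{
        n = split($0, a, /[-:]/)
        if ($0 ~ /-/) {               # day form, e.g. 2-00:00:00 or 10-00
            m = a[1] * 1440 + a[2] * 60
            if (n >= 3) m += a[3]
        } else if (n == 3) {          # HH:MM:SS
            m = a[1] * 60 + a[2]
        } else {                      # MM or MM:SS
            m = a[1]
        }
        print m
    }'
}

# Clamp a requested time to the configured maximum.
cap_time() {
    req=$(to_minutes "$1")
    max=$(to_minutes "$2")
    if [ "$req" -gt "$max" ]; then echo "$max"; else echo "$req"; fi
}

to_minutes "2-00:00:00"            # prints 2880  (the two-day request)
to_minutes "23:50:00"              # prints 1430  (the configured cap)
to_minutes "10-00"                 # prints 14400 (the shipped default)
cap_time "2-00:00:00" "23:50:00"   # prints 1430  (what should be submitted)
```

If the two values print as 2880 and 1430 inside cf but the if block still never fires, the comparison is likely happening on the raw strings, or on a value that is set after the cap runs.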

The second problem is pretty strange too. Again I would start by putting in print statements to ensure that the code you changed is definitely being executed. You can also just put static strings in the section that you edited, which is less vulnerable to strange stuff.

I hope this makes sense - let me know how you get on. Sorry that I don't have any simple fixes for you; it's nigh on impossible to debug these kinds of problems remotely, and "it works fine for me"™.

Cheers,

Phil

fjames003 commented 5 years ago

Hi Phil,

Thank you for getting back to me. From what you have written, it appears you understood my post; thank you for breaking it down as I should have done in the first place.

The first question I have now is: are the @max_time values in a different format than SLURM would use? I ask because you correctly stated that I set @max_time to 23:50:00 in my config file, but you then say this is the default, while the line you reference is @max_time 10-00, which in SLURM time would be 10 days, 0 hours. I am not sure how 10-00 is equivalent to 23:50:00.

Moving on to your suggestions: I placed a print in Constants.pm, and it is correctly setting $JOB_TIMELIMIT to 23:50:00. However, placing a print in cf after line 1029 to print the $time variable never occurs, which tells me that the script never sees a $time greater than $JOB_TIMELIMIT, even though the very first job is still being set to 2-00:00:00. So maybe you are right that @max_time is not working the way I would expect. In that case, is there a way to make sure a job never gets a time limit of more than 23 hours and 50 minutes?

Thank you again for getting back to me on this,

Frankie

ewels commented 5 years ago

Hi Frankie,

Wow that was a long delay before my reply, my apologies. I was doing some inbox archaeology and found your mail, so here goes.

> I am not sure how 10-00 is equivalent to 23:50:00.

It's not - sorry, I was just linking to the existing code where the default is 10 days. So if you have set your config file to have @max_time 23:50:00 then yes, that should be under 24 hours for slurm.
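For reference, and assuming @max_time is written in SLURM's own time notation as clusterflow.config.example suggests, replacing the shipped ten-day default with a just-under-24-hour cap is a one-line change (a sketch of the config line, not verified against every scheduler):

```
@max_time	23:50:00
```

SLURM itself accepts minutes, MM:SS, HH:MM:SS, D-HH, D-HH:MM and D-HH:MM:SS for time limits, so 10-00 (ten days) and 23:50:00 are both valid but very different values.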

> however placing a print in cf after line 1029 to print the $time variable never occurs

Ok, so you mean that this if block is not being executed? I don't have any quick fixes for you, sorry - there's nothing for it but to get down and dirty with the code, picking apart each bit of that if statement and working back through the code to see where the logic is failing...

Sorry, not a very satisfactory answer, I know. Basically, as far as I can see, you're doing everything correctly and it should be working. I can't replicate the error, so I can't do the bug hunting for you - you're kind of on your own... 😞

Assuming that you haven't already given up: good luck, let me know how you get on, and shout if there's anything I can help with!

Phil

ewels commented 5 years ago

ps. If you're only using Cluster Flow to download and dump SRA files, I'd recommend giving my newly updated SRA-explorer tool a spin: https://ewels.github.io/sra-explorer/

It now has direct links to FastQ files, courtesy of the ENA, and has copy+paste commands to use Aspera for super speedy downloads.

Phil