ewels / clusterflow

A pipelining tool to automate and standardise bioinformatics analyses on cluster environments.
https://ewels.github.io/clusterflow/
GNU General Public License v3.0

Add submitted job IDs to a log file #31

Closed FelixKrueger closed 9 years ago

FelixKrueger commented 9 years ago

Hi Phil, we recently had issues with CF runs failing for various reasons, probably mostly memory-related. Once the jobs were killed there was no way of finding out what the job ID was, which made it almost impossible to find the folder (difficult to track when hundreds of single-cell RNA-Seqs are running at the same time...) or to get information about what happened to the job using qacct -j <ID>. Something like this would be helpful:

Your job 259132 ("cf_bismark_1423231738_bismark_align_295") has been submitted
Your job 259133 ("cf_bismark_1423231738_bismark_deduplicate_005") has been submitted

Mmmkay?
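For reference, lines in the format quoted above are easy to turn back into job IDs and names once they have been saved somewhere. The following is a minimal Python sketch for illustration only, not Cluster Flow code; the function name `parse_submitted_jobs` is made up, and the pattern assumes the message format is exactly as shown in this comment.

```python
import re

# Matches the confirmation line format quoted in this thread, e.g.
#   Your job 259132 ("cf_bismark_..._align_295") has been submitted
# Assumption: the scheduler prints exactly this wording.
JOB_LINE = re.compile(r'Your job (\d+) \("([^"]+)"\) has been submitted')


def parse_submitted_jobs(text):
    """Return a list of (job_id, job_name) pairs found anywhere in text."""
    return [(int(job_id), name) for job_id, name in JOB_LINE.findall(text)]


example = (
    'Your job 259132 ("cf_bismark_1423231738_bismark_align_295") has been submitted\n'
    'Your job 259133 ("cf_bismark_1423231738_bismark_deduplicate_005") has been submitted\n'
)

for job_id, name in parse_submitted_jobs(example):
    # Each ID could then be passed to: qacct -j <ID>
    print(job_id, name)
```

With a saved log, the same function could be used to look up the ID belonging to a particular job name before calling qacct -j <ID>.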

ewels commented 9 years ago

Yes, but typically the job-submitting is done by the main cf executable in interactive mode, so this information would just be spat out to STDERR and not stored in a log file anywhere.

Secondly, getting the submitted job ID values (not just the job names) is a bit messy, as the way to obtain them is different for every queue system. SLURM actually already prints this info to STDOUT when the jobs are submitted, so no change is needed there.

If you're running the dev version you can just use the --verbose option and it will print the full job submission commands (which include the job ID within them) to STDERR. Is this enough for you?

Is there really no way to find the IDs or names of killed jobs? That seems like the kind of thing that would end up in a cluster log file somewhere.. Surely they don't just get killed silently?

ewels commented 9 years ago

Would this help?

FelixKrueger commented 9 years ago

Can't you just print the text we see on screen to a Clusterflow log? That would already help... E.g.:

Your job 259132 ("cf_bismark_1423231738_bismark_align_295") has been submitted
Your job 259133 ("cf_bismark_1423231738_bismark_deduplicate_005") has been submitted
Your job 259134 ("cf_bismark_1423231738_bismark_methXtract_600") has been submitted
Your job 259135 ("cf_bismark_1423231738_bismark_report_584") has been submitted
Your job 259136 ("cf_bismark_1423231738_email_run_complete_610") has been submitted

ewels commented 9 years ago

I'm not sure what those status messages that you're posting are - they don't come from Cluster Flow (I don't think). Are you sure that they aren't being printed by SGE itself?

I guess I could try to capture this by piping STDERR and STDOUT to a new clusterflow command log file when it's fired. Hmm... Would need to have a play to figure out how to actually make that work.

FelixKrueger commented 9 years ago

Great, that's what I wanted to hear. Have a good weekend.

ewels commented 9 years ago

Yeah, this is actually just what I'm doing at the moment with the SLURM output, though I'm not writing it to file. I guess I could create a new submission_log.txt or something and print captured text to it.
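The idea of capturing whatever the scheduler prints and appending it to a submission_log.txt can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the code that went into Cluster Flow: the helper name `run_and_log` and the timestamped log layout are hypothetical, and the command is whatever submission command the queue system needs.

```python
import subprocess
from datetime import datetime


def run_and_log(command, log_path="submission_log.txt"):
    """Run a submission command and append everything it prints to log_path.

    command is a list of arguments. STDERR is merged into STDOUT so that
    scheduler messages end up in the log whichever stream they use.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge STDERR into STDOUT
        text=True,
    )
    timestamp = datetime.now().isoformat(timespec="seconds")
    with open(log_path, "a") as log:
        # Hypothetical layout: the command that was run, then its output.
        log.write(f"[{timestamp}] {' '.join(command)}\n")
        log.write(result.stdout)
    return result.stdout
```

Opening the file in append mode means repeated submissions accumulate in one log, so the job IDs are still there after a job has been killed.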

ewels commented 9 years ago

Ok, all done @FelixKrueger - I can't test it on GRID Engine, but it works nicely on SLURM. Let me know.

FelixKrueger commented 9 years ago

Thanks, will do!

ewels commented 9 years ago

@FelixKrueger - I also just added a little more: job submission commands are now appended to the end of the log file too. Hope that helps.

ewels commented 9 years ago

@stu2 - this new functionality kind of duplicates and adds to what --verbose does. Would you mind if I remove --verbose for simplicity's sake, or would you rather I keep it?

stu2 commented 9 years ago

Hi Phil, Yes that's fine with me, --verbose isn't really a core part of the pipeline.

ewels commented 9 years ago

@stu2 - along a similar line, I really like your idea about printing software versions to logs. Now I'm thinking about removing --print_versions and just getting all modules to always print the software versions to the log when they run.

I can't think of any disadvantage to having this there, other than sometimes making the log file a bit long and unwieldy (could maybe try to clean some of the longer outputs by searching for version number). What do you think?

stu2 commented 9 years ago

Yeah, it's not elegant but it's better to have the info in there than not. A parser to filter the version numbers from the help screens past, present and future would be useful, but tricky to code!
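As a rough idea of what such a parser might look like, here is a small heuristic in Python. It is only a sketch: the function name `find_version` and the sample strings are invented for illustration, and the pattern simply takes the first dotted number in the text, so it will miss versions without a dot and can be fooled by other dotted numbers that appear earlier in a help screen.

```python
import re

# Heuristic: an optional leading "v", then at least two dot-separated numbers.
VERSION = re.compile(r"\bv?(\d+(?:\.\d+)+)")


def find_version(text):
    """Return the first dotted version-like string in text, or None."""
    match = VERSION.search(text)
    return match.group(1) if match else None


# Made-up help-screen fragments, not real tool output.
print(find_version("ExampleTool version v1.2.3\nUsage: exampletool [options]"))
print(find_version("Usage: exampletool [options]"))
```

This shows why the job is tricky: every tool words its help screen differently, so any single pattern is a best guess.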

ewels commented 9 years ago

So I've just gone through all of the modules updating this. They now all print to STDERR after the module headers are printed, so all should be in the log files.

I had a look at each version call as I went; most were actually fairly concise. For the longer ones, I just added a 2>&1 | head -n 5 (or something similar) to the call, instead of actual parsing. In each module that I did this with, I loaded the oldest version that we have in our modules system to check that it still worked.
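For anyone unfamiliar with the shell idiom, 2>&1 | head -n 5 merges STDERR into STDOUT and keeps only the first five lines. The same effect can be sketched in Python as below; this is an illustration rather than what the modules do, and `version_banner` is a hypothetical name.

```python
import subprocess


def version_banner(command, max_lines=5):
    """Run a version/help command and keep only its first max_lines lines.

    Equivalent in spirit to: command 2>&1 | head -n 5
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # the 2>&1 part
        text=True,
    )
    # The head -n part: truncate to the first max_lines lines.
    return "\n".join(result.stdout.splitlines()[:max_lines])
```

Truncating instead of parsing keeps long help screens from swamping the log, at the cost of assuming the version appears near the top of the output.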