ewels / clusterflow

A pipelining tool to automate and standardise bioinformatics analyses on cluster environments.
https://ewels.github.io/clusterflow/
GNU General Public License v3.0

Versions #21

Closed stu2 closed 9 years ago

stu2 commented 9 years ago

Hi Phil, I wanted to include the version numbers of the programs called by clusterflow in the run report file for --verbose mode, so I edited the cfmod files for samtools, bowtie etc. to write this to the clusterflow report file if report_v is in the params (--verbose now adds report_v to the params fed to all modules). For some programs it's impossible to get version info without also getting the help page. I could have pared this down (e.g. htseq-count | tail -n 3), but I chose not to because we can't predict the format of future versions of their help pages.

Anyway, I also noticed that the report file for some reason doesn't recognise some of its own output; it looks like it comes from the email scheduling cfmod. It's not a serious issue. I also tagged the unrecognised output with arrows in the report file to highlight it.
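As a rough illustration of the behaviour described above (not the actual cfmod code, which is not shown in this thread), a module could emit version lines only when `report_v` is present in its params. The function name, the header text and the sample version strings below are all hypothetical.

```python
def version_report_lines(params, tool, raw_output):
    """Return the lines a module would add to the run report.

    Hypothetical helper: nothing is reported unless the caller passed
    the report_v parameter (which --verbose is described as adding).
    """
    if "report_v" not in params:
        return []
    # Header format is an assumption made for this sketch.
    header = f"Version information for {tool}:"
    # Keep the tool's output whole instead of trimming it with something
    # like `tail -n 3`, because the help page layout may change between
    # releases of the tool.
    return [header] + raw_output.rstrip("\n").splitlines()


# Example with made-up tool output.
lines = version_report_lines(["report_v"], "samtools", "samtools 1.2\nUsing htslib 1.2\n")
```

Passing the raw output in as a string keeps the sketch independent of how each program is actually asked for its version.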

stu2 commented 9 years ago

Whoops, I authored these under a different username.

ewels commented 9 years ago

Hi @stu2 - you should be able to amend the commits using this github script. I think if you update locally and then push, the PR will be updated automatically, so there's no need to close it and reopen.

stu2 commented 9 years ago

Thanks, but don't worry about it, I've simply incorporated the new email addresses into my profile. Re-opened.

ewels commented 9 years ago

Hey @stu2 - just a quick note to say that I haven't forgotten this, just swamped with some other urgent stuff. I'm hoping to have a proper sit down with Cluster Flow soon, hopefully in a couple of weeks' time.

Phil

stu2 commented 9 years ago

Hi Phil, no problem, I'm facing similar. Cheers, Stuart

ewels commented 9 years ago

Hi @stu2,

I've just had a quick read through your changes. I like the idea of being able to print the version numbers of software at run time; this is something I've been thinking of implementing myself for a while. Is there a reason that you need to do this with parameters / in the main execution though? As it's a one-off, I'd be more inclined to do it as a command line option to the module, in the same way that memory and CPU requirements are requested. Then the main cf script can pull this information when it launches, and the version code is kept separate from the execution code. Does that make sense?

I mostly don't like messing with the main command that is executed - I'd rather keep that as minimal and clean as possible rather than mixing in multiple functionalities.

Let me know if you have a reason not to do it like this - if not, I'll merge your code and then edit it to follow the alternate flow described above if that's ok.
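The alternate flow could look roughly like the sketch below: the module answers a one-off request (memory, cores or version) and only falls through to the main execution when no request flag is given. The flag names `--mem`, `--cores` and `--version-info`, the returned values and the `handle_request` helper are assumptions for illustration, not Cluster Flow's real module interface.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical pipeline module."""
    parser = argparse.ArgumentParser(description="Hypothetical pipeline module")
    # A module is asked for at most one piece of information per call.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--mem", action="store_true", help="report memory requirement")
    group.add_argument("--cores", action="store_true", help="report CPU requirement")
    group.add_argument("--version-info", action="store_true", help="report tool version")
    return parser


def handle_request(argv, get_version):
    """Answer a one-off request, or return None to signal a normal run."""
    args = build_parser().parse_args(argv)
    if args.mem:
        return "4G"  # placeholder value for this sketch
    if args.cores:
        return "2"  # placeholder value for this sketch
    if args.version_info:
        # The version lookup lives in its own function, apart from
        # the command that does the real work.
        return get_version()
    return None  # no request flag: the caller proceeds to main execution
```

With this shape, the command that is finally executed never has to know about version reporting at all.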

I knew about the unrecognised output, thanks for the improved labelling though. I can't remember the specifics off the top of my head, but it's usually something to do with one of the final finishing modules adding to the log file after it's been parsed, or without the module name prefix or something. I'll add it as an issue and look into it so that it's not generated at all (though good to keep the highlighting).
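To make the highlighting idea concrete, here is a small sketch that wraps any log line lacking a known module name prefix in arrows. The `module: message` line format, the arrow markers and the function name are all assumptions; the real log format is not shown in this thread.

```python
def tag_unrecognised(log_lines, known_modules):
    """Wrap lines that lack a known module prefix in arrows.

    Assumes (hypothetically) that recognised lines look like
    "module_name: message".
    """
    tagged = []
    for line in log_lines:
        prefix, sep, _ = line.partition(":")
        if sep and prefix.strip() in known_modules:
            tagged.append(line)
        else:
            # Output with no recognised prefix is highlighted, not dropped.
            tagged.append(f">>> {line} <<<")
    return tagged
```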

Phil

ewels commented 9 years ago

Hey @stu2,

I'm hoping to have a crack at some Cluster Flow updates next week. If you're busy I'll just pick and choose from your changes here if that's ok, otherwise feel free to let me know what you think about the points above.

Cheers,

Phil

stu2 commented 9 years ago

Hi Phil, I agree with what you say about the version number reporting: do it before the main job if possible. That would allow the module code to be split into a function for generating the 'version' call and the main body, and make it more straightforward to write new modules.
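One way to picture that split, as a sketch only: each module exposes a version function and a main body, and the launcher collects every version before any job runs. The `Module` type, the `launch` function and the report format are hypothetical names invented for this example.

```python
from typing import Callable, List, NamedTuple


class Module(NamedTuple):
    """Hypothetical module with its version call kept apart from its main body."""
    name: str
    version: Callable[[], str]
    run: Callable[[str], str]


def launch(modules: List[Module], input_file: str, report: List[str]) -> str:
    """Record all versions first, then run the modules in order."""
    for module in modules:
        report.append(f"{module.name}: {module.version()}")
    result = input_file
    for module in modules:
        # Each module's output is fed to the next one in the pipeline.
        result = module.run(result)
    return result
```

A new module then only needs to supply those two functions, which is the simplification suggested above.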

ewels commented 9 years ago

Great! I'll merge and then update so that it works as discussed above.

ewels commented 9 years ago

Hi @stu2 - I've tried to rework your code in the above commit, as discussed. If you manage any testing that would be great!

Comments welcome.

Cheers,

Phil