epi2me-labs / wf-human-variation


Enable running --str (or other subcomponents of the pipeline) more modularly #169

Open ilivyatan opened 2 months ago

ilivyatan commented 2 months ago

Is your feature related to a problem?

Yes. Maintaining consistency of analyses within a reasonable timeframe.

Describe the solution you'd like

I've run the full pipeline on my samples but forgot to set the --sex parameter for the --str analysis. I now want to rerun only the --str part and have it reuse the inputs it has already generated. However, rerunning the pipeline with only --str specified starts the --snv analysis and BAM haplotagging all over again, which takes a long time. It would be great if the pipeline could locate the files it needs in the 'output' folder and run just the requested analysis.

Describe alternatives you've considered

I've run Straglr independently. It takes less than 30 seconds for a 30x human genome, plus another 15 minutes for phasing with LongPhase and annotation with Stranger. This works, but it is less streamlined and doesn't produce the nice reports that EPI2ME does. Since only some of the samples need to be rerun, keeping the analysis uniform across all samples is important. Another alternative would be to make the reporting tools available as a standalone command-line tool.

Additional context

I run EPI2ME via the Nextflow command line on our PromethION 24. Since SNV analysis is a precursor to the other types of analyses, perhaps there could be an option to declare that the SNV analysis has already run and supply its result files, so that the pipeline can continue with additional analyses. For example, a routine could look like this: first run only --snv and check the results, then run again with --sv. (Phasing can be run at the end to connect everything up.)

RenzoTale88 commented 2 months ago

Hi @ilivyatan, this should be intrinsically possible with Nextflow as long as you keep your work directory and rerun with -resume. You can run the analysis again, adding or changing parameters as you wish, and the workflow should recognise which steps have already run (in this case the --snp analysis) and execute only the new ones. However, if you change the parameters for the --snp analysis itself, the workflow will have to repeat some or all of those steps as well.
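The -resume approach can be sketched as follows. This is a minimal Python sketch that only assembles the two command lines and does not launch Nextflow. The helper build_command, the file names, and the workflow flags --bam, --ref, --out_dir and the value XY for --sex are illustrative assumptions, so check them against the workflow's own parameter documentation. The point it illustrates is that the second run uses the same work directory and parameters as the first, differing only by -resume and the newly added --sex.

```python
import shlex


# Hypothetical helper (not part of wf-human-variation): assembles a Nextflow
# command line. Workflow flags such as --bam, --ref, --out_dir and --sex are
# assumptions here; verify them against the workflow's parameter docs.
def build_command(bam, ref, out_dir, analyses, sex=None, resume=False, work_dir="work"):
    cmd = ["nextflow", "run", "epi2me-labs/wf-human-variation", "-w", work_dir]
    if resume:
        # Ask Nextflow to reuse cached results from the previous run; tasks
        # whose inputs and script are unchanged are not recomputed.
        cmd.append("-resume")
    cmd += ["--bam", bam, "--ref", ref, "--out_dir", out_dir]
    for analysis in analyses:
        cmd.append(f"--{analysis}")  # e.g. --snp, --sv, --str
    if sex is not None:
        cmd += ["--sex", sex]
    return cmd


analyses = ["snp", "str"]
# Original run: --sex was forgotten.
first = build_command("sample.bam", "ref.fasta", "output", analyses)
# Rerun: same work directory and parameters, plus -resume and the missing --sex.
second = build_command("sample.bam", "ref.fasta", "output", analyses, sex="XY", resume=True)

print(shlex.join(first))
print(shlex.join(second))
```

By default, -resume picks up the most recent run launched from the current directory, so the second command should be started from the same directory as the first, with the work directory still in place. Upstream tasks that do not depend on --sex (such as SNP calling and haplotagging) should then be taken from the cache rather than recomputed.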