genomic-medicine-sweden / Twist_Solid

Pipeline for Solid tumours
https://twist-solid.readthedocs.io
GNU General Public License v3.0

Handling of samples with too few reads (or no reads at all) #463

Open jfr019 opened 3 months ago

jfr019 commented 3 months ago

Is your feature request related to a problem? Please describe

The pipeline stops when a batch of samples being analysed contains a sample with very few or no reads. This is a problem when the pipeline is started automatically, without inspecting the input data, as soon as the sequencer has finished.

Some rules and programs can handle missing input while others can't.

Error messages came from the following rules:

- `fusions_fuseq_wes`
- `cnv_sv_gatk_denoise_read_counts`
- `cnv_sv_manta_run_workflow_t`
- `cnv_sv_purecn_coverage`
- `biomarker_cnvkit2scarhrd`
- `cnv_sv_cnvkit_vcf`
- `annotation_vep`
- `annotation_vep_wo_pick`

Describe the solution you'd like

All rules should handle missing or empty input without stopping the pipeline.
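
As one illustration of what that could look like, here is a minimal Snakemake sketch that checks the number of mapped reads before running the tool and writes an empty placeholder output instead of failing. The rule name, paths, threshold, and `some_coverage_tool` are all hypothetical, not actual Twist_Solid rules.

```python
# Sketch only: rule name, file paths, threshold, and tool are hypothetical.
rule cnv_coverage_guarded:
    input:
        bam="alignment/{sample}.bam",
        bai="alignment/{sample}.bam.bai",
    output:
        cov="cnv/{sample}.coverage.tsv",
    log:
        "cnv/{sample}.coverage.log",
    shell:
        """
        # Count mapped reads; for a (nearly) empty sample, write an empty
        # placeholder so downstream rules still get an input file.
        reads=$(samtools view -c -F 4 {input.bam})
        if [ "$reads" -lt 1000 ]; then
            echo "too few reads ($reads), writing empty output" > {log}
            touch {output.cov}
        else
            some_coverage_tool --bam {input.bam} --out {output.cov} &> {log}
        fi
        """
```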

Describe alternatives you've considered

Add error handling for programs that exit with a non-zero exit status.
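
A minimal sketch of that alternative, with hypothetical rule and tool names: let the program run, and if it exits with a non-zero status, fall back to an empty placeholder output so the rest of the batch can continue.

```python
# Sketch only: rule and tool names are hypothetical.
rule fusion_calling_tolerant:
    input:
        bam="alignment/{sample}.bam",
    output:
        calls="fusions/{sample}.calls.tsv",
    log:
        "fusions/{sample}.log",
    shell:
        """
        # Run the caller; '||' triggers the fallback only on a non-zero exit
        # status, so an empty sample yields an empty result file instead of
        # stopping the whole batch.
        fusion_caller --bam {input.bam} --out {output.calls} &> {log} \
            || {{ echo "caller failed, writing empty output" >> {log}; touch {output.calls}; }}
        """
```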

Additional context

maehler commented 2 months ago

What I have been doing is running the pipeline with the Snakemake flag `--keep-going`. All branches of the DAG that don't result in an error can then continue all the way to the end, and ideally only the affected sample(s) will have missing files.
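
In practice that just means appending the flag to whatever invocation is normally used, e.g. `snakemake --profile <profile> --keep-going` (the profile option here is only a placeholder, not the pipeline's documented command line).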

Perhaps we could have some kind of output, like a simple TSV/YAML/JSON file, with the status of each sample after the pipeline completes. That would make it easier and less error-prone to figure out which sample(s) failed.
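
As a rough sketch of what such a report could look like (the sample list, final output pattern, and report path are assumptions and would come from the sample sheet and config in practice), an `onsuccess`/`onerror` handler in the Snakefile could check which samples reached their final output and write a TSV:

```python
# Sketch only: sample list, final output pattern, and report path are
# assumptions, not the pipeline's actual layout.
import csv
import os

SAMPLES = ["sample1", "sample2"]
FINAL_OUTPUT = "results/{sample}/{sample}.final.vcf.gz"


def write_sample_status(path="results/sample_status.tsv"):
    """Write one row per sample: 'completed' if its final output exists."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["sample", "status"])
        for sample in SAMPLES:
            done = os.path.exists(FINAL_OUTPUT.format(sample=sample))
            writer.writerow([sample, "completed" if done else "failed"])


# With --keep-going the run can still end in an error state,
# so write the report in both handlers.
onsuccess:
    write_sample_status()

onerror:
    write_sample_status()
```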