epam / fonda

Fonda is a framework which offers scalable and automatic analysis of multiple NGS sequencing data types
Apache License 2.0

Cohort analysis hangs #173

Closed. syansanofi closed this issue 3 years ago.

syansanofi commented 4 years ago

In the DNACapture workflow, the qc_summary_cohort_analysis job hangs instead of proceeding once all samples have finished their post-alignment jobs. Most likely, this is due to a mismatch between the job log strings and the search pattern used in the cohort analysis job script.

Example of cohort analysis search pattern:

str=$(grep -Ei "((Error Step: (Seqpurge trimming|Novoalign alignment|Index bam|Merge DNA bams|Mark duplicates|Index mkdup bam|Remove duplicates|Index rmdup bam|DNA QC metrics|Merge DNA QC|ABRA realignment|Vardict detection|SnpEff annotation|Remove temporary directories|Run vardict|QC summary analysis))|(Successful Step: SnpEff annotation))" $logFile;)

Example of individual sample log (last 4 lines):
Mon Oct 19 11:41:14 UTC 2020 Successful Step: Run vardict.
Mon Oct 19 11:41:22 UTC 2020 Begin Step: Remove temporary directories...
Mon Oct 19 11:41:34 UTC 2020 Successful Step: Remove temporary directories.
Mon Oct 19 11:41:42 UTC 2020 Finish the job execution!
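
To make the suspected mismatch concrete, here is a minimal sketch that runs a shortened version of the search pattern against the four log lines quoted above. It assumes the log contains only those lines and that the job script keeps waiting while `str` is empty. The temporary file, the shortened pattern, and the `status` variable are illustrative and not taken from Fonda itself.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction: a temporary log holding only the four quoted lines.
logFile=$(mktemp)
cat > "$logFile" <<'EOF'
Mon Oct 19 11:41:14 UTC 2020 Successful Step: Run vardict.
Mon Oct 19 11:41:22 UTC 2020 Begin Step: Remove temporary directories...
Mon Oct 19 11:41:34 UTC 2020 Successful Step: Remove temporary directories.
Mon Oct 19 11:41:42 UTC 2020 Finish the job execution!
EOF

# Shortened pattern (assumption): a few error steps, or the expected success tag.
pattern='((Error Step: (Run vardict|SnpEff annotation|Remove temporary directories))|(Successful Step: SnpEff annotation))'

# grep exits non-zero when nothing matches; "|| true" keeps the script going.
str=$(grep -Ei "$pattern" "$logFile" || true)

# Assumed behaviour: an empty result means the sample is treated as unfinished.
if [ -z "$str" ]; then
  status="waiting"
else
  status="matched"
fi
echo "$status"

rm -f "$logFile"
```

None of the four lines contains an error step or the expected success tag, so `str` stays empty even though the sample log already says "Finish the job execution!".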

kamyshova commented 4 years ago

@syansanofi Hi, Shu. Hangs in cohort analysis are a common problem because of the way completion of workflow steps is checked. In this case, as far as I can see, the problem is the following cohort analysis search pattern:

str=$(grep -Ei "((Error Step: (Seqpurge trimming|Novoalign alignment|Index bam|Merge DNA bams|Mark duplicates|Index mkdup bam|Remove duplicates|Index rmdup bam|DNA QC metrics|Merge DNA QC|ABRA realignment|Vardict detection|SnpEff annotation|Remove temporary directories|Run vardict))|(Successful Step: Merge DNA QC))" $logFile;)

The Merge DNA QC task is included in the workflow only if LibraryType is set in the study config (_exome/target/WES/WEX/IDT17genesPanel, etc.) and the qc tool is included in the toolset. The qc cohort analysis, however, adds this tag to its check whenever the qc tool is included, regardless of LibraryType. So we can either add a LibraryType check to the qc analysis or, if we don't want to change the initial logic, you should set the appropriate LibraryType. What do you think?
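
The proposed extra check could look roughly like the sketch below. It is only an illustration of the condition described above: the function name `expects_merge_dna_qc`, its arguments, and the `+`-separated toolset format are assumptions, not Fonda's actual code or configuration handling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only when the "Merge DNA QC" tag should be
# part of the cohort analysis check.
#   $1 - LibraryType value from the study config (may be empty)
#   $2 - toolset string, assumed here to be "+"-separated, e.g. "novoalign+qc"
expects_merge_dna_qc() {
  local library_type="$1" toolset="$2"

  # Condition 1: LibraryType must be set in the study config.
  [ -n "$library_type" ] || return 1

  # Condition 2: the qc tool must be listed in the toolset.
  case "+${toolset}+" in
    *+qc+*) return 0 ;;
    *)      return 1 ;;
  esac
}

# Usage example with made-up values.
if expects_merge_dna_qc "exome" "novoalign+qc"; then
  echo "wait for: Successful Step: Merge DNA QC"
else
  echo "do not wait for Merge DNA QC"
fi
```

With both conditions checked, the cohort job would wait for the Merge DNA QC tag only in workflows that actually produce it.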

syansanofi commented 4 years ago

@kamyshova Yes, I agree completely. Adding the check would help prevent execution until LibraryType is checked.