genomic-medicine-sweden / gms-artic

A nextflow pipeline with a GMS touch for running the ARTIC network's fieldbioinformatics tools (https://github.com/artic-network/fieldbioinformatics).
GNU Affero General Public License v3.0

medaka workflow fails if you have few reads #77

Open fwa93 opened 1 year ago

fwa93 commented 1 year ago

Hi. There is a problem with the medaka workflow. If I remove barcode71 (a really bad sample), the pipeline finishes. If I keep it, the pipeline crashes. It seems that longshot with the -A flag fails when the coverage is 0. See a similar issue here -> https://github.com/artic-network/fieldbioinformatics/issues/91
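As a possible stopgap while the pipeline itself does not handle this case, barcodes with very few reads could be identified before launching the run. The following is only a minimal sketch, not part of gms-artic: the function names and the `MIN_READS` threshold are hypothetical, and it assumes a `fastq_pass/` layout with one sub-directory per barcode containing standard 4-line FASTQ records.

```python
import gzip
from pathlib import Path

# Hypothetical threshold; pick a value that suits your own runs.
MIN_READS = 100

FASTQ_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


def count_fastq_reads(path):
    """Count reads in a FASTQ file, assuming 4 lines per record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


def barcodes_below_threshold(fastq_pass_dir, min_reads=MIN_READS):
    """Return {barcode_name: read_count} for barcodes with fewer than min_reads reads."""
    low = {}
    for barcode_dir in sorted(Path(fastq_pass_dir).iterdir()):
        if not barcode_dir.is_dir():
            continue
        total = sum(
            count_fastq_reads(f)
            for f in barcode_dir.iterdir()
            if f.name.endswith(FASTQ_SUFFIXES)
        )
        if total < min_reads:
            low[barcode_dir.name] = total
    return low
```

Calling `barcodes_below_threshold("path/to/fastq_pass")` lists the barcode directories that might be worth moving aside before the run. Whether a given threshold actually avoids the longshot failure is not something this sketch can guarantee.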

nextflow run main.nf -profile singularity --medaka --prefix "full_test1" --basecalled_fastq 23v17_Sars-cov2/no_sample/20230427_1424_X1_FAV87849_b991bac3/fastq_pass/ --outdir full_test_results --scheme midnight-primer --schemeVersion V1
N E X T F L O W ~ version 20.10.0
Launching main.nf [determined_wescoff] - revision: 4b2eb4a204
WARN: DSL 2 IS AN EXPERIMENTAL FEATURE UNDER DEVELOPMENT -- SYNTAX MAY CHANGE IN FUTURE RELEASE
executor > local (182)
[25/8bc07c] process > articNcovNanopore:sequenceAnalysisMedaka:versions [100%] 1 of 1 ✔
[2c/4a21a0] process > articNcovNanopore:sequenceAnalysisMedaka:pangoversions [100%] 1 of 1 ✔
[41/19da2b] process > articNcovNanopore:sequenceAnalysisMedaka:fastqcNanopore (46) [ 98%] 46 of 47
[69/ac0778] process > articNcovNanopore:sequenceAnalysisMedaka:multiqcNanopore (46) [ 98%] 45 of 46
[ac/9b0bba] process > articNcovNanopore:sequenceAnalysisMedaka:articDownloadScheme (https://github.com/genomic-medicine-sweden/gms-art... [100%] 1 of 1 ✔
[5c/00539b] process > articNcovNanopore:sequenceAnalysisMedaka:articGuppyPlex (full_test1-barcode60) [ 85%] 40 of 47
[7a/ebc876] process > articNcovNanopore:sequenceAnalysisMedaka:articMinIONMedaka (full_test1_barcode63) [ 0%] 0 of 39
[- ] process > articNcovNanopore:sequenceAnalysisMedaka:articRemoveUnmappedReads -
[- ] process > articNcovNanopore:sequenceAnalysisMedaka:makeQCCSV -
[- ] process > articNcovNanopore:sequenceAnalysisMedaka:writeQCSummaryCSV -
[- ] process > articNcovNanopore:sequenceAnalysisMedaka:collateSamples -
[- ] process > articNcovNanopore:sequenceAnalysisMedaka:nextclade -
[- ] process > articNcovNanopore:sequenceAnalysisMedaka:pangolinTyping -
[- ] process > articNcovNanopore:sequenceAnalysisMedaka:makeReport -
Error executing process > 'articNcovNanopore:sequenceAnalysisMedaka:articMinIONMedaka (full_test1_barcode71)'

Caused by: Process articNcovNanopore:sequenceAnalysisMedaka:articMinIONMedaka (full_test1_barcode71) terminated with an error exit status (20)

Command executed:

executor > local (182)
[25/8bc07c] process > articNcovNanopore:sequenceAnalysisMedaka:versions [100%] 1 of 1 ✔
[2c/4a21a0] process > articNcovNanopore:sequenceAnalysisMedaka:pangoversions [100%] 1 of 1 ✔
[56/707da2] process > articNcovNanopore:sequenceAnalysisMedaka:fastqcNanopore (43) [100%] 46 of 46
[69/ac0778] process > articNcovNanopore:sequenceAnalysisMedaka:multiqcNanopore (46) [ 98%] 45 of 46
[ac/9b0bba] process > articNcovNanopore:sequenceAnalysisMedaka:articDownloadScheme (https://github.com/genomic-medicine-sweden/gms-art... [100%] 1 of 1 ✔
[f3/4f282b] process > articNcovNanopore:sequenceAnalysisMedaka:articGuppyPlex (full_test1-barcode70) [100%] 40 of 40
[29/b56259] process > articNcovNanopore:sequenceAnalysisMedaka:articMinIONMedaka (full_test1_barcode33) [ 6%] 1 of 18, failed: 1
[- ] process > articNcovNanopore:sequenceAnalysisMedaka:articRemoveUnmappedReads -
[- ] process > articNcovNanopore:sequenceAnalysisMedaka:makeQCCSV -
[- ] process > articNcovNanopore:sequenceAnalysisMedaka:writeQCSummaryCSV -
[- ] process > articNcovNanopore:sequenceAnalysisMedaka:collateSamples -
[- ] process > articNcovNanopore:sequenceAnalysisMedaka:nextclade -
[- ] process > articNcovNanopore:sequenceAnalysisMedaka:pangolinTyping -
[- ] process > articNcovNanopore:sequenceAnalysisMedaka:makeReport -
Error executing process > 'articNcovNanopore:sequenceAnalysisMedaka:articMinIONMedaka (full_test1_barcode71)'

Caused by: Process articNcovNanopore:sequenceAnalysisMedaka:articMinIONMedaka (full_test1_barcode71) terminated with an error exit status (20)

Command executed:

artic minion --medaka --normalise 500 --minimap2 --threads 1 --scheme-directory gms-artic --read-file full_test1_barcode71.fastq midnight-primer/V1 full_test1_barcode71

Command exit status: 20

Command output:
error: {}
ERROR: Max read coverage set to 0. printing empty VCF file

Command error:
Running: samtools view -b -r "nCoV-2019_2" full_test1_barcode71.primertrimmed.rg.sorted.bam > full_test1_barcode71.primertrimmed.nCoV-2019_2.sorted.bam
Running: samtools index full_test1_barcode71.primertrimmed.nCoV-2019_2.sorted.bam
Running: samtools view -b -r "nCoV-2019_1" full_test1_barcode71.primertrimmed.rg.sorted.bam > full_test1_barcode71.primertrimmed.nCoV-2019_1.sorted.b] Initializing data loader
[17:11:17 - PWorker] Running inference for 0.0M draft bases.
[17:11:17 - Sampler] Initializing sampler for consensus of region MN908947.3:0-29903.
[17:11:17 - Sampler] Took 0.00s to make features.
[17:11:18 - PWorker] All done, 0 remainder regions.
[17:11:18 - Predict] Finished processing all regions.
[17:11:21 - DataIndex] Loaded 1/1 (100.00%) sample files.
[17:11:24 - Predict] Processing region(s): MN908947.3:0-29903
[17:11:24 - Predict] Setting tensorflow threads to 1.
[17:11:24 - Predict] Processing 1 long region(s) with batching.
[17:11:24 - Predict] Using model: /opt/conda/envs/artic/lib/python3.6/site-packages/medaka/data/r941_min_high_g360_model.hdf5.
[17:11:24 - ModelLoad] Building model with cudnn optimization: False
[17:11:25 - DLoader] Initializing data loader
[17:11:25 - PWorker] Running inference for 0.0M draft bases.
[17:11:25 - Sampler] Initializing sampler for consensus of region MN908947.3:0-29903.
[17:11:25 - Feature] Pileup counts do not span requested region, requested MN908947.3:0-29903, received 28699-29506.
[17:11:25 - Feature] Processed MN908947.3:28699.0-29506.0 (median depth 1.0)
[17:11:25 - Sampler] Took 0.01s to make features.
[17:11:26 - PWorker] All done, 0 remainder regions.
[17:11:26 - Predict] Finished processing all regions.
[17:11:29 - DataIndex] Loaded 1/1 (100.00%) sample files.
[17:11:29 - Variants] Processing MN908947.3:0-.

2023-05-10 17:11:31 Automatically determining max read coverage.
2023-05-10 17:11:31 Estimating mean read coverage...
2023-05-10 17:11:31 WARNING: Max coverage calculation is highly likely to be incorrect. The number of reference bases covered by the bam file (808) differs significantly from the expected number of positions in the reference (29903). If you are using a bam file that only covers part of the genome, please specify this region exactly with the --region argument so the number of reference bases is known. Alternatively, disable maximum coverage filtering by setting -C to a large number.
2023-05-10 17:11:31 Total reference positions: 29903
2023-05-10 17:11:31 Total bases in bam: 808
2023-05-10 17:11:31 Mean read coverage: 0.03
Running: minimap2 -a -x map-ont -t 1 gms-artic/midnight-primer/V1/midnight-primer.reference.fasta full_test1_barcode71.fastq | samtools view -bS -F 4 - | samtools sort -o full_test1_barcode71.sorted.bam -
Running: samtools index full_test1_barcode71.sorted.bam
Running: align_trim --start --normalise 500 gms-artic/midnight-primer/V1/midnight-primer.scheme.bed --report full_test1_barcode71.alignreport.txt < full_test1_barcode71.sorted.bam 2> full_test1_barcode71.alignreport.er | samtools sort -T full_test1_barcode71 - -o full_test1_barcode71.trimmed.rg.sorted.bam
Running: align_trim --normalise 500 gms-artic/midnight-primer/V1/midnight-primer.scheme.bed --remove-incorrect-pairs --report full_test1_barcode71.alignreport.txt < full_test1_barcode71.sorted.bam 2> full_test1_barcode71.alignreport.er | samtools sort -T full_test1_barcode71 - -o full_test1_barcode71.primertrimmed.rg.sorted.bam
Running: samtools index full_test1_barcode71.trimmed.rg.sorted.bam
Running: samtools index full_test1_barcode71.primertrimmed.rg.sorted.bam
Running: samtools view -b -r "nCoV-2019_2" full_test1_barcode71.primertrimmed.rg.sorted.bam > full_test1_barcode71.primertrimmed.nCoV-2019_2.sorted.bam
Running: samtools index full_test1_barcode71.primertrimmed.nCoV-2019_2.sorted.bam
Running: samtools view -b -r "nCoV-2019_1" full_test1_barcode71.primertrimmed.rg.sorted.bam > full_test1_barcode71.primertrimmed.nCoV-2019_1.sorted.bam
Running: samtools index full_test1_barcode71.primertrimmed.nCoV-2019_1.sorted.bam
Running: medaka consensus --chunk_len 800 --chunk_ovlp 400 full_test1_barcode71.primertrimmed.nCoV-2019_2.sorted.bam full_test1_barcode71.nCoV-2019_2.hdf
Running: medaka variant gms-artic/midnight-primer/V1/midnight-primer.reference.fasta full_test1_barcode71.nCoV-2019_2.hdf full_test1_barcode71.nCoV-2019_2.vcf
Running: medaka consensus --chunk_len 800 --chunk_ovlp 400 full_test1_barcode71.primertrimmed.nCoV-2019_1.sorted.bam full_test1_barcode71.nCoV-2019_1.hdf
Running: medaka variant gms-artic/midnight-primer/V1/midnight-primer.reference.fasta full_test1_barcode71.nCoV-2019_1.hdf full_test1_barcode71.nCoV-2019_1.vcf
Running: artic_vcf_merge full_test1_barcode71 gms-artic/midnight-primer/V1/midnight-primer.scheme.bed nCoV-2019_2:full_test1_barcode71.nCoV-2019_2.vcf nCoV-2019_1:full_test1_barcode71.nCoV-2019_1.vcf
Running: bgzip -f full_test1_barcode71.merged.vcf
Running: tabix -p vcf full_test1_barcode71.merged.vcf.gz
Running: longshot -P 0 -F -A --no_haps --bam full_test1_barcode71.primertrimmed.rg.sorted.bam --ref gms-artic/midnight-primer/V1/midnight-primer.reference.fasta --out full_test1_barcode71.longshot.vcf --potential_variants full_test1_barcode71.merged.vcf.gz
Command failed:longshot -P 0 -F -A --no_haps --bam full_test1_barcode71.primertrimmed.rg.sorted.bam --ref gms-artic/midnight-primer/V1/midnight-primer.reference.fasta --out full_test1_barcode71.longshot.vcf --potential_variants full_test1_barcode71.merged.vcf.gz

Work dir: /aux/db/gms-artic/work/71/6f2fcada2474605bd307b6decbf047

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line
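The numbers in the longshot log can be reproduced with a little arithmetic, which may help explain the "Max read coverage set to 0" message. This is only a sketch: the mean is simply total bases divided by reference positions, as reported in the log; the max-coverage formula `mean + 5*sqrt(mean)` is what longshot's help text describes for `-A`, as far as I know; and the truncation to an integer is my assumption about how the value ends up as 0.

```python
import math


def mean_coverage(total_bases, reference_length):
    """Mean read coverage as reported in the log: bases in BAM / reference positions."""
    return total_bases / reference_length


def auto_max_coverage(mean_cov):
    """Max coverage as mean + 5*sqrt(mean).

    Truncating to an integer is an assumption, not confirmed from longshot's source.
    """
    return int(mean_cov + 5 * math.sqrt(mean_cov))


# Values taken from the log above for barcode71.
mean_cov = mean_coverage(total_bases=808, reference_length=29903)
print(round(mean_cov, 2))           # 0.03, matching "Mean read coverage: 0.03"
print(auto_max_coverage(mean_cov))  # 0 under the truncation assumption
```

If that reading is right, any sample whose mean coverage is low enough for the formula to stay below 1 would hit the same error, not only samples with literally zero reads.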

JD2112 commented 1 year ago

@fwa93 #78 looks like a longshot installation problem. In the container (environment.yaml), we need to add the longshot module from Conda. Could you please check locally whether it works? I don't have any medaka data to test with.