connor-lab / ncov2019-artic-nf

A Nextflow pipeline for running the ARTIC network's fieldbioinformatics tools (https://github.com/artic-network/fieldbioinformatics), with a focus on ncov2019
GNU Affero General Public License v3.0

Nanopolish Pipeline keeps failing at `artic_plot_amplicon_depth` step #47

Closed: DarianHole closed this issue 4 years ago

DarianHole commented 4 years ago

Hi,

I feel like there is something wrong with my input data but I cannot seem to figure out what it is after searching and looking through the logs. Currently I keep getting the following:

Running: artic_vcf_filter --nanopolish output_prefix_barcode13.merged.vcf output_prefix_barcode13.pass.vcf xing fast5_pass/barcode12
  [readdb] indexing fast5_pass/barcode18
  [readdb] indexing fast5_pass/barcode14
  [readdb] indexing fast5_pass/barcode08
  [readdb] indexing fast5_pass/barcode05
  [readdb] indexing fast5_pass/barcode09
  [readdb] indexing fast5_pass/barcode19
  [readdb] indexing fast5_pass/barcode04
  [readdb] indexing fast5_pass/barcode01
  [readdb] num reads: 21100, num reads with path to fast5: 21100
  [M::mm_idx_gen::0.009*0.26] collected minimizers
  [M::mm_idx_gen::0.010*0.35] sorted minimizers
  [M::main::0.010*0.35] loaded/built the index for 1 target sequence(s)
  [M::mm_mapopt_update::0.010*0.36] mid_occ = 3
  [M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
  [M::mm_idx_stat::0.010*0.37] distinct minimizers: 5587 (99.93% are singletons); average occurrences: 1.004; average spacing: 5.332
  [M::worker_pipeline::4.942*0.89] mapped 21100 sequences
  [M::main] Version: 2.17-r941
  [M::main] CMD: minimap2 -a -x map-ont -t 1 scheme/primer_schemes/nCoV-2019/V2/nCoV-2019.reference.fasta output_prefix_barcode13.fastq
  [M::main] Real time: 4.946 sec; CPU: 4.384 sec; Peak RSS: 0.035 GB
  [post-run summary] total reads: 10598, unparseable: 0, qc fail: 10, could not calibrate: 3, no alignment: 47, bad fast5: 0
  [post-run summary] total reads: 8630, unparseable: 0, qc fail: 7, could not calibrate: 6, no alignment: 40, bad fast5: 0
  Traceback (most recent call last):
    File "/home/CSCScience.ca/dhole/nextflow/work/conda/artic-ncov2019-9390089492b7a53aa23717df85cacd62/bin/artic_plot_amplicon_depth", line 8, in <module>
      sys.exit(main())
    File "/home/CSCScience.ca/dhole/nextflow/work/conda/artic-ncov2019-9390089492b7a53aa23717df85cacd62/lib/python3.6/site-packages/artic/plot_amplicon_depth.py", line 143, in main
      go(args)
    File "/home/CSCScience.ca/dhole/nextflow/work/conda/artic-ncov2019-9390089492b7a53aa23717df85cacd62/lib/python3.6/site-packages/artic/plot_amplicon_depth.py", line 84, in go
      x=df['position'], bins=starts, labels=amplicons)
    File "/home/CSCScience.ca/dhole/nextflow/work/conda/artic-ncov2019-9390089492b7a53aa23717df85cacd62/lib/python3.6/site-packages/pandas/core/reshape/tile.py", line 228, in cut
      raise ValueError('bins must increase monotonically.')
  ValueError: bins must increase monotonically.
  Running: nanopolish index -s sequencing_summary_FAK48225_b6fcf7e3.txt -d fast5_pass output_prefix_barcode13.fastq
  Running: minimap2 -a -x map-ont -t 1 scheme/primer_schemes/nCoV-2019/V2/nCoV-2019.reference.fasta output_prefix_barcode13.fastq | samtools view -bS -F 4 - | samtools sort -o output_prefix_barcode13.sorted.bam -
  Running: samtools index output_prefix_barcode13.sorted.bam
  Running: align_trim --start --normalise 500 scheme/primer_schemes/nCoV-2019/V2/nCoV-2019.scheme.bed --report output_prefix_barcode13.alignreport.txt < output_prefix_barcode13.sorted.bam 2> output_prefix_barcode13.alignreport.er | samtools sort -T output_prefix_barcode13 - -o output_prefix_barcode13.trimmed.rg.sorted.bam
  Running: align_trim --normalise 500 scheme/primer_schemes/nCoV-2019/V2/nCoV-2019.scheme.bed --remove-incorrect-pairs --report output_prefix_barcode13.alignreport.txt < output_prefix_barcode13.sorted.bam 2> output_prefix_barcode13.alignreport.er | samtools sort -T output_prefix_barcode13 - -o output_prefix_barcode13.primertrimmed.rg.sorted.bam
  Running: samtools index output_prefix_barcode13.trimmed.rg.sorted.bam
  Running: samtools index output_prefix_barcode13.primertrimmed.rg.sorted.bam
  Running: nanopolish variants --min-flanking-sequence 10 -x 1000000 --progress -t 1 --reads output_prefix_barcode13.fastq -o output_prefix_barcode13.nCoV-2019_1.vcf -b output_prefix_barcode13.trimmed.rg.sorted.bam -g scheme/primer_schemes/nCoV-2019/V2/nCoV-2019.reference.fasta -w "MN908947.3:1-29904" --ploidy 1 -m 0.15 --read-group nCoV-2019_1 
  Running: nanopolish variants --min-flanking-sequence 10 -x 1000000 --progress -t 1 --reads output_prefix_barcode13.fastq -o output_prefix_barcode13.nCoV-2019_2.vcf -b output_prefix_barcode13.trimmed.rg.sorted.bam -g scheme/primer_schemes/nCoV-2019/V2/nCoV-2019.reference.fasta -w "MN908947.3:1-29904" --ploidy 1 -m 0.15 --read-group nCoV-2019_2 
  Running: artic_vcf_merge output_prefix_barcode13 scheme/primer_schemes/nCoV-2019/V2/nCoV-2019.scheme.bed nCoV-2019_1:output_prefix_barcode13.nCoV-2019_1.vcf nCoV-2019_2:output_prefix_barcode13.nCoV-2019_2.vcf
  Running: artic_vcf_filter --nanopolish output_prefix_barcode13.merged.vcf output_prefix_barcode13.pass.vcf output_prefix_barcode13.fail.vcf
  Running: artic_make_depth_mask --store-rg-depths scheme/primer_schemes/nCoV-2019/V2/nCoV-2019.reference.fasta output_prefix_barcode13.primertrimmed.rg.sorted.bam output_prefix_barcode13.coverage_mask.txt
  Running: artic_plot_amplicon_depth --primerScheme scheme/primer_schemes/nCoV-2019/V2/nCoV-2019.scheme.bed --sampleID output_prefix_barcode13 --outFilePrefix output_prefix_barcode13 output_prefix_barcode13*.depths
  Command failed:artic_plot_amplicon_depth --primerScheme scheme/primer_schemes/nCoV-2019/V2/nCoV-2019.scheme.bed --sampleID output_prefix_barcode13 --outFilePrefix output_prefix_barcode13 output_prefix_barcode13*.depths

Exit code is 20, and the exact command it is failing on is: artic minion --normalise 500 --minimap2 --threads 1 --scheme-directory scheme/primer_schemes --read-file output_prefix_barcode13.fastq --fast5-directory fast5_pass --sequencing-summary sequencing_summary_FAK48225_b6fcf7e3.txt nCoV-2019/V2 output_prefix_barcode13

My guess is I've done something wrong with the guppy_barcoder command so I'll look into it that way

Thanks for any help or advice you can give me! Darian

m-bull commented 4 years ago

Hi @DarianHole - this should be fixed by https://github.com/connor-lab/ncov2019-artic-nf/commit/8b0f5ea06f94a8f7235613c20fc371c886f695b9.

DarianHole commented 4 years ago

Oh thanks @m-bull ! I'll test it out! Great workflow

DarianHole commented 4 years ago

Works now, thanks again!

iferres commented 4 years ago

Hi, I'm getting a similar error:

Error executing process > 'articNcovNanopore:sequenceAnalysisNanopolish:articMinIONNanopolish (guppy3.6_110520_barcode03)'

Caused by:
  Process `articNcovNanopore:sequenceAnalysisNanopolish:articMinIONNanopolish (guppy3.6_110520_barcode03)` terminated with an error exit status (20)

Command executed:

  artic minion --normalise 500 --minimap2     --threads 1     --scheme-directory scheme/primer_schemes     --read-file guppy3.6_110520_barcode03.fastq     --fast5-directory fast5_pass     --sequencing-summary sequencing_summary.txt     nCoV-2019/V2     guppy3.6_110520_barcode03

Command exit status:
  20

Command output:
  (empty)

Command error:
  [readdb] indexing fast5_pass
  [readdb] num reads: 140498, num reads with path to fast5: 140498
  [M::mm_idx_gen::0.003*2.51] collected minimizers
  [M::mm_idx_gen::0.005*1.92] sorted minimizers
  [M::main::0.005*1.92] loaded/built the index for 1 target sequence(s)
  [M::mm_mapopt_update::0.006*1.86] mid_occ = 3
  [M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
  [M::mm_idx_stat::0.006*1.82] distinct minimizers: 5587 (99.93% are singletons); average occurrences: 1.004; average spacing: 5.332
  [M::worker_pipeline::40.111*0.91] mapped 140498 sequences
  [M::main] Version: 2.17-r941
  [M::main] CMD: minimap2 -a -x map-ont -t 1 scheme/primer_schemes/nCoV-2019/V2/nCoV-2019.reference.fasta guppy3.6_110520_barcode03.fastq
  [M::main] Real time: 40.250 sec; CPU: 36.441 sec; Peak RSS: 0.189 GB
  [post-run summary] total reads: 73910, unparseable: 0, qc fail: 31, could not calibrate: 2, no alignment: 109, bad fast5: 0
  [post-run summary] total reads: 71208, unparseable: 0, qc fail: 24, could not calibrate: 3, no alignment: 109, bad fast5: 0
  Traceback (most recent call last):
    File "/opt/conda/bin/artic_plot_amplicon_depth", line 10, in <module>
      sys.exit(main())
    File "/opt/conda/lib/python3.6/site-packages/artic/plot_amplicon_depth.py", line 143, in main
      go(args)
    File "/opt/conda/lib/python3.6/site-packages/artic/plot_amplicon_depth.py", line 84, in go
      x=df['position'], bins=starts, labels=amplicons)
    File "/opt/conda/lib/python3.6/site-packages/pandas/core/reshape/tile.py", line 260, in cut
      raise ValueError("bins must increase monotonically.")
  ValueError: bins must increase monotonically.
  Running: nanopolish index -s sequencing_summary.txt -d fast5_pass guppy3.6_110520_barcode03.fastq
  Running: minimap2 -a -x map-ont -t 1 scheme/primer_schemes/nCoV-2019/V2/nCoV-2019.reference.fasta guppy3.6_110520_barcode03.fastq | samtools view -bS -F 4 - | samtools sort -o guppy3.6_110520_barcode03.sorted.bam -
  Running: samtools index guppy3.6_110520_barcode03.sorted.bam
  Running: align_trim --start --normalise 500 scheme/primer_schemes/nCoV-2019/V2/nCoV-2019.scheme.bed --report guppy3.6_110520_barcode03.alignreport.txt < guppy3.6_110520_barcode03.sorted.bam 2> guppy3.6_110520_barcode03.alignreport.er | samtools sort -T guppy3.6_110520_barcode03 - -o guppy3.6_110520_barcode03.trimmed.rg.sorted.bam
  Running: align_trim --normalise 500 scheme/primer_schemes/nCoV-2019/V2/nCoV-2019.scheme.bed --remove-incorrect-pairs --report guppy3.6_110520_barcode03.alignreport.txt < guppy3.6_110520_barcode03.sorted.bam 2> guppy3.6_110520_barcode03.alignreport.er | samtools sort -T guppy3.6_110520_barcode03 - -o guppy3.6_110520_barcode03.primertrimmed.rg.sorted.bam
  Running: samtools index guppy3.6_110520_barcode03.trimmed.rg.sorted.bam
  Running: samtools index guppy3.6_110520_barcode03.primertrimmed.rg.sorted.bam
  Running: nanopolish variants --min-flanking-sequence 10 -x 1000000 --progress -t 1 --reads guppy3.6_110520_barcode03.fastq -o guppy3.6_110520_barcode03.nCoV-2019_2.vcf -b guppy3.6_110520_barcode03.trimmed.rg.sorted.bam -g scheme/primer_schemes/nCoV-2019/V2/nCoV-2019.reference.fasta -w "MN908947.3:1-29904" --ploidy 1 -m 0.15 --read-group nCoV-2019_2
  Running: nanopolish variants --min-flanking-sequence 10 -x 1000000 --progress -t 1 --reads guppy3.6_110520_barcode03.fastq -o guppy3.6_110520_barcode03.nCoV-2019_1.vcf -b guppy3.6_110520_barcode03.trimmed.rg.sorted.bam -g scheme/primer_schemes/nCoV-2019/V2/nCoV-2019.reference.fasta -w "MN908947.3:1-29904" --ploidy 1 -m 0.15 --read-group nCoV-2019_1
  Running: artic_vcf_merge guppy3.6_110520_barcode03 scheme/primer_schemes/nCoV-2019/V2/nCoV-2019.scheme.bed nCoV-2019_2:guppy3.6_110520_barcode03.nCoV-2019_2.vcf nCoV-2019_1:guppy3.6_110520_barcode03.nCoV-2019_1.vcf
  Running: artic_vcf_filter --nanopolish guppy3.6_110520_barcode03.merged.vcf guppy3.6_110520_barcode03.pass.vcf guppy3.6_110520_barcode03.fail.vcf
  Running: artic_make_depth_mask --store-rg-depths scheme/primer_schemes/nCoV-2019/V2/nCoV-2019.reference.fasta guppy3.6_110520_barcode03.primertrimmed.rg.sorted.bam guppy3.6_110520_barcode03.coverage_mask.txt
  Running: artic_plot_amplicon_depth --primerScheme scheme/primer_schemes/nCoV-2019/V2/nCoV-2019.scheme.bed --sampleID guppy3.6_110520_barcode03 --outFilePrefix guppy3.6_110520_barcode03 guppy3.6_110520_barcode03*.depths
  Command failed:artic_plot_amplicon_depth --primerScheme scheme/primer_schemes/nCoV-2019/V2/nCoV-2019.scheme.bed --sampleID guppy3.6_110520_barcode03 --outFilePrefix guppy3.6_110520_barcode03 guppy3.6_110520_barcode03*.depths

Work dir:
  /mnt/ubi/iferres/covid19/prueba/work2/75/4c0c76414f294aa76e95002e9adebb

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

I have updated the Nextflow pipeline, but I'm still getting that error. Any advice? Thanks

DarianHole commented 4 years ago

Mine started failing again in the same spot yesterday after working on Wednesday, so I'm not sure what happened. Potentially the conda env update had an issue.

iferres commented 4 years ago

It must be something with pandas. It's throwing an error as if a file doesn't exist, but it actually does.

singularity exec /mnt/ubi/iferres/covid19/artic.sif artic_plot_amplicon_depth --primerScheme scheme/primer_schemes/nCoV-2019/V2/nCoV-2019.scheme.bed --sampleID guppy3.6_110520_barcode03 --outFilePrefix guppy3.6_110520_barcode03 guppy3.6_110520_barcode03*.depths
Traceback (most recent call last):
  File "/opt/conda/bin/artic_plot_amplicon_depth", line 10, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.6/site-packages/artic/plot_amplicon_depth.py", line 143, in main
    go(args)
  File "/opt/conda/lib/python3.6/site-packages/artic/plot_amplicon_depth.py", line 29, in go
    primerScheme = read_bed_file(args.primerScheme)
  File "/opt/conda/lib/python3.6/site-packages/artic/vcftagprimersites.py", line 88, in read_bed_file
    skiprows=0)
  File "/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py", line 457, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py", line 1135, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py", line 1917, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 382, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 689, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'scheme/primer_schemes/nCoV-2019/V2/nCoV-2019.scheme.bed' does not exist: b'scheme/primer_schemes/nCoV-2019/V2/nCoV-2019.scheme.bed'

The singularity image was built today.
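A quick way to check how that relative path resolves from the directory where the command is run is sketched below. It is only a diagnostic aid, not a confirmed explanation of the error: the helper name `describe_path` is hypothetical, and the idea that the path might be a symlink whose target is not visible inside the container is an assumption, not something established in this thread.

```python
from pathlib import Path


def describe_path(relative_path, cwd="."):
    """Report how a relative path resolves from a given working directory.

    Hypothetical helper for illustration; not part of artic or Nextflow.
    """
    candidate = Path(cwd) / relative_path
    return {
        "path": str(candidate),
        # exists() follows symlinks, so it is False for a dangling symlink
        "exists": candidate.exists(),
        # is_symlink() is True even when the link target is missing
        "is_symlink": candidate.is_symlink(),
    }


# The path is taken from the traceback above; run this from the process work dir.
print(describe_path("scheme/primer_schemes/nCoV-2019/V2/nCoV-2019.scheme.bed"))
```

If `exists` is false while `is_symlink` is true, the link is present but its target cannot be reached from where the script runs.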

m-bull commented 4 years ago

Hi @iferres, @DarianHole - I've traced both `bins must increase monotonically` errors to the way that the recently released artic v1.1.1 parses the V2 primer scheme, which is the one that this workflow uses by default. There is a fix incoming from the ARTIC project for this, hopefully within the next few days. I'll update here when it lands.

The `FileNotFoundError` I can't reproduce, but I'm happy to investigate further once ARTIC patches fieldbioinformatics.
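For readers unfamiliar with the pandas error in the tracebacks, here is a minimal sketch of how `pandas.cut` raises it. The positions, amplicon names, and start coordinates are made up for illustration, and the helper `assign_amplicons` is hypothetical. The thread does not say exactly how the scheme parsing went wrong, so the out-of-order list only stands in for whatever produced non-increasing bin edges.

```python
import pandas as pd

# Hypothetical depth table: one row per reference position.
df = pd.DataFrame({"position": [10, 150, 320, 480]})

# Hypothetical amplicon start coordinates. The second list is out of order,
# standing in for bin edges that do not increase.
sorted_starts = [0, 100, 300, 500]
unsorted_starts = [0, 300, 100, 500]
amplicons = ["amp_1", "amp_2", "amp_3"]


def assign_amplicons(positions, starts, labels):
    """Bin positions into amplicons; pandas requires increasing bin edges."""
    return pd.cut(x=positions, bins=starts, labels=labels)


# Increasing edges: every position is assigned to an amplicon.
ok = assign_amplicons(df["position"], sorted_starts, amplicons)

# Non-increasing edges: pandas refuses to bin and raises ValueError.
try:
    assign_amplicons(df["position"], unsorted_starts, amplicons)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(ok.tolist())
print(error_message)
```

The call shape mirrors the `x=df['position'], bins=starts, labels=amplicons` line shown in the traceback, so the error depends only on the order of the bin edges, not on the depth data itself.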

iferres commented 4 years ago

I think the last one is the same error. From the nextflow log I understand that the artic_plot_amplicon_depth command is the one that is failing. I just called the command that failed from the singularity container directly. Thanks a lot!

m-bull commented 4 years ago

Progress here.

DarianHole commented 4 years ago

After playing around with it, it also seems that specifying artic=1.1.0 in the conda config, or installing that version of artic directly, works for the moment if you need to run data.

will-rowe commented 4 years ago

Hi all - sorry about this issue. New patch version is up in conda (1.1.2) and should fix this

will-rowe commented 4 years ago

The artic nCov repo has now had the version bumped too.

DarianHole commented 4 years ago

Thanks again both of you, that fixes the issue!
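As a recap of the versions mentioned in this thread, here is a small sketch that flags an artic version string as affected. It encodes only what was reported here (1.1.1 affected; 1.1.0 and 1.1.2 reported to work). The names `AFFECTED_VERSIONS`, `parse_version`, and `is_affected` are hypothetical, and it assumes plain `major.minor.patch` version strings.

```python
# Versions reported in this thread as hitting the plotting error.
AFFECTED_VERSIONS = {(1, 1, 1)}


def parse_version(text):
    """Turn a 'major.minor.patch' string into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def is_affected(version_text):
    """Return True if the given artic version was reported as affected here."""
    return parse_version(version_text) in AFFECTED_VERSIONS


for version in ("1.1.0", "1.1.1", "1.1.2"):
    print(version, "affected" if is_affected(version) else "reported working")
```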