genomicsITER / NanoCLUST

NanoCLUST is an analysis pipeline for UMAP-based classification of amplicon-based full-length 16S rRNA nanopore reads
MIT License

medaka fails #61

Open vdejager opened 2 years ago

vdejager commented 2 years ago

I'm getting the following error when running NanoCLUST on our reads (basecalled with bonito, FASTQ format) using Singularity on our cluster (thanks @Thomieh73). Medaka is failing. I have not yet figured out what the issue is, but the racon failsafe also fails.


Traceback (most recent call last):
    File "/opt/conda/envs/medaka_pass/bin/medaka", line 11, in <module>
      sys.exit(main())
    File "/opt/conda/envs/medaka_pass/lib/python3.6/site-packages/medaka/medaka.py", line 643, in main
      args.func(args)
    File "/opt/conda/envs/medaka_pass/lib/python3.6/site-packages/medaka/stitch.py", line 141, in stitch
      for contigs in executor.map(worker, regions):
    File "/opt/conda/envs/medaka_pass/lib/python3.6/concurrent/futures/process.py", line 496, in map
      timeout=timeout)
    File "/opt/conda/envs/medaka_pass/lib/python3.6/concurrent/futures/_base.py", line 575, in map
      fs = [self.submit(fn, *args) for args in zip(*iterables)]
    File "/opt/conda/envs/medaka_pass/lib/python3.6/concurrent/futures/_base.py", line 575, in <listcomp>
      fs = [self.submit(fn, *args) for args in zip(*iterables)]
    File "/opt/conda/envs/medaka_pass/lib/python3.6/concurrent/futures/process.py", line 139, in _get_chunks
      chunk = tuple(itertools.islice(it, chunksize))
    File "/opt/conda/envs/medaka_pass/lib/python3.6/site-packages/medaka/common.py", line 611, in grouper
      batch.append(next(gen))
    File "/opt/conda/envs/medaka_pass/lib/python3.6/site-packages/medaka/stitch.py", line 134, in <genexpr>
      (common.Region.from_string(r) for r in args.regions),
    File "/opt/conda/envs/medaka_pass/lib/python3.6/site-packages/medaka/common.py", line 476, in from_string
      start, end = [int(b) for b in bounds.split('-')]
    File "/opt/conda/envs/medaka_pass/lib/python3.6/site-packages/medaka/common.py", line 476, in <listcomp>
      start, end = [int(b) for b in bounds.split('-')]
  ValueError: invalid literal for int() with base 10: '1541('
  .command.sh: line 5: consensus_medaka.fasta: Is a directory

Work dir:
    /gpfs/work2/0/lwc2020006/software/NanoCLUST/work/3e/c15a85104608f5d329cb74a9e321a0
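
The last frames of the traceback show the region bounds being split on '-' and each piece passed to int(). The sketch below is a simplified stand-in for that step, not medaka's actual code, and the region strings are hypothetical. It only shows one shape of input that would produce exactly this message: a bounds field carrying a trailing strand marker such as "(-)". Whether that is what happens with these reads is an assumption that still needs checking against the contig names in racon_consensus.fasta.

```python
def parse_region(region: str):
    """Simplified stand-in for a 'name:start-end' parser (NOT medaka's real code)."""
    name, bounds = region.rsplit(":", 1)
    # Same expression as in the traceback: every '-'-separated piece must be an int.
    start, end = [int(b) for b in bounds.split("-")]
    return name, start, end


# A plain region string (hypothetical contig name) parses without problems.
print(parse_region("contig_1:0-1541"))

# A hypothetical region string with a strand suffix: the extra '-' inside "(-)"
# splits the bounds into '0', '1541(' and ')', and int('1541(') raises.
try:
    parse_region("contig_1:0-1541(-)")
except ValueError as err:
    print(f"ValueError: {err}")
```

If the sequence names do contain such a suffix, cleaning the FASTA headers before polishing might avoid the error, but that is untested here.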

The work directory actually contains a directory named consensus_medaka.fasta with the following files (probably from medaka):

./consensus_medaka.fasta
├── calls_to_draft.bam
├── calls_to_draft.bam.bai
├── consensus.fasta
└── consensus_probs.hdf

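However, because medaka fails after it has already created this directory, the fallback in the shell script below fails as well: the output of cat cannot be redirected into consensus_medaka.fasta while that name is a directory (hence the "Is a directory" error above).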

#!/bin/bash -euo pipefail
if medaka_consensus -i corrected_reads.correctedReads.fasta -d racon_consensus.fasta -o consensus_medaka.fasta -t 4 -m r941_min_high_g303 ; then
   echo "Command succeeded"
else
   cat racon_consensus.fasta > consensus_medaka.fasta
fi

This script is defined in main.nf at https://github.com/genomicsITER/NanoCLUST/blob/master/main.nf#L408
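
One possible way to make the fallback robust is to clear any leftover output directory before writing the racon consensus. The sketch below shows the idea in Python with a hypothetical helper, fall_back_to_racon, and made-up file contents. It assumes the downstream step accepts a plain FASTA file at that path, which has not been verified against the pipeline.

```python
import shutil
import tempfile
from pathlib import Path


def fall_back_to_racon(racon_fasta: Path, output: Path) -> None:
    """Replace whatever sits at `output` with a copy of the racon consensus."""
    if output.is_dir():
        # A failed medaka run can leave a partial output directory behind;
        # writing a file to that path fails until the directory is removed.
        shutil.rmtree(output)
    shutil.copyfile(racon_fasta, output)


# Demo in a throwaway directory that mimics the failed work dir.
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    racon = work / "racon_consensus.fasta"
    racon.write_text(">draft\nACGT\n")  # made-up sequence for illustration

    leftover = work / "consensus_medaka.fasta"
    leftover.mkdir()
    (leftover / "calls_to_draft.bam").touch()

    fall_back_to_racon(racon, leftover)
    print(leftover.is_file(), leftover.read_text() == racon.read_text())
```

In the shell script itself, the equivalent would be removing the consensus_medaka.fasta directory in the else branch before the cat line.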

Any ideas how we can solve this issue?