genomicsITER / NanoCLUST

NanoCLUST is an analysis pipeline for UMAP-based classification of amplicon-based full-length 16S rRNA nanopore reads

Error executing process > read_correction #42

Open adhamzul opened 3 years ago

adhamzul commented 3 years ago

Hello!

I have been trying out NanoCLUST for 16S analysis on several of my datasets and it has been working well so far. However, a few datasets always abort during the read_correction step.

My machine has 24 CPUs and 47 GB of RAM, but I thought there might simply be too much data, so I played around with --umap_set_size and --polishing_reads. The same `Error executing process > read_correction` failure still comes up with exit status (1).

I'll post the whole error below, along with the output I got from running `.command.sh` in the work dir. I don't think anything is wrong with Canu itself, as I was able to complete the pipeline on most of my datasets.

The command that I used:

```
nextflow run main.nf -profile docker --reads 'part3/10000_lq8BC03.fastq' --db "db/16S_ribosomal_RNA" --tax "db/taxdb/" --outdir "results/part3" --umap_set_size 25000 --polishing_reads 50
```

NanoCLUST

Error executing process > 'read_correction (7)'

Caused by:
  Process `read_correction (7)` terminated with an error exit status (1)

Command executed:

  head -n$(( 100*4 )) 6.fastq > subset.fastq
  canu -correct -p corrected_reads -nanopore-raw subset.fastq genomeSize=1.5k stopOnLowCoverage=1 minInputCoverage=2 minReadLength=500 minOverlapLength=200
  gunzip corrected_reads.correctedReads.fasta.gz
  READ_COUNT=$(( $(awk '{print $1/2}' <(wc -l corrected_reads.correctedReads.fasta)) ))
  cat 6.log > 6_racon.log
  echo -n ";100;$READ_COUNT;" >> 6_racon.log && cp 6_racon.log 6_racon_.log

Command exit status:
  1

Command output:
  (empty)

Command error:
      /opt/conda/envs/read_correction-/bin/sqStoreDumpMetaData \
        -S ./corrected_reads.seqStore \
        -corrected \
        -histogram \
      > ./corrected_reads.seqStore/readlengths-obt.txt \
      2> ./corrected_reads.seqStore/readlengths-obt.err 

  -- Finished on Thu Jun 24 09:07:56 2021 (like a bat out of hell) with 1108.475 GB free disk space
  ----------------------------------------
  --
  -- In sequence store './corrected_reads.seqStore':
  --   Found 0 reads.
  --   Found 0 bases (0 times coverage).
  --
  -- Purging correctReads output after loading into stores.
  -- Purged 1 .cns outputs.
  -- Purged 2 .out job log outputs.
  --
  -- Purging overlaps used for correction.
  -- Report changed.
  -- Finished stage 'cor-loadCorrectedReads', reset canuIteration.
  --
  -- Yikes!  No corrected reads generated!
  -- Can't proceed!
  --
  -- Generating empty outputs.
  -- No change in report.
  -- Finished stage 'generateOutputs', reset canuIteration.
  --
  -- Assembly 'corrected_reads' finished in '/home2/adham/NanoCLUST/work/6e/f3e0a8c687663fe25421e3ee7ea11d'.
  --
  -- Summary saved in 'corrected_reads.report'.
  --
  -- Sequences saved:
  --   Contigs       -> 'corrected_reads.contigs.fasta'
  --   Unassembled   -> 'corrected_reads.unassembled.fasta'
  --   Unitigs       -> 'corrected_reads.unitigs.fasta'
  --
  -- Read layouts saved:
  --   Contigs       -> 'corrected_reads.contigs.layout'.
  --   Unitigs       -> 'corrected_reads.unitigs.layout'.
  --
  -- Graphs saved:
  --   Unitigs       -> 'corrected_reads.unitigs.gfa'.
  -- No change in report.
  -- Finished stage 'cor-dumpCorrectedReads', reset canuIteration.
  --
  -- Bye.
  gzip: corrected_reads.correctedReads.fasta.gz: No such file or directory

Work dir:
  /home2/adham/NanoCLUST/work/6e/f3e0a8c687663fe25421e3ee7ea11d

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
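
That tip is how the report below was produced: change into the work dir and run the task script by hand. A minimal sketch of the debugging step, assuming the standard Nextflow task files (`.command.sh` is always written to the work dir; `.command.log` may or may not be present depending on the executor):

```bash
# Inspect and re-run the failing Nextflow task by hand.
cd /home2/adham/NanoCLUST/work/6e/f3e0a8c687663fe25421e3ee7ea11d

cat .command.sh    # the exact script Nextflow executed for this task
bash .command.sh   # reproduce the failure interactively
less .command.log  # combined stdout/stderr of the task, if present
```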

Report from running `.command.sh`:

-- WARNING:
-- WARNING:  Option '-nanopore-raw <files>' is deprecated.
-- WARNING:  Use option '-nanopore <files>' in the future.
-- WARNING:
-- canu 2.1.1
--
-- CITATIONS
--
-- For 'standard' assemblies of PacBio or Nanopore reads:
--   Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
--   Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
--   Genome Res. 2017 May;27(5):722-736.
--   http://doi.org/10.1101/gr.215087.116
-- 
-- Read and contig alignments during correction and consensus use:
--   Šošić M, Šikić M.
--   Edlib: a C/C++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
-- 
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
-- 
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
-- 
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
-- 
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '10.0.2' (from '/home/morilab/anaconda3/bin/java') without -d64 support.
--
-- WARNING:
-- WARNING:  Failed to run gnuplot using command 'gnuplot'.
-- WARNING:  Plots will be disabled.
-- WARNING:
--
-- Detected 24 CPUs and 47 gigabytes of memory.
-- No grid engine detected, grid and staging disabled.
--
--                                (tag)Concurrency
--                         (tag)Threads          |
--                (tag)Memory         |          |
--        (tag)             |         |          |       total usage      algorithm
--        -------  ----------  --------   --------  --------------------  -----------------------------
-- Local: meryl      7.000 GB    4 CPUs x   6 jobs    42.000 GB  24 CPUs  (k-mer counting)
-- Local: hap        7.000 GB    4 CPUs x   6 jobs    42.000 GB  24 CPUs  (read-to-haplotype assignment)
-- Local: cormhap    6.000 GB   12 CPUs x   2 jobs    12.000 GB  24 CPUs  (overlap detection with mhap)
-- Local: obtovl     4.000 GB    8 CPUs x   3 jobs    12.000 GB  24 CPUs  (overlap detection)
-- Local: utgovl     4.000 GB    8 CPUs x   3 jobs    12.000 GB  24 CPUs  (overlap detection)
-- Local: cor        8.000 GB    4 CPUs x   5 jobs    40.000 GB  20 CPUs  (read correction)
-- Local: ovb        4.000 GB    1 CPU  x  11 jobs    44.000 GB  11 CPUs  (overlap store bucketizer)
-- Local: ovs        8.000 GB    1 CPU  x   5 jobs    40.000 GB   5 CPUs  (overlap store sorting)
-- Local: red        9.000 GB    4 CPUs x   5 jobs    45.000 GB  20 CPUs  (read error detection)
-- Local: oea        8.000 GB    1 CPU  x   5 jobs    40.000 GB   5 CPUs  (overlap error adjustment)
-- Local: bat       16.000 GB    4 CPUs x   1 job     16.000 GB   4 CPUs  (contig construction with bogart)
-- Local: cns        -.--- GB    4 CPUs x   - jobs     -.--- GB   - CPUs  (consensus)
--
-- In 'corrected_reads.seqStore', found Nanopore reads:
--   Nanopore:                 1
--
--   Raw:                      1
--
-- Generating assembly 'corrected_reads' in '/home2/adham/NanoCLUST/work/6e/f3e0a8c687663fe25421e3ee7ea11d':
--    - only correct raw reads.
--
-- Parameters:
--
--  genomeSize        1500
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.3200 ( 32.00%)
--    obtOvlErrorRate 0.1200 ( 12.00%)
--    utgOvlErrorRate 0.1200 ( 12.00%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.5000 ( 50.00%)
--    obtErrorRate    0.1200 ( 12.00%)
--    utgErrorRate    0.1200 ( 12.00%)
--    cnsErrorRate    0.2000 ( 20.00%)
--
--
-- BEGIN CORRECTION
--
--
-- Creating overlap store correction/corrected_reads.ovlStore using:
--      1 bucket
--      2 slices
--        using at most 1 GB memory each
-- Finished stage 'cor-overlapStoreConfigure', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'ovB' concurrent execution on Fri Jun 25 10:15:52 2021 with 1108.477 GB free disk space (1 processes; 11 concurrently)

    cd correction/corrected_reads.ovlStore.BUILDING
    ./scripts/1-bucketize.sh 1 > ./logs/1-bucketize.000001.out 2>&1

-- Finished on Fri Jun 25 10:15:52 2021 (lickety-split) with 1108.477 GB free disk space
----------------------------------------
--
-- Overlap store bucketizer jobs failed, retry.
--   job correction/corrected_reads.ovlStore.BUILDING/bucket0001 FAILED.
--
--
-- Running jobs.  Second attempt out of 2.
----------------------------------------
-- Starting 'ovB' concurrent execution on Fri Jun 25 10:15:52 2021 with 1108.477 GB free disk space (1 processes; 11 concurrently)

    cd correction/corrected_reads.ovlStore.BUILDING
    ./scripts/1-bucketize.sh 1 > ./logs/1-bucketize.000001.out 2>&1

-- Finished on Fri Jun 25 10:15:52 2021 (like a bat out of hell) with 1108.477 GB free disk space
----------------------------------------
--
-- Overlap store bucketizer jobs failed, tried 2 times, giving up.
--   job correction/corrected_reads.ovlStore.BUILDING/bucket0001 FAILED.
--

ABORT:
ABORT: canu 2.1.1
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:
gzip: corrected_reads.correctedReads.fasta.gz: No such file or directory
wc: corrected_reads.correctedReads.fasta: No such file or directory

I do hope someone can shed some light on this.

I'd like to thank everyone who reads this in advance. Adham

Edit: I do not know how to fix the styling of this post, sorry. Edit 2: I was able to make the post easier to read, and I added the command I used.

Congnguyenn commented 1 year ago

Hi, my data hit the same error as yours, and I figured out it was because the reads in the failing fastq file were too short (shorter than the default length cutoffs), so they were all removed during correction. You can adjust these parameters and run it again; a sketch follows below.
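
To check whether that is what is happening, count how many reads in the failing cluster's subset actually clear Canu's cutoffs (the `Command executed` block above uses minReadLength=500 and minOverlapLength=200). A minimal sketch, run from inside the failing work dir, where `subset.fastq` is the file the task script creates:

```bash
# Every 4th line of a fastq, starting at line 2, is a sequence.
# Count how many subset reads meet Canu's minReadLength=500 cutoff.
awk 'NR % 4 == 2 { total++; if (length($0) >= 500) pass++ }
     END { printf "%d/%d reads >= 500 bp\n", pass, total }' subset.fastq
```

If few or no reads pass, one option is to relax the length cutoffs in the canu call that the read_correction process runs (the line shown under `Command executed` above; presumably defined in main.nf in this repo). The values below are illustrative, not recommendations; `-nanopore` replaces the `-nanopore-raw` option the log flags as deprecated:

```bash
canu -correct -p corrected_reads -nanopore subset.fastq \
    genomeSize=1.5k stopOnLowCoverage=1 minInputCoverage=2 \
    minReadLength=300 minOverlapLength=100
```

This explanation also matches the log above: the seqStore reports only 1 raw Nanopore read loaded, i.e. only one read in the subset survived the length filters, which is not enough for correction to produce any output.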