genomicsITER / NanoCLUST

NanoCLUST is an analysis pipeline for UMAP-based classification of amplicon-based full-length 16S rRNA nanopore reads

Error executing process > read_correction #42

Open adhamzul opened 3 years ago

adhamzul commented 3 years ago

Hello!

I have been trying out NanoCLUST for 16S analysis on several of my datasets and it has been working well so far. However, a few datasets always abort during the read_correction step.

My machine has 24 CPUs and 47 GB of RAM, but I thought there might simply be too much data, so I played around with --umap_set_size and --polishing_reads. The same `Error executing process > read_correction` failure still comes up with exit status (1).

I'll post the whole error below, along with the output I got from running `.command.sh` in the work dir. I don't think anything is wrong with Canu itself, as I was able to complete the pipeline on most of my datasets.

The command that I used:

```
nextflow run main.nf -profile docker --reads 'part3/10000_lq8BC03.fastq' --db "db/16S_ribosomal_RNA" --tax "db/taxdb/" --outdir "results/part3" --umap_set_size 25000 --polishing_reads 50
```

NanoCLUST

Error executing process > 'read_correction (7)'

Caused by:
  Process `read_correction (7)` terminated with an error exit status (1)

Command executed:

  head -n$(( 100*4 )) 6.fastq > subset.fastq
  canu -correct -p corrected_reads -nanopore-raw subset.fastq genomeSize=1.5k stopOnLowCoverage=1 minInputCoverage=2 minReadLength=500 minOverlapLength=200
  gunzip corrected_reads.correctedReads.fasta.gz
  READ_COUNT=$(( $(awk '{print $1/2}' <(wc -l corrected_reads.correctedReads.fasta)) ))
  cat 6.log > 6_racon.log
  echo -n ";100;$READ_COUNT;" >> 6_racon.log && cp 6_racon.log 6_racon_.log

Command exit status:
  1

Command output:
  (empty)

Command error:
      /opt/conda/envs/read_correction-/bin/sqStoreDumpMetaData \
        -S ./corrected_reads.seqStore \
        -corrected \
        -histogram \
      > ./corrected_reads.seqStore/readlengths-obt.txt \
      2> ./corrected_reads.seqStore/readlengths-obt.err 

  -- Finished on Thu Jun 24 09:07:56 2021 (like a bat out of hell) with 1108.475 GB free disk space
  ----------------------------------------
  --
  -- In sequence store './corrected_reads.seqStore':
  --   Found 0 reads.
  --   Found 0 bases (0 times coverage).
  --
  -- Purging correctReads output after loading into stores.
  -- Purged 1 .cns outputs.
  -- Purged 2 .out job log outputs.
  --
  -- Purging overlaps used for correction.
  -- Report changed.
  -- Finished stage 'cor-loadCorrectedReads', reset canuIteration.
  --
  -- Yikes!  No corrected reads generated!
  -- Can't proceed!
  --
  -- Generating empty outputs.
  -- No change in report.
  -- Finished stage 'generateOutputs', reset canuIteration.
  --
  -- Assembly 'corrected_reads' finished in '/home2/adham/NanoCLUST/work/6e/f3e0a8c687663fe25421e3ee7ea11d'.
  --
  -- Summary saved in 'corrected_reads.report'.
  --
  -- Sequences saved:
  --   Contigs       -> 'corrected_reads.contigs.fasta'
  --   Unassembled   -> 'corrected_reads.unassembled.fasta'
  --   Unitigs       -> 'corrected_reads.unitigs.fasta'
  --
  -- Read layouts saved:
  --   Contigs       -> 'corrected_reads.contigs.layout'.
  --   Unitigs       -> 'corrected_reads.unitigs.layout'.
  --
  -- Graphs saved:
  --   Unitigs       -> 'corrected_reads.unitigs.gfa'.
  -- No change in report.
  -- Finished stage 'cor-dumpCorrectedReads', reset canuIteration.
  --
  -- Bye.
  gzip: corrected_reads.correctedReads.fasta.gz: No such file or directory

Work dir:
  /home2/adham/NanoCLUST/work/6e/f3e0a8c687663fe25421e3ee7ea11d

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
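
That tip is how the report below was produced: change into the work dir and run the task script by hand. A minimal sketch of the debugging step, assuming the standard Nextflow task files (`.command.sh` is always written to the work dir; `.command.log` may or may not be present depending on the executor):

```bash
# Inspect and re-run the failing Nextflow task by hand.
cd /home2/adham/NanoCLUST/work/6e/f3e0a8c687663fe25421e3ee7ea11d

cat .command.sh    # the exact script Nextflow executed for this task
bash .command.sh   # reproduce the failure interactively
less .command.log  # combined stdout/stderr of the task, if present
```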

Report from running `.command.sh`:

-- WARNING:
-- WARNING:  Option '-nanopore-raw <files>' is deprecated.
-- WARNING:  Use option '-nanopore <files>' in the future.
-- WARNING:
-- canu 2.1.1
--
-- CITATIONS
--
-- For 'standard' assemblies of PacBio or Nanopore reads:
--   Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
--   Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
--   Genome Res. 2017 May;27(5):722-736.
--   http://doi.org/10.1101/gr.215087.116
-- 
-- Read and contig alignments during correction and consensus use:
--   Šošić M, Šikić M.
--   Edlib: a C/C++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
-- 
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
-- 
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
-- 
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
-- 
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '10.0.2' (from '/home/morilab/anaconda3/bin/java') without -d64 support.
--
-- WARNING:
-- WARNING:  Failed to run gnuplot using command 'gnuplot'.
-- WARNING:  Plots will be disabled.
-- WARNING:
--
-- Detected 24 CPUs and 47 gigabytes of memory.
-- No grid engine detected, grid and staging disabled.
--
--                                (tag)Concurrency
--                         (tag)Threads          |
--                (tag)Memory         |          |
--        (tag)             |         |          |       total usage      algorithm
--        -------  ----------  --------   --------  --------------------  -----------------------------
-- Local: meryl      7.000 GB    4 CPUs x   6 jobs    42.000 GB  24 CPUs  (k-mer counting)
-- Local: hap        7.000 GB    4 CPUs x   6 jobs    42.000 GB  24 CPUs  (read-to-haplotype assignment)
-- Local: cormhap    6.000 GB   12 CPUs x   2 jobs    12.000 GB  24 CPUs  (overlap detection with mhap)
-- Local: obtovl     4.000 GB    8 CPUs x   3 jobs    12.000 GB  24 CPUs  (overlap detection)
-- Local: utgovl     4.000 GB    8 CPUs x   3 jobs    12.000 GB  24 CPUs  (overlap detection)
-- Local: cor        8.000 GB    4 CPUs x   5 jobs    40.000 GB  20 CPUs  (read correction)
-- Local: ovb        4.000 GB    1 CPU  x  11 jobs    44.000 GB  11 CPUs  (overlap store bucketizer)
-- Local: ovs        8.000 GB    1 CPU  x   5 jobs    40.000 GB   5 CPUs  (overlap store sorting)
-- Local: red        9.000 GB    4 CPUs x   5 jobs    45.000 GB  20 CPUs  (read error detection)
-- Local: oea        8.000 GB    1 CPU  x   5 jobs    40.000 GB   5 CPUs  (overlap error adjustment)
-- Local: bat       16.000 GB    4 CPUs x   1 job     16.000 GB   4 CPUs  (contig construction with bogart)
-- Local: cns        -.--- GB    4 CPUs x   - jobs     -.--- GB   - CPUs  (consensus)
--
-- In 'corrected_reads.seqStore', found Nanopore reads:
--   Nanopore:                 1
--
--   Raw:                      1
--
-- Generating assembly 'corrected_reads' in '/home2/adham/NanoCLUST/work/6e/f3e0a8c687663fe25421e3ee7ea11d':
--    - only correct raw reads.
--
-- Parameters:
--
--  genomeSize        1500
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.3200 ( 32.00%)
--    obtOvlErrorRate 0.1200 ( 12.00%)
--    utgOvlErrorRate 0.1200 ( 12.00%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.5000 ( 50.00%)
--    obtErrorRate    0.1200 ( 12.00%)
--    utgErrorRate    0.1200 ( 12.00%)
--    cnsErrorRate    0.2000 ( 20.00%)
--
--
-- BEGIN CORRECTION
--
--
-- Creating overlap store correction/corrected_reads.ovlStore using:
--      1 bucket
--      2 slices
--        using at most 1 GB memory each
-- Finished stage 'cor-overlapStoreConfigure', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'ovB' concurrent execution on Fri Jun 25 10:15:52 2021 with 1108.477 GB free disk space (1 processes; 11 concurrently)

    cd correction/corrected_reads.ovlStore.BUILDING
    ./scripts/1-bucketize.sh 1 > ./logs/1-bucketize.000001.out 2>&1

-- Finished on Fri Jun 25 10:15:52 2021 (lickety-split) with 1108.477 GB free disk space
----------------------------------------
--
-- Overlap store bucketizer jobs failed, retry.
--   job correction/corrected_reads.ovlStore.BUILDING/bucket0001 FAILED.
--
--
-- Running jobs.  Second attempt out of 2.
----------------------------------------
-- Starting 'ovB' concurrent execution on Fri Jun 25 10:15:52 2021 with 1108.477 GB free disk space (1 processes; 11 concurrently)

    cd correction/corrected_reads.ovlStore.BUILDING
    ./scripts/1-bucketize.sh 1 > ./logs/1-bucketize.000001.out 2>&1

-- Finished on Fri Jun 25 10:15:52 2021 (like a bat out of hell) with 1108.477 GB free disk space
----------------------------------------
--
-- Overlap store bucketizer jobs failed, tried 2 times, giving up.
--   job correction/corrected_reads.ovlStore.BUILDING/bucket0001 FAILED.
--

ABORT:
ABORT: canu 2.1.1
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:
gzip: corrected_reads.correctedReads.fasta.gz: No such file or directory
wc: corrected_reads.correctedReads.fasta: No such file or directory

I do hope someone can shed some light on this.

I'd like to thank everyone who reads this in advance. Adham

Edit: I do not know how to fix the styling of this post, sorry. Edit 2: I was able to make the post easier to read, and I added the command I used.

Congnguyenn commented 1 year ago

Hi, my data hit the same error as yours, and I figured out it was because the reads in the failing fastq file were too short (shorter than the default length cutoffs), so they were all removed during correction. You can adjust these parameters and run it again; a sketch follows below.
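
To check whether that is what is happening, count how many reads in the failing cluster's subset actually clear Canu's cutoffs (the `Command executed` block above uses minReadLength=500 and minOverlapLength=200). A minimal sketch, run from inside the failing work dir, where `subset.fastq` is the file the task script creates:

```bash
# Every 4th line of a fastq, starting at line 2, is a sequence.
# Count how many subset reads meet Canu's minReadLength=500 cutoff.
awk 'NR % 4 == 2 { total++; if (length($0) >= 500) pass++ }
     END { printf "%d/%d reads >= 500 bp\n", pass, total }' subset.fastq
```

If few or no reads pass, one option is to relax the length cutoffs in the canu call that the read_correction process runs (the line shown under `Command executed` above; presumably defined in main.nf in this repo). The values below are illustrative, not recommendations; `-nanopore` replaces the `-nanopore-raw` option the log flags as deprecated:

```bash
canu -correct -p corrected_reads -nanopore subset.fastq \
    genomeSize=1.5k stopOnLowCoverage=1 minInputCoverage=2 \
    minReadLength=300 minOverlapLength=100
```

This explanation also matches the log above: the seqStore reports only 1 raw Nanopore read loaded, i.e. only one read in the subset survived the length filters, which is not enough for correction to produce any output.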