some samples terminate at read_correction due to gzip: corrected_reads.correctedReads.fasta.gz: No such file or directory

Hello,

I have a recurring error for some of my samples that I can't resolve.

Most clusters run fine, but for some of them corrected_reads.correctedReads.fasta.gz is never created, so the gunzip step in read_correction fails.
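To see which clusters are affected, a minimal sketch like the one below could scan the Nextflow work directory for tasks whose `.command.err` contains canu's "No corrected reads generated" message (the same line that appears in the nextflow output further down). The function name `list_uncorrected_tasks` is hypothetical and not part of NanoCLUST. It assumes each task directory holds a `.command.err` file, as the failing one does.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NanoCLUST or Nextflow): print every
# .command.err under a work directory that contains canu's
# "No corrected reads generated" message.
list_uncorrected_tasks() {
  local work_dir="$1"
  find "$work_dir" -type f -name '.command.err' \
    -exec grep -l 'No corrected reads generated' {} +
}

# Usage (path taken from the run summary in the log):
#   list_uncorrected_tasks /home/momi/tools/NanoCLUST/work
```

Each printed path sits in the work directory of one affected task, next to that cluster's fastq and canu output.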

Ran with:

// UMAP Clustering and polishing parameters
  umap_set_size = 50000
  cluster_sel_epsilon = 0.5
  min_cluster_size = 200
  polishing_reads = 100
  min_read_length = 1400
  max_read_length = 1700
  avg_amplicon_size = "1.5k"

I also tried different min_cluster_size and umap_set_size, without success.

Should I prefilter reads for length > 1400 bp before feeding them into NanoCLUST, or lower the polishing_reads parameter?
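For the prefiltering option, this is roughly what I have in mind: a minimal sketch that keeps only reads within a length window before they go into the pipeline. The function name `filter_fastq_by_length` and the file names are placeholders, and it assumes plain 4-line FASTQ records (no wrapped sequences). I don't know whether this overlaps with what the pipeline's own min_read_length / max_read_length settings already do.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NanoCLUST): keep only FASTQ records whose
# sequence length lies within [min_len, max_len].
# Assumes each record is exactly 4 lines: header, sequence, '+', qualities.
filter_fastq_by_length() {
  local min_len="$1" max_len="$2"
  awk -v min="$min_len" -v max="$max_len" '
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
      n = length(seq)
      if (n >= min && n <= max) {
        print header; print seq; print plus; print $0
      }
    }'
}

# Usage (file names are placeholders):
#   filter_fastq_by_length 1400 1700 < input.fq > filtered.fq
```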

Any suggestions are welcome!

Thank you, Michel

nextflow output:

N E X T F L O W  ~  version 21.10.6
Launching `main.nf` [distracted_monod] - revision: 4dd7ecf4ba

----------------------------------------------------
      _   __                     ________    __  _____________
     / | / /___ _____  ____     / ____/ /   / / / / ___/_  __/
    /  |/ / __ `/ __ \/ __ \   / /   / /   / / / /\__ \ / /   
   / /|  / /_/ / / / / /_/ /  / /___/ /___/ /_/ /___/ // /    
  /_/ |_/\__,_/_/ /_/\____/   \____/_____/\____//____//_/     

  NanoCLUST v1.0dev
----------------------------------------------------

Run Name          : distracted_monod
Reads             : /home/momi/DATA/SUPSI/ACME_project/raw_reads/ACME_barcode02_*.fq
Max Resources     : 8 GB memory, 8 cpus, 10h time per job
Output dir        : ./acme1
Launch dir        : /home/momi/tools/NanoCLUST
Working dir       : /home/momi/tools/NanoCLUST/work
Script dir        : /home/momi/tools/NanoCLUST
User              : momi
Config Profile    : test,conda
Config Description: Minimal test dataset to check pipeline function
----------------------------------------------------
executor >  local (93)
[a3/3d2ea2] process > QC (1)                   [100%] 1 of 1 ✔
[9b/a3d1c4] process > fastqc (1)               [100%] 1 of 1 ✔
[ba/81f8de] process > kmer_freqs (1)           [100%] 1 of 1 ✔
[fc/7371cd] process > read_clustering (1)      [100%] 1 of 1 ✔
[66/b6ec39] process > split_by_cluster (1)     [100%] 1 of 1 ✔
[3a/be78d2] process > read_correction (86)     [ 79%] 86 of 108, failed: 1
[-        ] process > draft_selection          [  0%] 0 of 85
[-        ] process > racon_pass               -
[-        ] process > medaka_pass              -
[-        ] process > consensus_classification -
[-        ] process > join_results             -
[-        ] process > get_abundances           -
[-        ] process > plot_abundances          -
[36/dc755f] process > output_documentation     [100%] 1 of 1 ✔
Error executing process > 'read_correction (86)'

Caused by:
  Process `read_correction (86)` terminated with an error exit status (1)

Command executed:

  head -n$(( 20*4 )) 79.fastq > subset.fastq
  canu -correct -p corrected_reads -nanopore-raw subset.fastq genomeSize=1.5k stopOnLowCoverage=1 minInputCoverage=2 minReadLength=500 minOverlapLength=200
  gunzip corrected_reads.correctedReads.fasta.gz
  READ_COUNT=$(( $(awk '{print $1/2}' <(wc -l corrected_reads.correctedReads.fasta)) ))
  cat 79.log > 79_racon.log
  echo -n ";20;$READ_COUNT;" >> 79_racon.log && cp 79_racon.log 79_racon_.log

Command exit status:
  1

Command output:
  (empty)

Command error:
  -- Finished on Fri Mar 10 11:19:42 2023 (fast as lightning) with 162.857 GB free disk space
  ----------------------------------------
  -- Found 1 read correction output files.
  -- Finished stage 'cor-generateCorrectedReadsCheck', reset canuIteration.
  -- Found 1 read correction output files.
  -- Finished stage 'cor-generateCorrectedReadsCheck', reset canuIteration.
  --
  -- Loading corrected reads into corStore and seqStore.
  ----------------------------------------
  -- Starting command on Fri Mar 10 11:19:42 2023 with 162.857 GB free disk space

      cd correction
      /home/momi/tools/NanoCLUST/work/conda/read_correction--eb02b0003ddd5b6992456fbfde1a4cc9/bin/loadCorrectedReads \
        -S ../corrected_reads.seqStore \
        -C ./corrected_reads.corStore \
        -L ./2-correction/corjob.files \
      >  ./corrected_reads.loadCorrectedReads.log \
      2> ./corrected_reads.loadCorrectedReads.err

  -- Finished on Fri Mar 10 11:19:42 2023 (furiously fast) with 162.857 GB free disk space
  ----------------------------------------
  --
  -- No corrected reads generated; correctReads output saved.
  --
  -- Purging overlaps used for correction.
  -- Finished stage 'cor-loadCorrectedReads', reset canuIteration.
  ----------------------------------------
  -- Starting command on Fri Mar 10 11:19:42 2023 with 162.857 GB free disk space

      cd .
      /home/momi/tools/NanoCLUST/work/conda/read_correction--eb02b0003ddd5b6992456fbfde1a4cc9/bin/sqStoreDumpFASTQ \
        -corrected \
        -S ./corrected_reads.seqStore \
        -o ./corrected_reads.correctedReads.gz \
        -fasta \
        -nolibname \
      > corrected_reads.correctedReads.fasta.err 2>&1

  -- Finished on Fri Mar 10 11:19:42 2023 (like a bat out of hell) with 162.857 GB free disk space
  ----------------------------------------
  --
  -- Corrected reads saved in 'corrected_reads.correctedReads.fasta.gz'.
  -- Finished stage 'cor-dumpCorrectedReads', reset canuIteration.
  --
  -- Trimming skipped; not enabled.
  --
  -- Unitigging skipped; not enabled.
  --
  -- Bye.
  gzip: corrected_reads.correctedReads.fasta.gz: No such file or directory

Work dir:
  /home/momi/tools/NanoCLUST/work/3a/be78d2cbcc6fb83b297f55353e9ab6

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

.command.err attached
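As a possible workaround, the gunzip step could be guarded so that a cluster with no corrected reads reports a count of 0 instead of aborting. This is only a sketch under my own assumptions: the function name `count_corrected_reads` is hypothetical, it is not how NanoCLUST implements the step, and it counts FASTA header lines rather than halving the line count as the original command does. Whether downstream processes can handle a cluster with 0 corrected reads is something I have not checked.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not part of NanoCLUST): decompress canu's corrected
# reads only if the .gz file exists, then print the number of FASTA records
# (0 if canu produced no corrected reads).
count_corrected_reads() {
  local prefix="$1"
  local gz="${prefix}.correctedReads.fasta.gz"
  local fasta="${prefix}.correctedReads.fasta"
  if [ -f "$gz" ]; then
    gunzip -f "$gz"
  fi
  if [ -s "$fasta" ]; then
    # grep -c prints the count; '|| true' hides its exit status 1 on zero matches.
    grep -c '^>' "$fasta" || true
  else
    echo 0
  fi
}

# Usage inside the task work dir, mirroring the READ_COUNT line of the failing command:
#   READ_COUNT=$(count_corrected_reads corrected_reads)
```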

genomicsITER / NanoCLUST

some samples terminate at read_correction due to gzip: corrected_reads.correctedReads.fasta.gz: No such file or directory #78