bichangwei / PMAT

An efficient assembly tool for plant mitochondrial genome

Error: An error occurred during the trim process? #25

Open goshng opened 2 months ago

goshng commented 2 months ago

Hi,

Thank you for the great tool. I want to use ONT long reads to assemble the mtDNA of Spirodela polyrhiza, but I get the following error. Which PMAT option should I change to get past it?

Thank you,

SangChul

The error message is:

nextDenovo for correction and assembly end.

2024-10-02 14:26:56
Reads trim start ...
2024-10-02 14:26:59
-- canu 2.2
--
-- CITATIONS
--
-- For 'standard' assemblies of PacBio or Nanopore reads:
--   Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
--   Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
--   Genome Res. 2017 May;27(5):722-736.
--   http://doi.org/10.1101/gr.215087.116
--
-- Read and contig alignments during correction and consensus use:
--   Šošic M, Šikic M.
--   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
--
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
--
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
--
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
--
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
--
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '11.0.1' (from '/home/user/miniconda3/envs/polap/bin/java') without -d64 support.
-- Detected gnuplot version '5.4 patchlevel 5   ' (from 'gnuplot') and image format 'png'.
--
-- Detected 56 CPUs and 252 gigabytes of memory on the local machine.
--
-- Local machine mode enabled; grid support not detected or not allowed.
--
--                                (tag)Concurrency
--                         (tag)Threads          |
--                (tag)Memory         |          |
--        (tag)             |         |          |       total usage      algorithm
--        -------  ----------  --------   --------  --------------------  -----------------------------
-- Local: meryl     24.000 GB    8 CPUs x   7 jobs   168.000 GB  56 CPUs  (k-mer counting)
-- Local: hap       12.000 GB   14 CPUs x   4 jobs    48.000 GB  56 CPUs  (read-to-haplotype assignment)
-- Local: cormhap   13.000 GB   14 CPUs x   4 jobs    52.000 GB  56 CPUs  (overlap detection with mhap)
-- Local: obtovl     8.000 GB    8 CPUs x   7 jobs    56.000 GB  56 CPUs  (overlap detection)
-- Local: utgovl     8.000 GB    8 CPUs x   7 jobs    56.000 GB  56 CPUs  (overlap detection)
-- Local: cor        -.--- GB    4 CPUs x   - jobs     -.--- GB   - CPUs  (read correction)
-- Local: ovb        4.000 GB    1 CPU  x  56 jobs   224.000 GB  56 CPUs  (overlap store bucketizer)
-- Local: ovs        8.000 GB    1 CPU  x  31 jobs   248.000 GB  31 CPUs  (overlap store sorting)
-- Local: red       16.000 GB    4 CPUs x  14 jobs   224.000 GB  56 CPUs  (read error detection)
-- Local: oea        8.000 GB    1 CPU  x  31 jobs   248.000 GB  31 CPUs  (overlap error adjustment)
-- Local: bat       64.000 GB    8 CPUs x   1 job     64.000 GB   8 CPUs  (contig construction with bogart)
-- Local: cns        -.--- GB    8 CPUs x   - jobs     -.--- GB   - CPUs  (consensus)
--
-- Found untrimmed corrected Nanopore reads in the input files.
--
-- Generating assembly 'PMAT' in '/mnt/user/figshare/Spirodela_polyrhiza/pmat/trim_out':
--   genomeSize:
--     108000000
--
--   Overlap Generation Limits:
--     corOvlErrorRate 0.3200 ( 32.00%)
--     obtOvlErrorRate 0.1200 ( 12.00%)
--     utgOvlErrorRate 0.1200 ( 12.00%)
--
--   Overlap Processing Limits:
--     corErrorRate    0.3000 ( 30.00%)
--     obtErrorRate    0.1200 ( 12.00%)
--     utgErrorRate    0.1200 ( 12.00%)
--     cnsErrorRate    0.2000 ( 20.00%)
--
--   Stages to run:
--     only trim corrected reads.
--
--
-- Correction skipped; not enabled.
--
-- BEGIN TRIMMING
----------------------------------------
-- Starting command on Wed Oct  2 14:26:59 2024 with 1149.632 GB free disk space

cd .
./PMAT.seqStore.sh \
> ./PMAT.seqStore.err 2>&1

-- Finished on Wed Oct  2 14:27:10 2024 (11 seconds) with 1149.385 GB free disk space
----------------------------------------
--
-- In sequence store './PMAT.seqStore':
--   Found 21627 reads.
--   Found 1041628981 bases (9.64 times coverage).
--    Histogram of corrected reads:
--
--    G=1041628981                       sum of  ||               length     num
--    NG         length     index       lengths  ||                range    seqs
--    ----- ------------ --------- ------------  ||  ------------------- -------
--    00010       123334       723    104283600  ||       5088-10312        1261|------------------------------
--    00020       101284      1661    208333343  ||      10313-15537        2653|---------------------------------------------------------------
--    00030        88289      2764    312561303  ||      15538-20762        1857|---------------------------------------------
--    00040        77585      4023    416657398  ||      20763-25987        1507|------------------------------------
--    00050        67796      5460    520857488  ||      25988-31212        1270|-------------------------------
--    00060        58896      7106    625026660  ||      31213-36437        1213|-----------------------------
--    00070        49421      9030    729187468  ||      36438-41662        1140|----------------------------
--    00080        38581     11395    833322338  ||      41663-46887        1111|---------------------------
--    00090        24609     14717    937473047  ||      46888-52112        1132|---------------------------
--    00100         5088     21626   1041628981  ||      52113-57337        1079|--------------------------
--    001.000x               21627   1041628981  ||      57338-62562         990|------------------------
--                                               ||      62563-67787         951|-----------------------
--                                               ||      67788-73012         816|--------------------
--                                               ||      73013-78237         706|-----------------
--                                               ||      78238-83462         649|----------------
--                                               ||      83463-88687         568|--------------
--                                               ||      88688-93912         483|------------
--                                               ||      93913-99137         419|----------
--                                               ||      99138-104362        334|--------
--                                               ||     104363-109587        254|-------
--                                               ||     109588-114812        231|------
--                                               ||     114813-120037        183|-----
--                                               ||     120038-125262        138|----
--                                               ||     125263-130487        152|----
--                                               ||     130488-135712        124|---
--                                               ||     135713-140937         81|--
--                                               ||     140938-146162         86|---
--                                               ||     146163-151387         53|--
--                                               ||     151388-156612         36|-
--                                               ||     156613-161837         38|-
--                                               ||     161838-167062         27|-
--                                               ||     167063-172287         19|-
--                                               ||     172288-177512         20|-
--                                               ||     177513-182737         12|-
--                                               ||     182738-187962          7|-
--                                               ||     187963-193187         10|-
--                                               ||     193188-198412          5|-
--                                               ||     198413-203637          3|-
--                                               ||     203638-208862          2|-
--                                               ||     208863-214087          2|-
--                                               ||     214088-219312          0|
--                                               ||     219313-224537          1|-
--                                               ||     224538-229762          1|-
--                                               ||     229763-234987          1|-
--                                               ||     234988-240212          1|-
--                                               ||     240213-245437          0|
--                                               ||     245438-250662          0|
--                                               ||     250663-255887          0|
--                                               ||     255888-261112          0|
--                                               ||     261113-266337          1|-
--
--
-- ERROR:  Read coverage (9.64) lower than allowed.
-- ERROR:    minInputCoverage  = 10
-- ERROR:
-- ERROR:  This could be caused by an incorrect genomeSize.
-- ERROR:
-- ERROR:  You can force Canu to continue by decreasing parameter
-- ERROR:  minInputCoverage.  Be warned that the quality of corrected
-- ERROR:  reads and/or contiguity of contigs will be poor.
--

ABORT:
ABORT: canu 2.2
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:

[ WARNING 2024-10-02 14:27:11 ] An error occurred during the trim process?

The script is:

PMAT autoMito -i SRR11472010.fastq -o pmat -st ont -g 108m -cs nextDenovo \
    -np /home/user/bin/nextDenovo \
    -cp /home/user/bin/canu \
    -cfg nextdenovo.cfg -m

nextdenovo.cfg is:

[General]
job_type = local # local, slurm, sge, pbs, lsf
job_prefix = nextDenovo
task = correct # all, correct, assemble
rewrite = yes # yes/no
deltmp = yes 
parallel_jobs = 20 # number of tasks used to run in parallel
input_type = raw # raw, corrected
read_type = ont # clr, ont, hifi
input_fofn = input.fofn
workdir = pmat

[correct_option]
read_cutoff = 1k
genome_size = 108m 
sort_options = -m 20g -t 15
minimap2_options_raw = -t 8
pa_correction = 3
correction_options = -p 15

[assemble_option]
minimap2_options_cns = -t 8 
nextgraph_options = -a 1

yunmika commented 1 month ago

Thank you very much for using PMAT. This error occurs because the sequencing depth is below 10X: Canu reports 9.64X coverage, which is under its minInputCoverage of 10. To avoid it, you can use the error-corrected reads as the input to PMAT and set -st hifi, which skips the error correction step.
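
For reference, the coverage figure in the log can be reproduced from the numbers Canu prints. The following is only a minimal arithmetic sketch, assuming Canu computes coverage as total bases divided by genomeSize, which matches the 9.64 in the log. The variable names are illustrative and not part of PMAT or Canu.

```python
# Values copied from the Canu log in this issue.
total_bases = 1_041_628_981   # "Found 1041628981 bases"
genome_size = 108_000_000     # genomeSize, from "-g 108m"
min_input_coverage = 10       # "minInputCoverage  = 10"

# Assumption: coverage = total bases / genome size.
coverage = total_bases / genome_size

# Bases that would be needed to reach the threshold at this genome size,
# and how far the corrected reads fall short of it.
bases_needed = min_input_coverage * genome_size
shortfall = bases_needed - total_bases

print(f"coverage: {coverage:.2f}x")
print(f"bases needed for {min_input_coverage}x: {bases_needed}")
print(f"shortfall: {shortfall}")
```

With these inputs the corrected reads are about 38 Mb short of 10X, so the run only narrowly misses the threshold.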