Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads
GNU General Public License v3.0

hifi reads assembly #141

Closed kibzhulab closed 2 years ago

kibzhulab commented 2 years ago

Hello, I ran into this problem when assembling a genome from HiFi data and couldn't solve it. Could you give me some advice? My input reads are in hifi.fasta.

Error message:
[346137 INFO] 2022-03-19 11:12:05 NextDenovo start...
[346137 INFO] 2022-03-19 11:12:06 version:v2.5.0 logfile:pid346137.log.info
[346137 WARNING] 2022-03-19 11:12:06 Re-write workdir
[346137 WARNING] 2022-03-19 11:12:06 Change task "all" to "assemble", becasue the input_type is "corrected"
[346137 INFO] 2022-03-19 11:12:06 skip mkdir: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/./01_rundir
[346137 INFO] 2022-03-19 11:12:06 skip mkdir: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/./01_rundir/01.raw_align
[346137 INFO] 2022-03-19 11:12:06 skip mkdir: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/./01_rundir/02.cns_align
[346137 INFO] 2022-03-19 11:12:06 skip mkdir: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/./01_rundir/03.ctg_graph
[346137 INFO] 2022-03-19 11:12:06 skip step: db_stat
[346137 INFO] 2022-03-19 11:12:06 updated options:
rerun: 3
deltmp: 1
rewrite: 1
task: assemble
job_type: local
read_cutoff: 1k
read_type: hifi
parallel_jobs: 2
seed_depth: 40.0
pa_correction: 2
seed_cutfiles: 3
seed_cutoff: 36191
genome_size: 800000
input_type: corrected
blocksize: 5195036747
job_prefix: nextDenovo
ctg_cns_options: -sp -p 15
sort_options: -m 2g -t 2 -k 40
nextgraph_options: -a 1 -R 0.7
minimap2_options_map: -x asm20
minimap2_options_raw: -t 8 -x ava-hifi
correction_options: -p 15 -max_lq_length 10000
minimap2_options_cns: -t 8 -x ava-hifi --minide 0.1 --maxhan1 1000 -f 800
workdir: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/./01_rundir
input_fofn: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/./input.fofn
raw_aligndir: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/./01_rundir/01.raw_align
cns_aligndir: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/./01_rundir/02.cns_align
ctg_graphdir: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/./01_rundir/03.ctg_graph
[346137 INFO] 2022-03-19 11:12:06 summary of input data:
file: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/./01_rundir/01.raw_align/input.reads.stat
[Read length stat]
Types    Count (#)    Length (bp)
N10      11376        27275
N20      25067        24102
N30      40309        21892
N40      56974        20124
N50      75054        18577
N60      94620        17186
N70      115799       15839
N80      138883       14445
N90      164462       12859

Types       Count (#)    Bases (bp)    Depth (X)
Raw         194359       3494691165    4368.36
Filtered    0            0             0.00
Clean       194359       3494691165    4368.36

*Suggested seed_cutoff (genome size: 0.80Mb, expected seed depth: 40, real seed depth: 40.00): 36191 bp
[346137 INFO] 2022-03-19 11:12:06 skip step: split_seed
[346137 INFO] 2022-03-19 11:12:06 skip step: cns_align
[346137 INFO] 2022-03-19 11:12:06 skip step: ctg_graph
[346137 INFO] 2022-03-19 11:12:11 Total jobs: 3
[346137 INFO] 2022-03-19 11:12:11 Submitted jobID:[346439] jobCmd:[/mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align1/nextDenovo.sh] in the local_cycle.
[346137 INFO] 2022-03-19 11:12:12 Submitted jobID:[346519] jobCmd:[/mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align2/nextDenovo.sh] in the local_cycle.
[346137 INFO] 2022-03-19 11:12:12 Submitted jobID:[346561] jobCmd:[/mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align3/nextDenovo.sh] in the local_cycle.
[346137 INFO] 2022-03-19 11:12:13 ctg_align done
[346137 INFO] 2022-03-19 11:12:18 Total jobs: 2
[346137 INFO] 2022-03-19 11:12:18 Submitted jobID:[346857] jobCmd:[/mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/01_rundir/03.ctg_graph/03.ctg_cns.sh.work/ctg_cns1/nextDenovo.sh] in the local_cycle.
[346137 INFO] 2022-03-19 11:12:19 Submitted jobID:[346893] jobCmd:[/mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/01_rundir/03.ctg_graph/03.ctg_cns.sh.work/ctg_cns2/nextDenovo.sh] in the local_cycle.
[346137 INFO] 2022-03-19 11:12:20 ctg_cns done
[346137 INFO] 2022-03-19 11:12:20 remove temporary result: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align1/cns0.fasta.sort.bam
[346137 INFO] 2022-03-19 11:12:20 remove temporary result: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align2/cns2.fasta.sort.bam
[346137 INFO] 2022-03-19 11:12:20 remove temporary result: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align3/cns1.fasta.sort.bam
Traceback (most recent call last):
  File "/mnt/data/Lixl/software/NextDenovo/nextDenovo", line 850, in <module>
    main(args)
  File "/mnt/data/Lixl/software/NextDenovo/nextDenovo", line 821, in main
    asm, stat = gather_ctg_cns_output(cfg, task.jobs, seq_info)
  File "/mnt/data/Lixl/software/NextDenovo/nextDenovo", line 293, in gather_ctg_cns_output
    out = cal_n50_info(stat, asm + '.stat')
  File "/mnt/data/Lixl/software/NextDenovo/lib/kit.py", line 204, in cal_n50_info
    out += "%-5s %18d%20s\n" % ("Min.", stat[-1], '-')
IndexError: list index out of range

Genome characteristics: this is a mitochondrial genome (mtgenome), and the genome size is around 800 kb.

Input data: total base count 3.3G, HiFi FASTA reads, sequencing depth 50x, average/N50 read length...
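For reference, the Depth (X) column in the log's input summary appears to be the total base count divided by the configured genome_size, which would explain why the log reports 4368.36 rather than the 50x quoted here. A minimal sketch of that arithmetic, using only numbers copied from the log; the function name `depth` is hypothetical and this is not NextDenovo's own code:

```python
def depth(total_bases: int, genome_size: int) -> float:
    """Return sequencing depth as total bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Values copied from the log: "Clean" bases and genome_size from the options.
clean_bases = 3494691165
genome_size = 800000

print(f"{depth(clean_bases, genome_size):.2f}")  # 4368.36
```

If the 50x figure refers to a larger nuclear genome, only a fraction of these reads would be expected to come from the mitochondrial genome.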

My configuration file run.cfg is as follows:

[General]
job_type = local
job_prefix = nextDenovo
task = all # 'all', 'correct', 'assemble'
rewrite = yes # yes/no
deltmp = yes
rerun = 3
parallel_jobs = 2
input_type = raw
read_type = hifi
input_fofn = ./input.fofn
workdir = ./01_rundir

[correct_option]
read_cutoff = 1k
genome_size = 800000
pa_correction = 2
sort_options = -m 2g -t 2
minimap2_options_raw = -t 8
correction_options = -p 15

[assemble_option]
minimap2_options_cns = -t 8
nextgraph_options = -a 1
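Note that the log above reports input_type: corrected and task: assemble, while this file says input_type = raw and task = all, so it may be worth double-checking which file was actually used for the run. A minimal sketch for inspecting such a file, assuming Python's standard configparser can read it; NextDenovo has its own config reader, and `load_cfg` and the shortened `CFG_TEXT` are illustrative only:

```python
import configparser

# Shortened copy of the run.cfg above, for illustration only.
CFG_TEXT = """
[General]
task = all # 'all', 'correct', 'assemble'
input_type = raw
read_type = hifi

[correct_option]
genome_size = 800000
"""


def load_cfg(text):
    """Parse INI-style text, treating ' #' as the start of an inline comment."""
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    return parser


cfg = load_cfg(CFG_TEXT)
for key in ("task", "input_type", "read_type"):
    print(key, "=", cfg["General"][key])
print("genome_size =", cfg["correct_option"].getint("genome_size"))
```

Comparing these values with the "updated options" block in the log shows quickly whether the two agree.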

GCC: gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC)

Python: Python 3.9.7

NextDenovo: nextDenovo v2.5.0

moold commented 2 years ago

It seems that nextDenovo, with default options, cannot assemble any contigs from your input data, so try changing some nextgraph parameters or use another assembly tool.
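The traceback is consistent with this: `stat[-1]` raises IndexError when the list is empty, which is what one would expect if no contig lengths were collected. A minimal sketch of that failure mode with a guard, assuming `stat` is a list of contig lengths; `summarize_lengths` is a hypothetical helper, not NextDenovo's `cal_n50_info`:

```python
def summarize_lengths(lengths):
    """Return (min, max, n50) for a list of contig lengths, or None if empty."""
    if not lengths:
        # An empty list is the case where lengths[-1] would raise IndexError.
        return None
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        if running >= half:
            # First length at which the cumulative sum reaches half the total.
            n50 = length
            break
    return ordered[-1], ordered[0], n50


print(summarize_lengths([]))               # None
print(summarize_lengths([500, 300, 200]))  # (200, 500, 500)
```

A check like this only turns the crash into a clearer "no contigs" result; it does not make the assembly itself succeed.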