Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads
GNU General Public License v3.0

minimap2-nd: sketch.c:84: mm_sketch: Assertion `len > 0 && (w > 0 && w < 256) && (k > 0 && k <= 28)' failed. #85

Closed Johnsonzcode closed 4 years ago

Johnsonzcode commented 4 years ago

Hi Dr. Hu, I have a problem using NextDenovo and would greatly appreciate your help. I am running the latest version of NextDenovo on PacBio HiFi data (human, total read file size: 167G, about 28X) and get the following messages:

(base) [poultrylab1@pbsnode01 HG002]$ cat /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/HG002/asm/02.cns_align/02.cns_align.sh.work/cns_align00/nextDenovo.sh.e | tail -n 20
[WARNING] the length database sequence 'SRR8859679.sra.189102' is 0
[WARNING] the length database sequence '28372' is 0
[WARNING] the length database sequence 'SRR8859679.sra.191404' is 0
[WARNING] the length database sequence '28378' is 0
[WARNING] the length database sequence 'SRR8859679.sra.192378' is 0
[WARNING] the length database sequence '28384' is 0
[WARNING] the length database sequence 'SRR8859679.sra.193693' is 0
[M::mm_idx_gen::131.230*0.29] collected minimizers
[M::mm_idx_gen::131.231*0.29] sorted minimizers
[M::main::131.231*0.29] loaded/built the index for 9462 target sequence(s)
[M::mm_mapopt_update::131.231*0.29] mid_occ = 1
[M::mm_idx_stat] kmer size: 17; skip: 17; is_hpc: 1; #seq: 9462
[M::mm_idx_stat::131.231*0.29] distinct minimizers: 0 (-nan% are singletons); average occurrences: -nan; average spacing: -nan
minimap2-nd: sketch.c:84: mm_sketch: Assertion `len > 0 && (w > 0 && w < 256) && (k > 0 && k <= 28)' failed.
minimap2-nd: sketch.c:84: mm_sketch: Assertion `len > 0 && (w > 0 && w < 256) && (k > 0 && k <= 28)' failed.
minimap2-nd: sketch.c:84: mm_sketch: Assertion `len > 0 && (w > 0 && w < 256) && (k > 0 && k <= 28)' failed.
minimap2-nd: sketch.c:84: mm_sketch: Assertion `len > 0 && (w > 0 && w < 256) && (k > 0 && k <= 28)' failed.
minimap2-nd: sketch.c:84: mm_sketch: Assertion `len > 0 && (w > 0 && w < 256) && (k > 0 && k <= 28)' failed.
minimap2-nd: sketch.c:84: mm_sketch: Assertion `len > 0 && (w > 0 && w < 256) && (k > 0 && k <= 28)' failed.
/storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/HG002/asm/02.cns_align/02.cns_align.sh.work/cns_align00/nextDenovo.sh: line 5: 199536 Aborted                 (core dumped) /storage-01/poultrylab1/yin/software/NextDenovo/bin/minimap2-nd -I 6G --step 2 -x ava-pb -t 6 -k17 -w17 /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/HG002/asm/02.cns_align/01.split_seed.sh.work/split_seed0/cns0.fasta /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/HG002/asm/02.cns_align/01.split_seed.sh.work/split_seed0/cns0.fasta -o cns.filt.dovt.ovl
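To read the assertion in the log above: it combines three conditions, and the command line passes -k17 -w17, so the k and w conditions hold. That leaves `len > 0` as the likely failing part, which matches the zero-length warnings. The sketch below only mirrors the assertion text from the log in Python; the function names are hypothetical and this is not minimap2 code.

```python
def sketch_preconditions(length, w, k):
    """Mirror the three conditions quoted in the assertion message."""
    return {
        "len > 0": length > 0,
        "w > 0 && w < 256": 0 < w < 256,
        "k > 0 && k <= 28": 0 < k <= 28,
    }


def failed_conditions(length, w, k):
    """Return the names of the conditions that do not hold."""
    checks = sketch_preconditions(length, w, k)
    return [name for name, ok in checks.items() if not ok]


# k=17 and w=17 come from the command line in the log;
# length 0 comes from the "[WARNING] ... is 0" lines.
print(failed_conditions(0, 17, 17))  # prints ['len > 0']
```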

My run.cfg:

[General]
job_type = local
job_prefix = nextDenovo
task = all # 'all', 'correct', 'assemble'
rewrite = yes # yes/no
deltmp = yes
rerun = 3
parallel_jobs = 10
input_type = corrected
input_fofn = /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/HG002/input.fofn
workdir = /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/HG002/asm/
#usetempdir = /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/HG002/tmp/

[correct_option]
read_cutoff = 1k
seed_cutoff = 10000
seed_cutfiles = 10       # split seed reads into ${seed_cutfiles} subfiles. (default: ${pa_correction})
blocksize = 10g
pa_correction = 8       # number of corrected tasks used to run in parallel, overwrite parallel_jobs only for this step. (default: 15)
minimap2_options_raw = -x ava-pb -t 6
sort_options = -m 50g -t 6 -k 40
correction_options = -p 6

[assemble_option]
random_round = 20
minimap2_options_cns = -x ava-pb -t 6 -k17 -w17
nextgraph_options = -a 1

Seq_stat gives me a seed_cutoff of 0 bp. I tried 0 bp, 10000 bp, and the default value of 29999. All of them give me: [WARNING] the length database sequence 'SRR8859679.sra.193693' is 0 and minimap2-nd: sketch.c:84: mm_sketch: Assertion `len > 0 && (w > 0 && w < 256) && (k > 0 && k <= 28)' failed.

Am I using the wrong options in run.cfg? If so, please let me know the right ones. Thanks a lot in advance!
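One way to confirm what the warnings report is to list the records in the seed file that have no sequence. The following is a minimal sketch, assuming the file is plain, uncompressed FASTA; the function name `zero_length_records` is hypothetical and is not part of NextDenovo or minimap2.

```python
def zero_length_records(fasta_lines):
    """Return the names of FASTA records whose sequence is empty."""
    empty = []
    name = None   # name of the record currently being read
    length = 0    # bases seen so far for that record
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record.
            if name is not None and length == 0:
                empty.append(name)
            parts = line[1:].split()
            name = parts[0] if parts else ""
            length = 0
        elif name is not None:
            length += len(line)
    # Close the last record in the file.
    if name is not None and length == 0:
        empty.append(name)
    return empty


# Small in-memory example; read2 and read3 have no sequence.
demo = [">read1", "ACGT", ">read2", ">read3", "", ">read4", "GG"]
print(zero_length_records(demo))  # prints ['read2', 'read3']
```

To run it on a real file, open the file (for example the cns0.fasta path from the log) and pass the file handle to `zero_length_records`. If the names it returns match the ones in the warnings, the empty sequences are already present in the split seed file.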

moold commented 4 years ago

The current version of NextDenovo is not suitable for assembling PacBio HiFi reads, because Minimap2 is not optimized for overlapping HiFi reads, so you can try other tools.

Johnsonzcode commented 4 years ago

Thank you for your quick reply! Is there any tweak that would enable analysis of PacBio HiFi reads? We are eager to try assembling with NextDenovo because it takes less time and uses fewer resources.

moold commented 4 years ago

For HiFi data, most assemblers do not require many resources, so just try Canu or Hifiasm. BTW, it is hard for users to tweak options to assemble HiFi data with NextDenovo; some algorithms may need to be rewritten and optimized. I plan to release a version that supports HiFi assembly in the future.

Johnsonzcode commented 4 years ago

Thank you again. I look forward to a version that supports HiFi data, and I really appreciate your excellent work on NextDenovo. It is the pride of us Chinese!

moold commented 4 years ago

Thank you for your feedback.