Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads
GNU General Public License v3.0

jobs in step 02.cns_align/02.cns_align.sh.work runs extremely slow #23

Closed zijiangyang closed 4 years ago

zijiangyang commented 4 years ago

Hi:

I'm using NextDenovo to assemble a highly repetitive plant genome. The jobs in step 02.cns_align/02.cns_align.sh.work run extremely slowly: a single job takes about 1 hour to finish, and I have 9000 jobs in total. The settings I'm using are `minimap2-nd -I 6G --step 2 --dual=yes -x ava-pb -t 32 -k17 -w17`, and the memory allocated for each job is 120 GB. The target and query FASTA files are about 500 Mb. Here is the log file:

```
[M::mm_idx_gen::11.876*1.35] collected minimizers
[M::mm_idx_gen::13.342*1.78] sorted minimizers
[M::main::13.342*1.78] loaded/built the index for 23552 target sequence(s)
[M::mm_mapopt_update::13.473*1.78] mid_occ = 2280
[M::mm_idx_stat] kmer size: 17; skip: 17; is_hpc: 1; #seq: 23552
[M::mm_idx_stat::13.561*1.77] distinct minimizers: 5972457 (46.92% are singletons); average occurrences: 6.770; average spacing: 12.505
[M::worker_pipeline::3513.417*28.06] mapped 23340 sequences
[M::worker_pipeline::3822.019*25.92] mapped 210 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: /nextomics/NextDenovo2.1/NextDenovo/bin/minimap2-nd -I 6G --step 2 --dual=yes -x ava-pb -t 32 -k17 -w17 -o cns.filt.dovt.ovl /nextdenovotest/02.cns_align/01.get_cns.sh.work/get_cns002/cns.fasta /nextdenovotest/02.cns_align/01.get_cns.sh.work/get_cns027/cns.fasta
[M::main] Real time: 3822.115 sec; CPU: 99073.192 sec; Peak RSS: 23.554 GB

real 63m42.143s
user 1648m22.115s
sys 2m51.097s
```
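To put the numbers in the log into perspective, here is a minimal Python sketch that parses the final `[M::main] Real time` summary line and estimates the average thread utilisation and the total wall-clock time for all 9000 jobs. The helper name `parse_summary` is hypothetical, and the estimate assumes that every job takes about as long as this one and that jobs run one at a time, which may not hold on a cluster.

```python
import re

# Summary line copied verbatim from the minimap2-nd log above.
SUMMARY = "[M::main] Real time: 3822.115 sec; CPU: 99073.192 sec; Peak RSS: 23.554 GB"


def parse_summary(line):
    """Return (real_sec, cpu_sec, peak_rss_gb) parsed from a minimap2 summary line."""
    m = re.search(
        r"Real time: ([\d.]+) sec; CPU: ([\d.]+) sec; Peak RSS: ([\d.]+) GB", line
    )
    if m is None:
        raise ValueError("not a minimap2 summary line")
    return tuple(float(x) for x in m.groups())


real_sec, cpu_sec, peak_rss_gb = parse_summary(SUMMARY)

# CPU time divided by wall-clock time gives the average number of busy threads.
avg_threads = cpu_sec / real_sec

# Assumption: all 9000 jobs behave like this one and run strictly one after another.
n_jobs = 9000
serial_hours = n_jobs * real_sec / 3600

print(f"average busy threads: {avg_threads:.2f} of 32 requested")
print(f"peak memory: {peak_rss_gb} GB")
print(f"estimated serial wall-clock time: {serial_hours:.0f} hours")
```

The ratio matches the `*25.92` suffix on the last `worker_pipeline` line, so the 32 threads are mostly busy; the job is compute-bound rather than waiting on I/O or memory.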

Is there a way to speed up this step?

Thanks in advance!

moold commented 4 years ago

Hi, you can set `minimap2_options_cns = -x ava-ont -t 32 -k17 -w17 --norealign` to disable the re-align step. Check the assembly result carefully, though: it may contain more assembly errors, because the default version of minimap2 is not sensitive when mapping dovetail overlaps.
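As a rough illustration of applying this setting, the sketch below uses Python's standard `configparser` to update `minimap2_options_cns` in a NextDenovo-style config. Several things here are assumptions rather than facts from this thread: that the option lives in an `[assemble_option]` section, that the config is plain INI that `configparser` can read, and the stand-in config text `EXAMPLE_CFG` with the hypothetical helper `disable_realign`. Editing the config file by hand works just as well.

```python
import configparser
import sys

# Hypothetical, heavily trimmed config; a real run config has many more keys.
EXAMPLE_CFG = """
[General]
job_type = local

[assemble_option]
minimap2_options_cns = -x ava-pb -t 32 -k17 -w17
"""

# Option string suggested in the reply above.
NOREALIGN_OPTIONS = "-x ava-ont -t 32 -k17 -w17 --norealign"


def disable_realign(cfg_text):
    """Return a parser whose minimap2_options_cns is set to the suggested value."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(cfg_text)
    # Assumption: the option belongs to the [assemble_option] section.
    parser["assemble_option"]["minimap2_options_cns"] = NOREALIGN_OPTIONS
    return parser


updated = disable_realign(EXAMPLE_CFG)
updated.write(sys.stdout)  # print the modified config for inspection
```

Note that `configparser` drops comments when it writes a file back, so it is safer to inspect the printed result than to overwrite an existing config in place.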