Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads
GNU General Public License v3.0

too slow in minimap2-nd step #65

Closed ucassee closed 4 years ago

ucassee commented 4 years ago

Dear developer, I am using NextDenovo to assemble a 1.9G genome sequenced at 130X coverage. There are 406 subtasks in the minimap2-nd step, but each subtask takes 8 hours to finish. How can I accelerate this step? Thanks in advance.

This is my run.cfg file

[General]
job_type = pbs
job_prefix = nextDenovo
task = all # 'all', 'correct', 'assemble'
rewrite = yes # yes/no
deltmp = yes
rerun = 5
parallel_jobs = 10
input_type = raw
input_fofn = ./input.fofn
workdir = ./01_rundir
cluster_options = -l nodes=1:ppn=28 -q cu

[correct_option]
read_cutoff = 2k
seed_cutoff = 20731
blocksize = 3g
pa_correction = 28
seed_cutfiles = 20
sort_options = -m 150g -t 28 -k 60
minimap2_options_raw = -x ava-pb -t 28
correction_options = -p 28

[assemble_option]
random_round = 20
minimap2_options_cns = -x ava-pb -t 28 -k17 -w17
nextgraph_options = -a 1

This is a log file of a finished subtask.

-bash: module: line 1: syntax error: unexpected end of file
-bash: error importing function definition for `BASH_FUNC_module'
hostname
cd /data/genometest/Project/3.fish/01.assmbly/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align0/cns_align008
/data/genometest/Project/3.fish/01.assmbly/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align0/cns_align008
time /data/genometest/software/NextDenovo/bin/minimap2-nd -I 6G --step 2 --dual=yes -x ava-pb -t 28 -k17 -w17 /data/genometest/Project/3.fish/01.assmbly/01_rundir/02.cns_align/01.get_cns.sh.work/get_cns00/cns.fasta /data/genometest/Project/3.fish/01.assmbly/01_rundir/02.cns_align/01.get_cns.sh.work/get_cns08/cns.fasta -o cns.filt.dovt.ovl;
/data/genometest/software/NextDenovo/bin/minimap2-nd -I 6G --step 2 --dual=yes -x ava-pb -t 28 -k17 -w17 /data/genometest/Project/3.fish/01.assmbly/01_rundir/02.cns_align/01.get_cns.sh.work/get_cns00/cns.fasta /data/genometest/Project/3.fish/01.assmbly/01_rundir/02.cns_align/01.get_cns.sh.work/get_cns08/cns.fasta -o cns.filt.dovt.ovl
[M::mm_idx_gen::26.248*1.51] collected minimizers
[M::mm_idx_gen::31.687*2.95] sorted minimizers
[M::main::31.688*2.95] loaded/built the index for 50392 target sequence(s)
[M::mm_mapopt_update::32.083*2.92] mid_occ = 2277
[M::mm_idx_stat] kmer size: 17; skip: 17; is_hpc: 1; #seq: 50392
[M::mm_idx_stat::32.177*2.92] distinct minimizers: 11912490 (27.00% are singletons); average occurrences: 11.913; average spacing: 11.060
[M::worker_pipeline::9649.484*26.02] mapped 16972 sequences
[M::worker_pipeline::19004.459*26.19] mapped 15593 sequences
[M::worker_pipeline::27882.997*26.54] mapped 15639 sequences
[M::worker_pipeline::29204.146*26.31] mapped 2096 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: /data/genometest/software/NextDenovo/bin/minimap2-nd -I 6G --step 2 --dual=yes -x ava-pb -t 28 -k17 -w17 -o cns.filt.dovt.ovl /data/genometest/Project/3.fish/01.assmbly/01_rundir/02.cns_align/01.get_cns.sh.work/get_cns00/cns.fasta /data/genometest/Project/3.fish/01.assmbly/01_rundir/02.cns_align/01.get_cns.sh.work/get_cns08/cns.fasta
[M::main] Real time: 29204.842 sec; CPU: 768326.777 sec; Peak RSS: 30.396 GB
real 486m45.365s
user 12791m9.724s
sys 14m17.454s
touch /data/genometest/Project/3.fish/01.assmbly/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align0/cns_align008/nextDenovo.sh.done
touch /data/genometest/Project/3.fish/01.assmbly/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align0/cns_align008/nextDenovo.sh.done
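As a rough reading of this log, the wall-clock and CPU totals on the "[M::main] Real time" line can be turned into an average thread utilisation. The sketch below only does that arithmetic; the variable names are illustrative, and the conclusion that the subtask is compute-bound rather than waiting on I/O is an inference from these two numbers, not something the log states.

```python
# Figures copied from the "[M::main] Real time" line of the log.
real_seconds = 29204.842
cpu_seconds = 768326.777
threads = 28  # value passed to minimap2-nd via -t 28

# Average number of threads that were busy over the whole run.
avg_busy_threads = cpu_seconds / real_seconds
# Fraction of the requested threads that were kept busy on average.
utilisation = avg_busy_threads / threads

print(f"wall time: {real_seconds / 3600:.1f} h")
print(f"average busy threads: {avg_busy_threads:.2f} of {threads}")
print(f"thread utilisation: {utilisation:.0%}")
```

The ratio agrees with the 26.31 printed after the asterisk on the last worker_pipeline line, so nearly all 28 threads were busy for the full 8 hours.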

moold commented 4 years ago

  1. Generally speaking, one subtask with 28 threads is slower than 4 subtasks with 7 threads each.
  2. See here: changing --mode to 1 and increasing --kn, --cn, and --wn will speed up minimap2-nd, but the effect on assembly accuracy is unpredictable, so check your results carefully.

ucassee commented 4 years ago

Hi @moold, thanks for your reply. Each node in my cluster has 28 threads and 256G of RAM. I am worried about the I/O burden, so I assign only one subtask to each node. How many subtasks can I run on each node for optimum performance? I don't want to sacrifice accuracy for speed.

moold commented 4 years ago

4 or more; you should test.
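A minimal sketch of how the per-node split could be translated into run.cfg values, assuming 28 cores per node as described in this thread. The helper `split_node` is hypothetical, and the node count of 10 is an assumption inferred from parallel_jobs = 10 with one subtask per node. Whether the scheduler actually packs several ppn=7 jobs onto one node depends on the local PBS setup.

```python
def split_node(cores_per_node: int, jobs_per_node: int, nodes: int) -> dict:
    """Divide each node's cores evenly among concurrent subtasks.

    Hypothetical helper: it only does the arithmetic and formats the
    values that would go into run.cfg.
    """
    if jobs_per_node < 1 or jobs_per_node > cores_per_node:
        raise ValueError("jobs_per_node must be between 1 and cores_per_node")
    threads = cores_per_node // jobs_per_node
    return {
        "threads_per_job": threads,
        "parallel_jobs": jobs_per_node * nodes,
        "cluster_options": f"-l nodes=1:ppn={threads} -q cu",
        "minimap2_threads": f"-t {threads}",
    }


# Assumed cluster: 10 nodes with 28 cores each, 4 subtasks per node.
plan = split_node(cores_per_node=28, jobs_per_node=4, nodes=10)
for key, value in plan.items():
    print(f"{key} = {value}")
```

With these inputs the plan is 7 threads per subtask and 40 subtasks running at once; other splits, such as 2 by 14, can be compared the same way before timing them on real data.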

ucassee commented 4 years ago

Generally speaking, one subtask with 28 threads is slower than 4 tasks with 7 threads each.

In my test, a subtask took 10 h with 28 threads and 30 h with 7 threads.
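These two timings can be compared as node throughput rather than per-subtask time. The sketch below assumes the 7-thread figure was measured with four such subtasks sharing one 28-thread node, which the comment does not state; the function name is illustrative.

```python
def subtasks_per_node_hour(hours_per_subtask: float, concurrent_subtasks: int) -> float:
    """Subtasks one node completes per hour at a given concurrency."""
    return concurrent_subtasks / hours_per_subtask


# Timings reported in the comment: 10 h at 28 threads, 30 h at 7 threads.
one_by_28 = subtasks_per_node_hour(hours_per_subtask=10, concurrent_subtasks=1)
# Assumption: four 7-thread subtasks run side by side on the same node.
four_by_7 = subtasks_per_node_hour(hours_per_subtask=30, concurrent_subtasks=4)

ratio = four_by_7 / one_by_28
print(f"1 x 28 threads: {one_by_28:.3f} subtasks per node-hour")
print(f"4 x 7 threads:  {four_by_7:.3f} subtasks per node-hour")
print(f"throughput ratio: {ratio:.2f}")
```

Under that assumption each subtask takes three times longer, but the node still finishes about a third more subtasks per hour with the 4 by 7 split.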

ucassee commented 4 years ago

Since this step is really slow, I would like to know what it does. Could you please briefly explain it? PS: The input file of this step (cns.fasta) is about 1.5G and the result file (cns.filt.dovt.ovl) is about 150M. Thanks in advance!

moold commented 4 years ago

Try changing --kn 17 to --kn 18. This step finds precise overlaps between corrected seeds. For highly repetitive genomes, especially those with high AT or GC content, it will be a bottleneck.

ucassee commented 4 years ago

I have finished 100 of the 406 subtasks. If I change the parameter from -k 17 to -k 18, should I rerun the 100 finished subtasks? I am not sure whether the genome is highly repetitive.

moold commented 4 years ago

No, just change the config file and rerun the main task. But it is better to run a test first.

ucassee commented 4 years ago

What is the point of this test? Is it to check running time, accuracy, or something else?

moold commented 4 years ago

time

ucassee commented 4 years ago

Hi @moold, a bigger -k reduced the running time, but the output file (cns.filt.dovt.ovl) was much smaller (the sizes for -k17/-k18/-k19 are 130M/90M/71M). Will the different output file sizes affect the next step or the final assembly? PS: The genome assembled by Wtdbg2 had 69.69% repeat sequence. I look forward to your reply.

moold commented 4 years ago

You should adjust --kn (from --kn 17 to --kn 18, 19, ...), not -k.
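A minimal sketch of making that change to the option string, assuming --kn is passed to minimap2-nd through minimap2_options_cns in run.cfg (the log in this thread shows the other values from that setting on the minimap2-nd command line, but the thread does not confirm where --kn goes). The helper `set_option` is hypothetical and leaves every other flag, including -k17, untouched.

```python
import shlex


def set_option(options: str, flag: str, value: str) -> str:
    """Return `options` with `flag value` set, dropping any earlier value.

    Hypothetical helper for editing an option string by hand-free means;
    it is not part of NextDenovo.
    """
    tokens = shlex.split(options)
    kept = []
    i = 0
    while i < len(tokens):
        if tokens[i] == flag:
            i += 2  # skip the flag and the value that follows it
            continue
        if tokens[i].startswith(flag + "="):
            i += 1  # skip the flag=value form
            continue
        kept.append(tokens[i])
        i += 1
    return " ".join(kept + [flag, value])


# Option string taken from the run.cfg posted in this thread.
current = "-x ava-pb -t 28 -k17 -w17"
updated = set_option(current, "--kn", "18")
print(f"minimap2_options_cns = {updated}")
```

Running it again with "19" replaces the previous --kn value rather than appending a second one, which makes it easy to time a few values on a single subtask.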