ucassee closed this issue 4 years ago
Hi @moold, thanks for your reply. Each node in my cluster has 28 threads and 256G RAM. I am worried about the IO burden, so I assign only one subtask to each node. How many subtasks can I run on each node to get optimum performance? I don't want to sacrifice accuracy for speed.
4 or more; you should test.
Generally speaking, one subtask with 28 threads is slower than 4 tasks with 7 threads each.
In my test, it took 10h/30h to finish a subtask using 28/7 threads.
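The timings above can be turned into a rough per-node throughput comparison. This is only a back-of-the-envelope sketch: it assumes that the 30h measured for a single 7-thread subtask still holds when four such subtasks share one node (no extra IO or memory contention), which is exactly what a real test would need to confirm. The helper name `subtasks_per_hour` is made up for illustration.

```python
# Rough node throughput for the two configurations reported in this thread.
# Assumption: running 4 subtasks side by side does not slow each one down.

NODE_THREADS = 28      # threads per node (from the thread)
TOTAL_SUBTASKS = 406   # subtasks in the minimap2-nd step (from the thread)


def subtasks_per_hour(threads_per_task, hours_per_task, node_threads=NODE_THREADS):
    """Subtasks one node completes per hour when it is fully packed."""
    concurrent = node_threads // threads_per_task
    return concurrent / hours_per_task


one_big = subtasks_per_hour(28, 10)    # 1 subtask at a time, 10h each
four_small = subtasks_per_hour(7, 30)  # 4 subtasks at a time, 30h each

for label, rate in (("1 x 28 threads", one_big), ("4 x 7 threads", four_small)):
    node_hours = TOTAL_SUBTASKS / rate
    print(f"{label}: {rate:.3f} subtasks/h, about {node_hours:.0f} node-hours in total")

print(f"throughput ratio (4 x 7 vs 1 x 28): {four_small / one_big:.2f}")
```

Under that assumption, four 7-thread subtasks per node give about a third more throughput than one 28-thread subtask, even though each individual subtask takes three times longer.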
Because this step is really slow, I would like to know what it does. Could you please briefly describe it? PS: The input file of this step (cns.fasta) is about 1.5G and the result file (cns.filt.dovt.ovl) is about 150M. Thanks in advance!
Try changing --kn 17 to --kn 18. This step finds precise overlaps between corrected seeds. For highly repetitive genomes, especially those with high AT or GC content, it will be a bottleneck.
I have finished 100 of the 406 subtasks. If I change the parameter from -k 17 to -k 18, should I rerun the 100 finished subtasks? I am not sure whether the genome is highly repetitive.
No, just change the config file and rerun the main task. But it is better to run a test first.
What is the point of this test? To check running time, accuracy, or something else?
time
Hi @moold,
A bigger -k reduced the running time, but the output file (cns.filt.dovt.ovl) was much smaller (the sizes for -k 17/-k 18/-k 19 are 130M/90M/71M). Will the different output file sizes influence the next step or the final assembly?
PS: The genome assembled by Wtdbg2 had 69.69% repeat sequence.
I look forward to your reply.
You should adjust --kn 17 to --kn 18, 19..., not -k.
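Since -k and --kn are easy to mix up, here is a small sketch of changing only --kn in an options string while leaving -k alone. The options string below is a made-up example, not taken from any real run.cfg, and the helper `set_kn` is hypothetical; check your own config file for the actual option line before editing it.

```python
import re


def set_kn(options, new_kn):
    """Return `options` with the value of --kn replaced; -k is left untouched."""
    # (?<!\S) makes sure the match starts at a whitespace boundary, so only the
    # standalone long option --kn is rewritten and never the short option -k.
    updated, count = re.subn(r"(?<!\S)--kn\s+\d+", f"--kn {new_kn}", options)
    if count == 0:
        raise ValueError("no --kn option found in the options string")
    return updated


# Hypothetical options string, for illustration only.
opts = "-t 7 -k 17 --kn 17"
print(set_kn(opts, 18))
```

Only the --kn value changes; the -k value stays as it was.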
Dear developer, I am using NextDenovo to assemble a 1.9G genome sequenced at 130X. There are 406 subtasks in the minimap2-nd step, but each subtask takes 8 hours to finish. How can I accelerate this step? Thanks in advance.
This is my run.cfg file
This is a log file of a finished subtask.