BGI-Qingdao / TGS-GapCloser

A gap-closing software tool that uses long reads to enhance genome assembly.
GNU General Public License v3.0

run time too long #33

Closed: zhoudreames closed this issue 3 years ago

zhoudreames commented 3 years ago

My data set is 150 Gb of HiFi reads and a 2.6 Gb assembly. TGS-GapCloser has been running with 64 threads for more than 7 days and I still have no final result. The step that maps the contigs to the reads is very slow and uses only 2-3 threads, even though I set the thread count to 64 and my machine has free resources. I don't know how much longer the run will take, so I would like to ask: is there a way to speed it up? Thanks~

zhoudreames commented 3 years ago

@cchd0001 @adonis316 @NTNguyen13

zhoudreames commented 3 years ago

This is the command I ran:

TGS-GapCloser.sh --scaff chr4_rmMisAsm.fa --reads chr4_HiFi.fa --minmap_arg '-x asm20' --tgstype pb --ne --output gap_close --thread 64

adonis316 commented 3 years ago

Hi, TGS-GapCloser was designed for gap closing with long reads at low sequencing depth (~10x, up to 20x). A large HiFi dataset will definitely increase the running time of the mapping step, even though minimap2 is one of the fastest mappers.
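As a rough check of how far the reported dataset sits from that range, the depth can be estimated as total read bases divided by assembly size. The sketch below uses the figures from the original report and assumes "Gb" means gigabases and that both figures describe the same genome; the function name `estimate_depth` is only illustrative.

```python
def estimate_depth(read_bases: float, assembly_bases: float) -> float:
    """Mean sequencing depth = total read bases / assembly size."""
    if assembly_bases <= 0:
        raise ValueError("assembly_bases must be positive")
    return read_bases / assembly_bases


# Figures taken from the report above (assumed to be gigabases).
read_bases = 150e9      # 150 Gb of HiFi reads
assembly_bases = 2.6e9  # 2.6 Gb assembly

# Upper end of the depth range mentioned above (~10x, up to 20x).
DESIGN_MAX_DEPTH = 20

depth = estimate_depth(read_bases, assembly_bases)
print(f"estimated depth: {depth:.1f}x")                       # estimated depth: 57.7x
print(f"ratio to 20x: {depth / DESIGN_MAX_DEPTH:.1f}")        # ratio to 20x: 2.9
```

Under these assumptions the input is close to three times the upper depth mentioned above.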

We have tested the effect of long-read depth on the final gap-closing result for a human sample, and the number of closed gaps saturates at about 20x.

Typically, it takes about 1-2 hours to close gaps for a human genome (~3 Gb) with 10x long reads, without error correction, using 64 threads.

I would suggest downsampling your HiFi data to about 10-15x. If you are worried about the genome coverage of the sampled reads, you could instead split the data first and close gaps with each portion in turn.
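Both options can be sketched in a few lines of Python. This is only an illustration, not part of TGS-GapCloser: the helper names (`read_fasta`, `subsample`, `split_round_robin`) are made up here, and the fraction 0.26 assumes roughly 58x input depth (150 Gb of reads over a 2.6 Gb assembly) and a 15x target. Sampling by read count only approximates a fraction of bases when read lengths are similar, which is usually a fair assumption for HiFi data.

```python
import io
import random
from typing import Iterable, Iterator, List, TextIO, Tuple

Record = Tuple[str, str]  # (header line including '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Stream (header, sequence) records from FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def subsample(records: Iterable[Record], fraction: float, seed: int = 0) -> Iterator[Record]:
    """Keep each read independently with probability `fraction` (seeded, so repeatable)."""
    rng = random.Random(seed)
    for record in records:
        if rng.random() < fraction:
            yield record


def split_round_robin(records: Iterable[Record], handles: List[TextIO]) -> List[int]:
    """Write reads to the output handles in turn; return the read count per handle."""
    counts = [0] * len(handles)
    for i, (header, seq) in enumerate(records):
        k = i % len(handles)
        handles[k].write(f"{header}\n{seq}\n")
        counts[k] += 1
    return counts


# Toy in-memory input standing in for a real reads file.
toy = io.StringIO("".join(f">read{i}\nACGT\n" for i in range(1000)))

# Option 1: downsample to roughly 15x (15 / 57.7 is about 0.26 under the assumptions above).
kept = list(subsample(read_fasta(toy), fraction=0.26, seed=1))
print(f"kept {len(kept)} of 1000 reads")

# Option 2: split all reads into 4 portions of roughly equal size.
toy.seek(0)
parts = [io.StringIO() for _ in range(4)]
counts = split_round_robin(read_fasta(toy), parts)
print(counts)  # [250, 250, 250, 250]
```

For real data, the in-memory `io.StringIO` objects would be replaced with files opened via `open(...)`, for example the reads file `chr4_HiFi.fa` as input. With the split option, my reading of "in turn" is that each run's gap-closed output becomes the scaffold input of the next run; that is an interpretation rather than something stated here.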

Hope this helps.

Thanks, Mengyang