BGI-Qingdao / TGS-GapCloser

A gap-closing software tool that uses long reads to enhance genome assembly.
GNU General Public License v3.0

Insufficient memory when running minimap2 #72

Closed (XiaoxingFan closed this issue 5 months ago)

XiaoxingFan commented 11 months ago

Hello! To close the gaps in an assembled genome (300 Mb) with PacBio HiFi long reads (60 Gb), I tried to run TGS-GapCloser with the following command:

tgsgapcloser --scaff WH.fa --reads WH153_Hifi.fa --output WH153 --racon /data21/wzh/fanxx/software/TGSGapcloser/bin/racon --minmap_arg '-x asm20' --tgstype pb >pipe.log 2>pipe.err

I then encountered an "insufficient memory" error, reported in minimap2.01.log. So my first question is: approximately how much memory does this task require?

To reduce memory pressure, I extracted each gap together with its flanking 50 kb sequences from the genome and used these as the input for --scaff. Would this step affect the quality of the filled gaps? I would greatly appreciate it if you could answer my questions!

adonis316 commented 7 months ago

This is a known, frequently reported issue with the current version of TGS-GapCloser. The high memory consumption comes from the large volume of input long reads. Because the tool was originally designed for low sequencing depths, its long-read alignment step cannot handle high-depth data.
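To put the depth in perspective, here is a rough back-of-the-envelope calculation for the dataset in the question. It assumes that "60Gb" means 60 gigabases of reads and "300Mb" means a 300-megabase genome; the function name `mean_depth` is purely illustrative and not part of TGS-GapCloser.

```python
def mean_depth(total_read_bases: float, genome_size: float) -> float:
    """Approximate mean sequencing depth: total read bases divided by genome size."""
    return total_read_bases / genome_size


# Assumed units: 60 Gb of HiFi reads = 60e9 bases, 300 Mb genome = 300e6 bases.
depth = mean_depth(60e9, 300e6)
print(f"approximate mean depth: {depth:.0f}x")  # prints "approximate mean depth: 200x"
```

Under those assumptions the input is roughly 200x deep, which is far from a low-depth dataset.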

One compromise is to extract the gap regions to reduce memory use, as you mentioned. This could affect the gap filling, because the expected global alignment becomes local, which can introduce false positives.
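For readers who want to try this workaround, the following is a minimal sketch of the extraction step, not a script shipped with TGS-GapCloser. It assumes that gaps are represented as runs of N in the scaffold FASTA and that a 50 kb flank on each side is wanted. The function names and the `name_start_end` header scheme are hypothetical choices made for illustration.

```python
import re
from typing import Iterator, List, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from a FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                # Keep only the first word of the header as the record name.
                name = line[1:].split(maxsplit=1)[0] if len(line) > 1 else ""
                chunks = []
            else:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gap_windows(seq: str, flank: int = 50_000) -> List[Tuple[int, int]]:
    """Return (start, end) windows (0-based, end-exclusive) covering each
    run of N plus `flank` bases on both sides; touching windows are merged."""
    windows: List[Tuple[int, int]] = []
    for match in re.finditer(r"[Nn]+", seq):
        start = max(0, match.start() - flank)
        end = min(len(seq), match.end() + flank)
        if windows and start <= windows[-1][1]:
            # Overlaps or touches the previous window, so extend it.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


def extract_gap_regions(in_fasta: str, out_fasta: str, flank: int = 50_000) -> int:
    """Write every gap window to `out_fasta` and return the number written."""
    written = 0
    with open(out_fasta, "w") as out:
        for name, seq in read_fasta(in_fasta):
            for start, end in gap_windows(seq, flank):
                # Hypothetical header scheme: original name plus coordinates.
                out.write(f">{name}_{start}_{end}\n")
                for i in range(start, end, 80):
                    out.write(seq[i:min(i + 80, end)] + "\n")
                written += 1
    return written
```

Calling `extract_gap_regions("WH.fa", "WH.gaps.fa")` would produce a reduced file to pass to --scaff (the output file name is made up here). Encoding the coordinates in the header makes it possible to map filled regions back to the original scaffolds afterwards. The caveat above still applies: reads are then aligned only against these local windows.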

We are trying to fix this memory issue. In the meantime, you can try TGS-GapCloser2 (https://github.com/BGI-Qingdao/TGS-GapCloser2). Its usage is the same as that of TGS-GapCloser, and it can dramatically reduce memory consumption. Note, however, that it has not been fully tested.

Thanks, Mengyang