BGI-Qingdao / TGS-GapCloser

A gap-closing software tool that uses long reads to enhance genome assembly.
GNU General Public License v3.0

why use the reads as target #50

tinyfallen closed 2 years ago

tinyfallen commented 2 years ago

Hi dear developer,

Thanks for your great work! I have tried to close a few gaps in my plant assembly; however, the pipeline always terminates at step 2.1 because minimap2 produces an enormous core.*** file (the suffix seems to be a random number) and aborts with a core dump, as described in issue42. As the server is fully loaded right now, I can only test the method suggested there later. Looking into the scripts, I am confused about why the mapping step uses the reads as the target. The contigs are always much longer than the reads used for assembly and fewer by orders of magnitude, which may have caused the issue.

Looking forward to your reply! BEST~

cchd0001 commented 2 years ago

Hi, Thank you for using TGS-GapCloser. We have noticed this problem several times.

Why do we use reads as the target:

  1. TGS-GapCloser was first designed to fill gaps in scaffolds assembled by NGS reads. At that time, contigs were not always longer than TGS reads.
  2. Using reads as the target instead of scaffolds increases the number of final filled gaps based on our tests.

Why does the program crash:

For huge genomes, such as some plants, sequenced with high-coverage long reads, this strategy makes minimap2 consume a huge amount of memory, and it may crash if it exceeds the maximum available memory.

On some computing systems, the crashed program saves a snapshot of its memory into a core file, as you described. Please delete it.

Please check issue42 to find the solution.

Thanks, Lidong

tinyfallen commented 2 years ago

Hi,

Thanks for your explanation! I will try to split reads and assembly to do the trick.

Best!
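
For readers following along, the read-splitting idea can be sketched as below. This is only a minimal illustration, not the recipe from issue42 (which is not reproduced in this thread). The helper names `read_fasta` and `split_by_bases` and the `max_bases` budget are hypothetical. Each batch could then be mapped separately so that no single minimap2 run has to index all reads at once.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_bases(records, max_bases):
    """Group records into batches whose total length stays at or below max_bases.

    A single record longer than max_bases ends up in a batch of its own.
    max_bases is a hypothetical tuning knob, not a TGS-GapCloser parameter.
    """
    batch, size = [], 0
    for header, seq in records:
        if batch and size + len(seq) > max_bases:
            yield batch
            batch, size = [], 0
        batch.append((header, seq))
        size += len(seq)
    if batch:
        yield batch
```

Splitting by total bases rather than by read count keeps the batches comparable in size even when read lengths vary widely.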

adonis316 commented 2 years ago

Hi, Thanks again for your suggestion! We are considering adding an option that allows users to choose scaffolds as the target to save memory.

Thanks, Mengyang

tinyfallen commented 2 years ago

Hi dear developer,

As a read-splitting method, I am trying to map all the corrected reads to the assembly first, then extract the reads that map to each pseudochromosome and feed them to the gap closer one pseudochromosome at a time. Do you think this is workable?

Thanks~

adonis316 commented 2 years ago

Seems great! But please keep as many mapped reads as possible, since reads with multiple alignments may otherwise decrease the number of filled gaps. Specifically, you can output the N>=5 best alignments per read and use some reads (those from repetitive regions) more than once.

Mengyang
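
The per-pseudochromosome split with multi-alignments kept could look roughly like the sketch below. It assumes the reads were mapped to the assembly and the alignments are available as PAF lines (read as query, scaffold as target). The function name `assign_reads_to_targets`, the `max_hits` parameter, and ranking alignments by the number of matching bases (PAF column 10) are illustrative assumptions, not part of TGS-GapCloser.

```python
from collections import defaultdict


def assign_reads_to_targets(paf_lines, max_hits=5):
    """Group read names by target sequence, keeping up to max_hits alignments per read.

    A read with several good alignments (e.g. from a repetitive region) is
    assigned to every target among its top hits, so it can be used more than once.
    """
    hits = defaultdict(list)  # read name -> [(matching bases, target name), ...]
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip empty or malformed lines
        read, target, n_match = fields[0], fields[5], int(fields[9])
        hits[read].append((n_match, target))

    reads_by_target = defaultdict(set)
    for read, alignments in hits.items():
        # Assumption: rank by matching bases, best first.
        alignments.sort(reverse=True)
        for _, target in alignments[:max_hits]:
            reads_by_target[target].add(read)
    return reads_by_target
```

With `max_hits=1` each read goes only to its best target, which is the situation the advice above warns against for repetitive regions.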

tinyfallen commented 2 years ago

OK, I am running a test to see the effect! Thank you~

tinyfallen commented 2 years ago

Hi dear,

In the end, I extracted the ONT reads mapped to the regions spanning each gap plus 100 kb upstream and downstream, and the number of gaps dropped from 13 to 5. I think that is the limit of the data.

By the way, I ran into another issue while closing gaps in another genome, and I will describe it in a new thread.
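
The selection of reads mapped within 100 kb of the gaps can be sketched as follows. This is a hypothetical reconstruction, not the poster's actual script: it assumes gaps appear as runs of `N` in the scaffold sequence and that read-to-assembly alignments are available as PAF lines. All function names and the `flank` parameter are illustrative.

```python
import re


def find_gaps(seq):
    """Return half-open (start, end) intervals of N runs in a sequence."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", seq)]


def gap_windows(seq, flank=100_000):
    """Extend each gap by `flank` bases on both sides, clipped to the sequence ends."""
    return [(max(0, s - flank), min(len(seq), e + flank)) for s, e in find_gaps(seq)]


def reads_near_gaps(paf_lines, windows_by_target):
    """Collect names of reads whose alignment overlaps any window on its target."""
    selected = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip empty or malformed lines
        read, target = fields[0], fields[5]
        t_start, t_end = int(fields[7]), int(fields[8])  # 0-based, end exclusive
        for w_start, w_end in windows_by_target.get(target, []):
            if t_start < w_end and t_end > w_start:  # half-open interval overlap
                selected.add(read)
                break
    return selected
```

Here `windows_by_target` would be a dict such as `{name: gap_windows(seq) for name, seq in scaffolds}`; the selected read names can then be used to subset the read file before running the gap closer.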