BGI-Qingdao / TGS-GapCloser

A gap-closing software tool that uses long reads to enhance genome assembly.
GNU General Public License v3.0

Rework flow of --ne option #34

Closed: NTNguyen13 closed this 3 years ago

NTNguyen13 commented 3 years ago

The old workflow applies --ne right after step 1, which means the whole read file goes into the gap-filling process and can produce absurdly large files. I tried it with a corrected 30X ONT read file, and it generated a 1.9T paf file, which wastes both computation time and resources.

I propose that we run TGSCandidate even when the --ne option is used, so that the final file contains only the information that is needed.

cchd0001 commented 3 years ago

Hi,

I am not sure about this.

Before this patch, you generate the 1.9T paf file with minimap2 at line 590 of TGSGapCloser.sh.
After this patch, you will generate the same 1.9T paf file with minimap2 at line 367 of TGSGapCloser.sh.

Either way, this huge paf file will be generated, because both TGSCandidate and TGSGapCloser take the paf file as input.

To avoid this huge paf file, you could try to:

  1. reduce the amount of input TGS reads.
  2. use tighter minimap2 parameters.
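
As a rough illustration of option 1, here is a minimal sketch of one way to shrink the read set before it reaches TGS-GapCloser. It is not part of TGS-GapCloser: the function names `read_fasta` and `filter_reads` and the thresholds `min_len` and `max_bases` are hypothetical, and it assumes the reads are in plain FASTA. It drops short reads and stops once a base budget is reached, streaming so the whole file never has to sit in memory.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_reads(records, min_len, max_bases):
    """Yield reads of at least min_len bases until max_bases is reached.

    min_len and max_bases are illustrative knobs (assumptions), not
    TGS-GapCloser options. The read that crosses the budget is still kept.
    """
    total = 0
    for header, seq in records:
        if len(seq) < min_len:
            continue  # skip short reads
        if total >= max_bases:
            break  # base budget already used up
        total += len(seq)
        yield header, seq


def write_fasta(records, handle):
    """Write (header, sequence) pairs as single-line FASTA records."""
    for header, seq in records:
        handle.write(f">{header}\n{seq}\n")


if __name__ == "__main__":
    # Tiny in-memory example; real use would open the read file instead.
    demo = io.StringIO(">r1\nACGT\n>r2\nAC\n>r3\nACGTACGT\n>r4\nACGTAC\n")
    out = io.StringIO()
    write_fasta(filter_reads(read_fasta(demo), min_len=4, max_bases=10), out)
    print(out.getvalue(), end="")
```

For a real data set, `max_bases` could be set to roughly the genome size times the coverage you want to keep. Reads are taken in file order, so this is a simple truncation, not a random subsample.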

Best wishes,
Lidong Guo

NTNguyen13 commented 3 years ago

I see. I found that if I correct the reads before passing them to TGS-GapCloser, I should use different minimap2 args so that the paf file is smaller. Thanks!