BGI-Qingdao / TGS-GapCloser

A gap-closing software tool that uses long reads to enhance genome assembly.
GNU General Public License v3.0

Candidate fragments of corrected reads as input #31

Closed: zhixue closed this issue 3 years ago

zhixue commented 3 years ago

Hi, I am trying to run TGS-GapCloser with error-corrected reads to fill gaps in scaffolds. However, I found that running the program with "--ne" skips not only error correction (step 3) but also TGSCandidate (step 2).

Could you give some suggestions on how to carry out step 2 to obtain the corresponding candidate fragments of the corrected reads (e.g. minimap2 parameters)? I also noticed that the program uses contigs as the query and reads as the target in step 4.

I want to filter candidate corrected reads from several individuals of the same group, just as the TGS-GapCloser article does "in order to limit the size of data for further analysis", because with too many reads the overlapping (mapping) in step 4 runs out of memory.

The details are as follows:

INFO  :   Run TGS-GapCloser from /lustre/home/myaccount/tool/TGS-GapCloser ;
          Version : 1.1.1 ;
          Release time : 2019-12-31 .

INFO  :   Parsing args starting ...
             --scaff /lustre/home/myaccount/ref/chrs.fa
             --reads ../sample/corr.fa
             --output Chrgapclose
             --ne 
             --tgstype ont
             --min_idy 0.3
             --min_match 300
             --thread 40

INFO  :   Parsing args end .

INFO  :   Checking basic args & env ...
              -   No error correction by --ne option
              -   TGS reads type is ont . MINIMAP2_PARAM is  -x ava-ont   MIN_IDY is 0.3 . MIN_MATCH is 300 .

INFO  :   Checking basic args & env end.

INFO  :   Step 1 , run TGSSeqSplit to split scaffolds into scaftigs(contigs). 

INFO  :   Step 1 , done .

INFO  :   Step 2 , skip TGSCandidate by --ne option.

INFO  :   Step 3 , skip error correction by --ne option.

INFO  :   Step 4 , gap filling ... 
              -   Use ../GJ/corr_less60.fa as final TGS READS input.
              -   4,1 , mapping contigs against reads ... 

Thank you.
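For the filtering question in this post, here is a minimal Python sketch, assuming you already have a PAF file from mapping contigs (query) against the corrected reads (target), the same direction step 4 uses. It keeps only reads with at least one alignment that passes identity and match-length thresholds, then writes those reads out. The function names are hypothetical, and treating `--min_idy` as residue matches divided by alignment block length (PAF columns 10 and 11) and `--min_match` as the number of residue matches is an assumption, not TGS-GapCloser's documented definition. It keeps whole reads; it does not cut them down to the shorter candidate fragments that TGSCandidate produces.

```python
from typing import Iterable, Iterator, Set


def candidate_read_ids(paf_lines: Iterable[str], min_idy: float = 0.3,
                       min_match: int = 300) -> Set[str]:
    """Return names of reads (PAF target name, column 6) with at least one
    alignment that passes both thresholds.

    Assumption (hypothetical mapping of the CLI options): identity is
    residue matches / alignment block length (PAF columns 10 and 11),
    and min_match is compared against residue matches.
    """
    keep: Set[str] = set()
    for line in paf_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # skip blank or malformed lines
        target_name = cols[5]
        n_match, block_len = int(cols[9]), int(cols[10])
        if block_len > 0 and n_match / block_len >= min_idy and n_match >= min_match:
            keep.add(target_name)
    return keep


def filter_fasta(fasta_lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield the lines of FASTA records whose ID (first word of the header) is in keep."""
    emit = False
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            emit = bool(fields) and fields[0] in keep
        if emit:
            yield line
```

To use it, you could open your own PAF file and `corr.fa`, pass the IDs from `candidate_read_ids` to `filter_fasta`, write the yielded lines to a new FASTA file, and give that file to `--reads`.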

adonis316 commented 3 years ago

Hi,

  1. Step 2 (TGSCandidate) was designed to reduce the number of long-read bases that have to be corrected, in order to save memory in Step 3 (error correction). That is why the "--ne" option skips both of them.

  2. In Step 4, all candidates are generated from the alignment of scaftigs against long reads and are then voted on. However, according to your log, the program seems to have stopped at the mapping step. Please take a look at the PAF file; this could be a minimap2 issue.

  3. If minimap2's memory usage is the problem, you could manually partition or subsample the corrected long reads. Memory usage scales with the total number of bases in the input long reads.
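Point 3 can be sketched in Python as a greedy partition of the corrected reads by total bases, plus a seeded random subsample that stops once a target number of bases is reached. All function names are hypothetical and the base budgets you pass in are up to you; the thread does not say how results from separately processed chunks should be combined afterwards.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header line, sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Parse FASTA text (multi-line sequences allowed) into (header, sequence) pairs."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def partition_by_bases(records: Iterable[Record], max_bases: int) -> List[List[Record]]:
    """Group reads, in input order, into chunks of at most max_bases each.
    A single read longer than max_bases ends up alone in its own chunk."""
    chunks: List[List[Record]] = []
    current: List[Record] = []
    current_bases = 0
    for header, seq in records:
        if current and current_bases + len(seq) > max_bases:
            chunks.append(current)
            current, current_bases = [], 0
        current.append((header, seq))
        current_bases += len(seq)
    if current:
        chunks.append(current)
    return chunks


def subsample_to_bases(records: Iterable[Record], target_bases: int,
                       seed: int = 0) -> List[Record]:
    """Shuffle reads with a fixed seed and keep them until target_bases is reached."""
    pool = list(records)
    random.Random(seed).shuffle(pool)
    picked: List[Record] = []
    total = 0
    for header, seq in pool:
        if total >= target_bases:
            break
        picked.append((header, seq))
        total += len(seq)
    return picked


def write_fasta(records: Iterable[Record], out) -> None:
    """Write (header, sequence) pairs as single-line FASTA to a text stream."""
    for header, seq in records:
        out.write(f"{header}\n{seq}\n")
```

The fixed seed makes a subsample reproducible, so a run that fails can be repeated on exactly the same reads.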

Hope this helps.

Mengyang

zhixue commented 3 years ago

Thank you for your quick response, Mengyang.

There was not enough memory to run minimap2 (scaffold size: 400 Mb; corrected reads: 5 Gb; machine memory: 200 GB). I think the "ava-ont" preset gives accurate results, as mentioned in https://github.com/BGI-Qingdao/TGS-GapCloser/issues/8, but it needs much more memory than I expected. In the end, I chose "map-ont" instead of "ava-ont" to reduce memory usage, since the sequencing depth in my study is sufficient.

Thank you again.
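The preset switch described in this reply can be sketched as a small Python helper that builds the minimap2 command line in the same direction as step 4 (reads as target, contigs as query; minimap2 takes the target file first). The helper name and file names are hypothetical. `-x` selects a preset and `-t` sets the thread count, and `ava-ont` and `map-ont` are both standard minimap2 presets; how to pass a different preset into TGS-GapCloser itself is not covered here. The helper only builds the argument list; actually running it (for example with `subprocess.run(cmd, stdout=paf_file, check=True)`) needs minimap2 on `PATH`.

```python
from typing import List

# Presets discussed in this thread: "ava-ont" (shown in the log for
# --tgstype ont) and "map-ont" (the lower-memory choice made here).
ALLOWED_PRESETS = {"ava-ont", "map-ont"}


def minimap2_cmd(reads_fa: str, contigs_fa: str, preset: str = "map-ont",
                 threads: int = 40) -> List[str]:
    """Build argv for mapping contigs (query) against reads (target).

    minimap2's positional order is <target> [query], so the reads come first.
    """
    if preset not in ALLOWED_PRESETS:
        raise ValueError(f"unexpected preset: {preset}")
    return ["minimap2", "-x", preset, "-t", str(threads), reads_fa, contigs_fa]
```

Keeping command construction separate from execution makes it easy to log or inspect the exact command before a long run.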