Potential misassembly? - Githubissues

Hi MattHuff, thank you for reporting this problem. TGS-GapCloser was initially designed to close gaps using long reads at low coverage depth, where sensitivity (how many gaps can be closed) was the first priority. However, we have noticed that more and more researchers are using high coverage as the price of long reads falls. Thus, more attention needs to be paid to precision (how many closed gaps are true sequences of the target genome).

A gap can be wrongly closed because of a scaffolding misassembly or a false alignment. How likely this is depends on the polyploidy, genome size, proportion of repeats, and heterozygosity of the target genome, as well as on the base-calling accuracy and read length of the PacBio or Oxford Nanopore long reads. To reduce this risk, we added three parameters:

--min_nread (minimum number of reads that can bridge this gap. 1 by default)
--max_nread (maximum number of reads that can bridge this gap. -1 by default)
--maxcandidate (maximum number of candidates used for error correction and gap filling. 10 by default)
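
To make the two read-count bounds concrete, here is a minimal Python sketch of the kind of filter they describe. It is not TGS-GapCloser's actual code: the function name `has_acceptable_support` is hypothetical, and treating -1 as "no upper limit" is an assumption based on the default listed above.

```python
def has_acceptable_support(n_bridging_reads, min_nread=1, max_nread=-1):
    """Return True if a gap's bridging-read count lies within the bounds.

    Hypothetical helper for illustration only.
    Assumption: max_nread == -1 means the upper bound is disabled.
    """
    if n_bridging_reads < min_nread:
        return False  # too few reads support this connection
    if max_nread != -1 and n_bridging_reads > max_nread:
        return False  # too many reads support this connection
    return True


# Made-up read counts for four gaps, filtered with stricter bounds.
counts = [1, 4, 12, 250]
kept = [n for n in counts if has_acceptable_support(n, min_nread=3, max_nread=100)]
print(kept)  # [4, 12]
```

With the defaults (1 and -1), any gap bridged by at least one read passes under this reading, which matches the original sensitivity-first design.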

Too few or too many long reads supporting the same connection can cause type I errors (falsely closed gaps). In our tests, increasing --min_nread or decreasing --max_nread increases the positive predictive value to some extent, at the expense of sensitivity. You can choose the values based on your sequencing coverage.
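
The trade-off can be illustrated with a short Python sketch that computes sensitivity and precision (positive predictive value) as they are defined in this reply. The function name `closure_metrics` is hypothetical and the counts are made up for illustration; they are not results from our tests.

```python
def closure_metrics(total_gaps, closed_gaps, correctly_closed):
    """Return (sensitivity, precision) for one gap-closing run.

    sensitivity: fraction of all gaps that were closed
    precision:   fraction of closed gaps whose filled sequence is correct
    """
    if total_gaps <= 0:
        raise ValueError("total_gaps must be positive")
    sensitivity = closed_gaps / total_gaps
    precision = correctly_closed / closed_gaps if closed_gaps else 0.0
    return sensitivity, precision


# Made-up counts: looser bounds close more gaps, stricter bounds close
# fewer gaps but a larger share of them is correct.
loose = closure_metrics(total_gaps=1000, closed_gaps=900, correctly_closed=810)
strict = closure_metrics(total_gaps=1000, closed_gaps=700, correctly_closed=686)

for name, (sens, prec) in [("loose", loose), ("strict", strict)]:
    print(f"{name}: sensitivity={sens:.2f} precision={prec:.2f}")
```

Measuring precision on real data requires a trusted reference to check the filled sequences against, so in practice these numbers would come from a validation set rather than from the gap closer itself.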

Thanks, Mengyang

BGI-Qingdao / TGS-GapCloser

Potential misassembly? #44