BGI-Qingdao / TGS-GapCloser

A gap-closing software tool that uses long reads to enhance genome assembly.
GNU General Public License v3.0

Genome size bigger than before gap-filling. #26

sunnycqcn closed 3 years ago

sunnycqcn commented 3 years ago

Hello, I used TGS-GapCloser to close gaps in one genome. The number of gaps is significantly lower than in the raw genome. However, the genome size is much bigger than the size of the raw genome. Is this a normal result for TGS-GapCloser?

After gap closing:

| Minimum Scaffold Length | Number of Scaffolds | Number of Contigs | Total Scaffold Length | Total Contig Length | Scaffold Contig Coverage |
|---|---|---|---|---|---|
| All | 7,712 | 9,835 | 1,666,014,877 | 1,657,036,327 | 99.46% |
| 500 | 7,712 | 9,835 | 1,666,014,877 | 1,657,036,327 | 99.46% |
| 1 KB | 7,705 | 9,828 | 1,666,008,414 | 1,657,029,864 | 99.46% |
| 2.5 KB | 5,939 | 8,019 | 1,663,376,520 | 1,654,400,604 | 99.46% |
| 5 KB | 4,629 | 6,610 | 1,658,621,254 | 1,649,657,417 | 99.46% |
| 10 KB | 2,867 | 4,674 | 1,646,096,827 | 1,637,199,899 | 99.46% |
| 25 KB | 884 | 2,510 | 1,615,394,176 | 1,606,646,097 | 99.46% |
| 50 KB | 206 | 1,774 | 1,592,727,076 | 1,584,199,527 | 99.46% |
| 100 KB | 79 | 1,624 | 1,584,384,037 | 1,576,477,169 | 99.50% |
| 250 KB | 46 | 1,563 | 1,579,034,363 | 1,574,349,697 | 99.70% |
| 500 KB | 24 | 1,521 | 1,571,844,103 | 1,571,231,678 | 99.96% |
| 1 MB | 23 | 1,520 | 1,571,086,103 | 1,570,473,681 | 99.96% |
| 2.5 MB | 21 | 1,513 | 1,568,982,573 | 1,568,372,581 | 99.96% |
| 5 MB | 20 | 1,487 | 1,564,877,851 | 1,564,277,262 | 99.96% |
| 10 MB | 20 | 1,487 | 1,564,877,851 | 1,564,277,262 | 99.96% |
| 25 MB | 20 | 1,487 | 1,564,877,851 | 1,564,277,262 | 99.96% |
| 50 MB | 20 | 1,487 | 1,564,877,851 | 1,564,277,262 | 99.96% |
| 100 MB | 2 | 161 | 218,275,156 | 218,210,172 | 99.97% |

Before gap closing:

| Minimum Scaffold Length | Number of Scaffolds | Number of Contigs | Total Scaffold Length | Total Contig Length | Scaffold Contig Coverage |
|---|---|---|---|---|---|
| All | 7,712 | 20,944 | 1,604,753,024 | 1,584,388,815 | 98.73% |
| 500 | 7,712 | 20,944 | 1,604,753,024 | 1,584,388,815 | 98.73% |
| 1 KB | 7,705 | 20,937 | 1,604,746,561 | 1,584,382,352 | 98.73% |
| 2.5 KB | 5,928 | 19,113 | 1,602,092,343 | 1,581,732,551 | 98.73% |
| 5 KB | 4,585 | 17,645 | 1,597,225,848 | 1,576,891,117 | 98.73% |
| 10 KB | 2,699 | 15,447 | 1,583,889,373 | 1,563,694,895 | 98.73% |
| 25 KB | 788 | 13,051 | 1,554,806,194 | 1,534,989,349 | 98.73% |
| 50 KB | 212 | 12,227 | 1,535,749,372 | 1,516,488,034 | 98.75% |
| 100 KB | 103 | 12,007 | 1,528,706,138 | 1,510,536,752 | 98.81% |
| 250 KB | 55 | 11,903 | 1,520,799,370 | 1,508,079,340 | 99.16% |
| 500 KB | 26 | 11,831 | 1,511,345,256 | 1,505,122,331 | 99.59% |
| 1 MB | 23 | 11,803 | 1,509,129,532 | 1,503,426,440 | 99.62% |
| 2.5 MB | 21 | 11,799 | 1,506,694,194 | 1,502,330,625 | 99.71% |
| 5 MB | 20 | 11,767 | 1,502,621,455 | 1,498,274,552 | 99.71% |
| 10 MB | 20 | 11,767 | 1,502,621,455 | 1,498,274,552 | 99.71% |
| 25 MB | 20 | 11,767 | 1,502,621,455 | 1,498,274,552 | 99.71% |
| 50 MB | 20 | 11,767 | 1,502,621,455 | 1,498,274,552 | 99.71% |
| 100 MB | 1 | 918 | 109,167,025 | 108,816,785 | 99.68% |

adonis316 commented 3 years ago

Hi, The gap-closing result looks OK to me. The contig number dropped from 20,944 to 9,835, so the contig lengths improved. The total genome size did increase, but a difference of about 4% is reasonable, because the estimated N lengths in the raw genome are usually not accurate. If you are worried about redundancy in the assembly, please increase --min_idy and --min_match. This makes the gap-closing thresholds stricter, at the cost of fewer closed gaps.

Thanks, Mengyang
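For anyone who wants to check the numbers discussed above, here is a minimal sketch of the arithmetic. It uses only the "All" rows from the two tables in this thread. The helper names (`summarize`, `pct_change`) are made up for illustration, and the gap count is an assumption: it treats every contig break inside a scaffold as one gap, i.e. gaps = contigs - scaffolds.

```python
# Totals copied from the "All" rows of the before/after tables in this thread.
before = {"scaffolds": 7_712, "contigs": 20_944,
          "scaffold_len": 1_604_753_024, "contig_len": 1_584_388_815}
after = {"scaffolds": 7_712, "contigs": 9_835,
         "scaffold_len": 1_666_014_877, "contig_len": 1_657_036_327}


def summarize(stats):
    """Derive gap count and gap (N) bases from assembly totals.

    Assumption: each contig break inside a scaffold is exactly one gap,
    and scaffold length minus contig length is the number of N bases.
    """
    return {
        "gaps": stats["contigs"] - stats["scaffolds"],
        "gap_bases": stats["scaffold_len"] - stats["contig_len"],
    }


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


b, a = summarize(before), summarize(after)
print(f"gaps:         {b['gaps']:,} -> {a['gaps']:,}")
print(f"gap bases:    {b['gap_bases']:,} -> {a['gap_bases']:,}")
print(f"scaffold len: {pct_change(before['scaffold_len'], after['scaffold_len']):+.2f}%")
print(f"contig len:   {pct_change(before['contig_len'], after['contig_len']):+.2f}%")
```

Under that assumption, the gap count goes from 13,232 to 2,123, and the N bases shrink from 20,364,209 to 8,978,550, while the total scaffold length grows by roughly 3.8%. In other words, the inserted sequence is on net longer than the N runs it replaced, which fits the point that the estimated N lengths in the raw genome are not precise.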

sunnycqcn commented 3 years ago

Hello Mengyang, Thanks. You are an expert on this. I think you are correct. I will go ahead. Thanks, Fuyou