Thanks for developing this pipeline.
I'm currently using this tool to close the gaps in scaffolds generated by Bionano hybrid scaffolding. I used my pre-assembled contigs (produced by MaSuRCA) as input to close the gaps, without error correction. I have some questions about the gap-filled results:
There are some negative gaps in my scaffolds (13 bp long in my case). I checked the alignment in Bionano Access and found that the flanking contigs overlap. For these gaps, instead of merging the overlapping sequence, TGS-GapCloser inserted a fragment (for example, at a 23 kb overlap in the scaffold, a 20 kb fragment from a long read was inserted). In the example below, the 13-bp gap spans positions 3520056 to 3520068. There are quite a few cases like this.
>Super-Scaffold_100020
1 3208497 S 1 3208497
3208498 3518242 S 3210311 3520055
3518243 3538539 F
3538540 3884641 S 3520069 3866170
3884642 4113877 F
4113878 4199241 S 4091907 4177270
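To put numbers on each gap, the layout above can be read as follows: positions in the gap-filled scaffold, a segment type (S for sequence kept from the input scaffold, F for filled sequence), and, on S lines only, the matching positions in the input scaffold. That reading of the columns is an assumption based on this example. The minimal Python sketch below (`parse_layout` and `gap_report` are hypothetical helper names) compares the input-scaffold span skipped between two S segments with the length of the sequence inserted there.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    start: int                        # first base in the gap-filled scaffold (1-based)
    end: int                          # last base in the gap-filled scaffold (inclusive)
    kind: str                         # "S" = input-scaffold sequence, "F" = filled (assumed)
    orig_start: Optional[int] = None  # first base in the input scaffold (S lines only)
    orig_end: Optional[int] = None    # last base in the input scaffold (S lines only)


def parse_layout(lines: List[str]) -> List[Segment]:
    """Parse the lines of one scaffold layout; the '>' header line is skipped."""
    segments = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith(">"):
            continue
        start, end, kind = int(fields[0]), int(fields[1]), fields[2]
        if kind == "S":
            segments.append(Segment(start, end, kind, int(fields[3]), int(fields[4])))
        else:
            segments.append(Segment(start, end, kind))
    return segments


def gap_report(segments: List[Segment]) -> List[Tuple[int, int, int, int]]:
    """For each pair of consecutive S segments, return
    (input gap start, input gap end, input gap length, total F length between them).
    The 'input gap' is all input-scaffold sequence skipped between the two S
    segments, so it may include trimmed flanking bases as well as Ns."""
    report = []
    prev_s: Optional[Segment] = None
    filled = 0
    for seg in segments:
        if seg.kind == "S":
            if prev_s is not None:
                gap_start = prev_s.orig_end + 1
                gap_end = seg.orig_start - 1
                report.append((gap_start, gap_end, gap_end - gap_start + 1, filled))
            prev_s, filled = seg, 0
        else:
            filled += seg.end - seg.start + 1
    return report


layout = """\
>Super-Scaffold_100020
1 3208497 S 1 3208497
3208498 3518242 S 3210311 3520055
3518243 3538539 F
3538540 3884641 S 3520069 3866170
3884642 4113877 F
4113878 4199241 S 4091907 4177270""".splitlines()

for row in gap_report(parse_layout(layout)):
    print(row)
# (3208498, 3210310, 1813, 0)
# (3520056, 3520068, 13, 20297)
# (3866171, 4091906, 225736, 229236)
```

Under that reading, the 13-bp gap at 3520056 to 3520068 received 20,297 bp of filled sequence, and a 1,813-bp stretch of the input scaffold (3208498 to 3210310) is absent from the output with no F segment in its place.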
I also found a gap that Bionano estimated at 123 kb, which TGS-GapCloser filled with only 66 kb of sequence. However, no "N" is left there (see the example below).
>Super-Scaffold_267
1 147545 S 1 147545
147546 213699 F
213700 423197 S 270648 480145
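Assuming the same reading of the columns (S lines carry input-scaffold coordinates in the last two fields), the second example works out as follows: the input-scaffold span between the two S segments, the length of the F segment, and the shortfall.

```latex
\begin{aligned}
\text{input gap} &= 270648 - 147545 - 1 = 123102~\text{bp} \\
\text{filled}    &= 213699 - 147546 + 1 = 66154~\text{bp} \\
\text{shortfall} &= 123102 - 66154 = 56948~\text{bp}
\end{aligned}
```

The shortfall equals the difference between the input end (480145) and the output end (423197) of the last S segment, so this scaffold came out about 57 kb shorter than the Bionano layout instead of keeping leftover Ns.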
The genome size increased by 22 Mb, whereas the total gap size estimated by Bionano is only ~6 Mb. TGS-GapCloser did close all of the gaps; no "N" is left.
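To check the totals behind these numbers, a small Python sketch can count N runs and their combined length in a FASTA file, run once on the input scaffolds and once on the gap-filled output; the difference in total length can then be set against the input N total. It assumes plain, optionally line-wrapped FASTA, and `read_fasta`, `n_runs` and `summarize` are hypothetical helpers, not part of TGS-GapCloser. Each 13-bp negative-gap placeholder contributes only 13 N bases to that total.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

N_RUN = re.compile(r"[Nn]+")  # one or more consecutive N/n bases


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n_runs(seq: str) -> List[Tuple[int, int]]:
    """1-based, inclusive (start, end) of every run of N/n in seq."""
    return [(m.start() + 1, m.end()) for m in N_RUN.finditer(seq)]


def summarize(lines: Iterable[str]) -> Dict[str, int]:
    """Total length, number of N runs and total N bases over all records."""
    total = runs = n_bases = 0
    for _, seq in read_fasta(lines):
        total += len(seq)
        for start, end in n_runs(seq):
            runs += 1
            n_bases += end - start + 1
    return {"length": total, "n_runs": runs, "n_bases": n_bases}


toy = [">scaf1", "ACGTNNNNNACGT", "N" * 13 + "AC", ">scaf2", "ACGTACGT"]
print(summarize(toy))  # {'length': 36, 'n_runs': 2, 'n_bases': 18}
```

For real files, pass an open handle (for example `with open("scaffolds.fasta") as fh: print(summarize(fh))`, with a hypothetical file name) for the input and for the output assembly.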
Are the observations above normal? How could these happen?
I look forward to your reply, and thank you very much in advance.
1. Explanation of the differences between "Bionano estimated gaps" and "TGS-GapCloser filled gaps":
TGS-GapCloser uses the "Input Long Reads" to close gaps in the "Input Scaffolds". By default, it trusts the assembly information provided by the long reads.
In your project, TGS-GapCloser applies the assembly information from the "pre-assembled contigs (produced by MaSuRCA)" to close gaps in the "Bionano hybrid scaffolds", but it uses no gap-size information from the input scaffolds. So it comes down to which assembly information you trust more: Bionano or the long reads?
2. Our default application scenario uses error-prone TGS reads as the "Input Reads", so the default parameters might not suit your high-quality assembled contigs. I would suggest trying higher thresholds, such as larger --min_match and --min_idy values.
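For the threshold suggestion above, a small Python sketch can generate one run per parameter pair so the outputs can be compared side by side. The file names, output prefixes and threshold values are hypothetical; --min_idy and --min_match come from the suggestion above, while --scaff, --reads, --output and --ne (no error correction) are assumed from the TGS-GapCloser usage text and should be checked against `tgsgapcloser --help` for your installed version.

```python
import shlex
from itertools import product
from typing import List

# Hypothetical input names; replace with your own files.
SCAFFOLDS = "bionano_hybrid_scaffolds.fasta"
CONTIGS = "masurca_contigs.fasta"

# Hypothetical grid: raise both thresholds stepwise and compare the results.
MIN_IDY = [0.5, 0.7, 0.9]
MIN_MATCH = [1000, 5000]


def build_commands() -> List[List[str]]:
    """Return one tgsgapcloser command (as an argv list) per parameter pair."""
    commands = []
    for idy, match in product(MIN_IDY, MIN_MATCH):
        prefix = f"gapclose_idy{idy}_match{match}"  # separate output per run
        commands.append([
            "tgsgapcloser",
            "--scaff", SCAFFOLDS,
            "--reads", CONTIGS,
            "--output", prefix,
            "--ne",                      # assumed flag: skip error correction
            "--min_idy", str(idy),
            "--min_match", str(match),
        ])
    return commands


if __name__ == "__main__":
    for cmd in build_commands():
        print(shlex.join(cmd))  # print only; nothing is executed here
```

The script only prints the commands; passing each list to `subprocess.run(cmd, check=True)` would execute them, and the separate output prefixes keep the runs apart for assessment.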
If a reference assembly is available, you can assess the final result against the reference and compare it with the input assembly. If not, try BUSCO.
Best wishes,
Lidong
Kind regards, Chengcheng