bcgsc / LINKS

⛓ Long Interval Nucleotide K-mer Scaffolder
GNU General Public License v3.0

Does LINKs break pre-existing scaffolds? #30

milw closed this issue 5 years ago

milw commented 6 years ago

It's not a problem, but I just wanted clarification: it appears that LINKS does NOT break a supplied sequence that contains internal NNNs (e.g., to generate contigs from an input scaffold sequence). Can you confirm that? And would you suggest I try running only unscaffolded contigs, in case my input scaffolds have errors? Thanks! Scott

warrenlr commented 6 years ago

Hi Scott, thanks for reaching out and for your interest in LINKS.

No, LINKS will not break your assembly; it scaffolds the supplied draft exactly as the user inputs it.

For misassembly correction (and breaking your assembly draft), I recommend you have a look at Tigmint:

https://github.com/bcgsc/tigmint https://www.biorxiv.org/content/early/2018/04/20/304253

It is a misassembly corrector that uses linked reads, such as those from 10x Genomics. Note that it will break not only at Ns but at any region with no molecule coverage (as assessed from the barcode information).
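The idea of breaking wherever molecule coverage drops out, rather than only at Ns, can be illustrated with a simplified sketch. It assumes each linked-read molecule has already been reduced to a half-open `[start, end)` interval on a contig; the function name `internal_gaps` and this input format are hypothetical. Candidate break regions are the stretches between molecules that no molecule spans. This is not Tigmint's actual implementation, which has its own parameters and thresholds (see the repository linked above).

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in contig coordinates


def internal_gaps(molecules: List[Interval]) -> List[Interval]:
    """Return regions between molecules that no molecule covers.

    Only gaps inside the span from the first to the last molecule are
    reported; the contig ends are ignored because molecules cannot extend
    past them, so coverage naturally tapers off there.
    """
    intervals = sorted(molecules)
    if not intervals:
        return []

    gaps: List[Interval] = []
    covered_until = intervals[0][1]  # rightmost molecule end seen so far
    for start, end in intervals[1:]:
        if start > covered_until:
            # Nothing spans [covered_until, start): a candidate break region.
            gaps.append((covered_until, start))
        covered_until = max(covered_until, end)
    return gaps


if __name__ == "__main__":
    # Hypothetical molecule intervals on a single 4 kb contig.
    molecules = [(0, 500), (400, 1200), (1500, 3000), (2900, 4000)]
    print(internal_gaps(molecules))  # [(1200, 1500)]
```

Intervals that merely touch (one ends where the next starts) are treated as continuous coverage, so no break is reported between them.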

As for my recommendation, it really depends: do you have any reason to believe that the previous scaffolding was wrong? If so, you could break at Ns and run LINKS. But again, this is not commonly done unless you have strong reasons to believe your current scaffolded assembly is wrong.
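Breaking a scaffolded draft at Ns amounts to splitting each sequence wherever a run of N appears. Below is a minimal, stand-alone sketch of that step, assuming the draft is a FASTA file; it is not a feature of LINKS or of any tool mentioned in this thread. The function names, the `min_gap` parameter, and the `<scaffold>_1, <scaffold>_2, ...` naming scheme are all hypothetical choices for illustration.

```python
import io
import re
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (sequence name, sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence) pairs from FASTA text; name is the first header word."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def break_at_ns(records: Iterable[Record], min_gap: int = 1) -> Iterator[Record]:
    """Split each scaffold at runs of at least `min_gap` N/n characters."""
    if min_gap < 1:
        raise ValueError("min_gap must be at least 1")
    gap = re.compile("[Nn]{%d,}" % min_gap)
    for name, seq in records:
        pieces = [piece for piece in gap.split(seq) if piece]
        for i, piece in enumerate(pieces, start=1):
            yield f"{name}_{i}", piece


if __name__ == "__main__":
    # Hypothetical two-scaffold draft; scaf1 wraps across two sequence lines.
    draft = io.StringIO(">scaf1 example\nACGTNNNNGG\nCCnTTA\n>scaf2\nGGGG\n")
    for name, seq in break_at_ns(read_fasta(draft)):
        print(f">{name}\n{seq}")
```

With the default `min_gap=1`, even a single N splits the sequence. If isolated Ns in your draft are ambiguous base calls rather than scaffold gaps, a larger `min_gap` keeps them inside the contigs.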

Often, users apply multiple scaffolding strategies, building atop each successive draft as required. For this, you may wish to run LINKS iteratively and, after the last iteration, fill the gaps with gap-closing tools such as Sealer or RAILS/Cobbler (these tools are also from our group).

Cheers, Rene

milw commented 6 years ago

Hi Rene, thanks for the quick response. My draft was built with Illumina paired-end (PE) reads and fairly extensive mate-pair data (2, 5, and 8 kb inserts), and I actually think it is pretty good, although not good enough for NCBI annotation (contig N50 ~9 kb, scaffolds ~35 kb). We then generated about 5X Nanopore coverage, which is what I'm trying to use with LINKS and the existing assembly. I also looked at your RAILS and Cobbler, but they seem to call for 'high-quality long reads', and I'm skeptical that the Nanopore reads are good enough.
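The contig N50 and the "5X" figure quoted above are standard summary statistics. Below is a minimal sketch of how they are conventionally computed, assuming you already have lists of sequence lengths. The function names and example numbers are hypothetical and do not come from the assembly discussed here. Fold coverage also depends on a genome-size estimate, which the thread does not give.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length L such that sequences of length >= L contain at least half the bases."""
    if not lengths:
        raise ValueError("n50() needs at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: running equals total on the last item")


def fold_coverage(read_lengths: Sequence[int], genome_size: int) -> float:
    """Average depth: total read bases divided by the (estimated) genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


if __name__ == "__main__":
    # Made-up contig lengths and reads, not the assembly in this thread.
    print(n50([2, 3, 4, 5, 6, 8]))  # 6
    print(fold_coverage([10_000] * 50, 100_000))  # 5.0
```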