bcgsc / arcs

🌈Scaffold genome sequence assemblies using linked or long read sequencing data
GNU General Public License v3.0

`arcs-long`/`arks-long` vs `LINKS`? #162

Closed: taprs closed this issue 1 year ago

taprs commented 1 year ago

Dear BCGSC team,

I am curious about the performance of the two long-read-based scaffolding approaches found among your repos. Has anyone benchmarked arcs/arks-based solutions together with LINKS which besides being a part of arcs is also a long-read-based scaffolding tool? Does the former work better having these "split" long reads?

More specifically, I am interested in the better method and parametrization to use with PacBio HiFi reads. I am hoping to take a HiFi assembly (yes, made from the same reads) of a 400 Mbp genome from ~50 scaffolds to 13 chromosomal superscaffolds. Is this even possible with your tools? The HiFi coverage is ~100x.

Best, Nikita

warrenlr commented 1 year ago

Hi Nikita, thank you for your question and interest in our tools.

> Has anyone benchmarked arcs/arks-based solutions together with LINKS which besides being a part of arcs is also a long-read-based scaffolding tool?

We have not done side-by-side comparisons or benchmarks between ARCS-long and LINKS. Because the two tools work differently, you may be able to use both in your pipeline. I'd recommend running LINKS first, followed by ARCS-long, since the latter typically works best on more contiguous assemblies (based on our experience, but you may wish to experiment). I'd also recommend you take a look at LongStitch (https://github.com/bcgsc/longstitch), a multi-step pipeline to correct and scaffold your assemblies. LongStitch uses yet another approach (minimizer-based ntLink, https://github.com/bcgsc/ntlink, in addition to ARCS) and may also be used in conjunction with LINKS.
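For readers who want to try the suggested order (LINKS first, then ARCS-long), here is a minimal Python sketch that only builds and prints the two command lines; it does not run anything. The file names (`assembly.fa`, `hifi.fof`, `hifi.fq.gz`) and the `links_out` prefix are hypothetical. The sketch assumes the basic invocations shown in each tool's README (`LINKS -f/-s/-b` and `arcs-make arcs-long draft=... reads=...`), that LINKS writes its scaffolds to `<base>.scaffolds.fa`, and that `arcs-make` expects `draft=` and `reads=` as file names without their extensions. Please check these against the versions you have installed, and note that no tuning parameters for HiFi data are set here.

```python
import shlex


def build_commands(draft_prefix, reads_prefix, links_base="links_out"):
    """Return two argument lists: LINKS first, then ARCS-long on the LINKS output.

    All file names are derived from hypothetical prefixes:
      <draft_prefix>.fa     draft assembly
      <reads_prefix>.fof    text file listing the long-read files (for LINKS)
      <reads_prefix>.fq.gz  the long reads themselves (for ARCS-long)
    """
    # Step 1: LINKS. -f = draft assembly, -s = file of read file names, -b = output base name.
    links_cmd = [
        "LINKS",
        "-f", f"{draft_prefix}.fa",
        "-s", f"{reads_prefix}.fof",
        "-b", links_base,
    ]
    # Step 2: ARCS-long via arcs-make, scaffolding the LINKS output further.
    # Assumption: LINKS wrote <links_base>.scaffolds.fa, and arcs-make wants
    # the draft and reads names without their file extensions.
    arcs_cmd = [
        "arcs-make",
        "arcs-long",
        f"draft={links_base}.scaffolds",
        f"reads={reads_prefix}",
    ]
    return [links_cmd, arcs_cmd]


if __name__ == "__main__":
    # Print shell-ready command lines in the order they would be run.
    for cmd in build_commands("assembly", "hifi"):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists makes it easy to add parameters for your own data later, or to pass each list to `subprocess.run` once the commands look right.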

I recommend that you experiment with all of the tools mentioned above and parameterize them for your data (the default parameters might not be optimal for PacBio HiFi reads, since they were developed using ONT data), with the hope of reaching chromosome scale using a de novo approach.

If a closely related reference exists, I'd recommend taking a look at ntJoin (https://github.com/bcgsc/ntjoin).

Good luck! Rene
