bioforensics / yeat

YEAT: Your Everyday Assembly Tool

Adding support for long read assembly algorithms #35

Closed danejo3 closed 1 week ago

danejo3 commented 1 year ago

There are two common ways to sequence DNA: short-read and long-read sequencing. Right now, YEAT only supports assembly of short reads. Adding long-read assembly support would be a tremendous addition to YEAT's capabilities because long-read sequencing has come a long way, is rising in popularity, and can address many of the downsides of short-read sequencing.

While both techniques have their pros and cons, adding the ability to assemble long reads would be a great way to diversify what YEAT can do.

While exploring long-read assembly algorithms, a few came onto my radar:

- Flye (and its variations, metaFlye and viralFlye in particular)
- Trycycler
- Canu
- Raven
- Redbean

If we were to begin supporting long-read assembly, I would start with Flye, because it covers a broad range of use cases, and Trycycler, for its focus on bacterial genomes.

Flye is interesting because it can practically do anything with long reads. Similar to SPAdes!

"It is designed for a wide range of datasets, from small bacterial projects to large mammalian-scale assemblies."

Flye also has a bunch of flags, like SPAdes, that adjust the algorithm to produce better assembly results for metagenomics and plasmids.
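To make the flag idea concrete, here is a minimal sketch of how a wrapper might build a Flye command line. The helper name `build_flye_command` and its parameters are hypothetical and are not YEAT's actual code. It assumes only the read-type flags, `--out-dir`, `--threads`, and `--meta` from Flye's command-line help, and it leaves plasmid handling out.

```python
from typing import List

# Read-type flags from Flye's command-line help.
# The mapping keys and this helper are hypothetical, not part of YEAT.
READ_TYPE_FLAGS = {
    "nano-raw": "--nano-raw",
    "nano-hq": "--nano-hq",
    "pacbio-raw": "--pacbio-raw",
    "pacbio-hifi": "--pacbio-hifi",
}


def build_flye_command(
    reads: str,
    outdir: str,
    read_type: str = "nano-raw",
    threads: int = 4,
    meta: bool = False,
) -> List[str]:
    """Return a Flye argument list; nothing is executed here."""
    if read_type not in READ_TYPE_FLAGS:
        raise ValueError(f"unsupported read type: {read_type}")
    cmd = [
        "flye",
        READ_TYPE_FLAGS[read_type],
        reads,
        "--out-dir",
        outdir,
        "--threads",
        str(threads),
    ]
    if meta:
        # Metagenome mode, for samples with uneven coverage.
        cmd.append("--meta")
    return cmd
```

If Flye is installed, the returned list could be passed to `subprocess.run(cmd, check=True)`. Building the list separately keeps the flag logic easy to test without running an assembly.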

Trycycler would be a great algorithm to add for its specialty in bacterial genome assembly. I'm not quite sure how it compares with Flye's --plasmid since the last publication back in 2021.

danejo3 commented 10 months ago

Trycycler has not been implemented yet, but I plan to add it in the future. Flye, Canu, Hifiasm (#49), and Hifiasm-meta (#49) have been added.

I will look into Raven and Redbean more. It seems like Raven is actively developed, while Redbean's last release was in 2019.

standage commented 10 months ago

Yeah, we evaluated wtbd...wbtgb...wt...ugh, Redbean a few years ago. I don't remember if we ever did a thorough assessment, but the first-pass results weren't super promising IIRC. Since it doesn't appear to be actively developed, I wouldn't make this assembler a high priority for inclusion.

danejo3 commented 1 week ago

YEAT has grown a lot over the past year. Long-read assemblers such as flye, canu, unicycler, hifiasm, hifiasm_meta, and metaMDBG were added.

If there are any new long-read assembly algorithms to evaluate and add, let's create new issue threads for them.