bcgsc / goldrush

Linear-time de novo Long Read Assembler
GNU General Public License v3.0

Goldrush for PacBio or ONT + PacBio #116

Closed 000generic closed 1 year ago

000generic commented 1 year ago

Hi!

I was wondering if Goldrush has been tested / is used / is recommended for PacBio data - or ONT + PacBio data? If so, are there any specific recommendations? I have data sets for two species with around human-sized genomes. One with 40x ONT + 7x PacBio and another with 10x ONT + 60x PacBio - all data generated around 2020.

Thank you :) Eric

jwcodee commented 1 year ago

Thanks for your interest and sorry for the delayed response.

For PacBio, GoldRush has only been tested on HiFi human data. In our experiments it assembles PacBio HiFi human data well (high base quality and single-digit Mbp NGA50), but the assemblies do not reach the contiguity of ONT assemblies because the shorter read lengths limit the scaffolding algorithm used.

I'm not sure whether you have HiFi or CLR data, but if you use GoldRush on PacBio HiFi data, you will have to change the m parameter to the library size used to generate the PacBio HiFi data.

Right now, GoldRush does not support hybrid data out of the box. It is possible to manually generate the golden path with PacBio data and scaffold with ONT, though we have not explored how much coverage is needed for optimal scaffolding. We are looking to add an option for out-of-the-box hybrid genome assembly in a future update.

60X and 40X should be sufficient coverage to run the GoldRush pipeline.

Regards, Johnathan

000generic commented 1 year ago

Thank you for all the details! Very helpful
