chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License
529 stars 86 forks

Inquiry Regarding Genome Assembly with ONT Data: Quality Comparisons and Recommendations #579

Open Bank-tidy opened 9 months ago

Bank-tidy commented 9 months ago

Dear Developer,

I hope this message finds you well. First and foremost, I would like to extend my sincere gratitude for developing this remarkable software. It has been an invaluable tool in my research. However, I have encountered some queries that I hope you might help me clarify.

In my recent work, I ran two assemblies, each combining HiFi data with Oxford Nanopore Technologies (ONT) data: one used all available ONT reads, and the other used only the ONT reads longer than 100 kb. The resulting assemblies were then aligned to a reference genome and named all_ont_genome and 100k_ont_genome, respectively.
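For readers who want to reproduce the length-filtering step described above, here is a minimal sketch in plain Python. It is not the procedure actually used in this issue, which is not stated. It assumes standard 4-line FASTQ records; the function names, the file-path handling, and the strict "longer than" cutoff are all illustrative choices.

```python
import gzip
from typing import Iterable, Iterator, Tuple

# (header, sequence, separator, quality); hypothetical alias for illustration
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from 4-line FASTQ text (assumes no wrapped sequences)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\n")
            sep = next(it).rstrip("\n")
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError("truncated FASTQ record: " + header) from None
        yield header, seq, sep, qual


def filter_by_length(records: Iterable[FastqRecord],
                     min_len: int) -> Iterator[FastqRecord]:
    """Keep only reads strictly longer than min_len bases."""
    for rec in records:
        if len(rec[1]) > min_len:
            yield rec


def filter_fastq_file(in_path: str, out_path: str,
                      min_len: int = 100_000) -> int:
    """Write reads longer than min_len to out_path; return how many were kept."""
    opener = gzip.open if in_path.endswith(".gz") else open
    kept = 0
    with opener(in_path, "rt") as fin, open(out_path, "w") as fout:
        for rec in filter_by_length(read_fastq(fin), min_len):
            fout.write("\n".join(rec) + "\n")
            kept += 1
    return kept
```

The filtered file would then be supplied to the assembler as the ultra-long input in place of the full ONT set.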

Upon analysis, I observed that all_ont_genome had a contig N50 of 60 Mb with 8 gaps, whereas 100k_ont_genome had a contig N50 of 67 Mb with only 3 gaps. This leads me to infer that 100k_ont_genome may be of higher quality than all_ont_genome. This puzzles me, since the input for all_ont_genome also includes the reads over 100 kb, yet its result appears worse.
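For reference, the contig N50 compared above is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal sketch of that calculation follows; the function name is illustrative, and the contig lengths are assumed to have been extracted from the assembly FASTA beforehand.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig where the cumulative length
        # reaches at least half of the total assembly length.
        if 2 * running >= total:
            return length
    return 0
```

Because N50 depends only on contig lengths, a higher value does not by itself rule out misjoins, so it is best read alongside the gap counts rather than on its own.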

Could you kindly provide some insights into this observation? Additionally, what would be your recommended approach for assembly in such a scenario?

Thank you for your time and assistance. I eagerly await your response.

Best regards!

chhylp123 commented 9 months ago

Shorter ONT reads may be noisy, which may confuse hifiasm in some cases. It is hard to say for certain, though, as both assemblies have only a few gaps.

Bank-tidy commented 9 months ago

@chhylp123 Thank you for your suggestion. Judging from these results, would you recommend using only ultra-long (UL) ONT reads longer than 100 kb for better assembly results?

chhylp123 commented 9 months ago

100k_ont_genome looks better, but it would be worth double-checking it for assembly errors.