chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

Issue with string graph? #185

Closed: larryns closed this issue 3 years ago

larryns commented 3 years ago

The supplementary material for "The complete sequence of a human genome" includes a note about a problem with the string graph algorithm. See:

The modification fixed a problem in the pseudocode in Myers 2005 (12) (which has made its way into the miniasm implementation). The pseudocode was missing a final verification that the identified overlap is actually redundant with respect to the other two overlaps based on their sizes. Without this check, an overlap could be deemed transitive even if it implied an incompatible arrangement of the reads.
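The check described in the quoted note can be illustrated with a minimal Python sketch. It is not code from Myers 2005, miniasm, or hifiasm. It assumes each edge `v -> w` is labeled with the overhang length that `w` adds beyond `v`, ignores read orientation, and uses a hypothetical tolerance `fuzz`. The function name `transitive_edges` is also made up for illustration. An edge `v -> x` is reported as redundant only if some path `v -> w -> x` exists and the two overhang lengths add up to the length of `v -> x`.

```python
from collections import defaultdict


def transitive_edges(edges, fuzz=10):
    """Return the set of edges (v, x) that are redundant given v->w and w->x.

    edges: dict mapping (source, target) -> overhang length in bases.
    fuzz:  hypothetical tolerance for length disagreement, in bases.
    """
    # Build an adjacency map: out[v][w] = overhang length of edge v->w.
    out = defaultdict(dict)
    for (v, w), length in edges.items():
        out[v][w] = length

    redundant = set()
    for v, nbrs in out.items():
        for w, len_vw in nbrs.items():
            # .get() avoids adding new keys to the defaultdict while iterating.
            for x, len_wx in out.get(w, {}).items():
                if x == w or x not in nbrs:
                    continue  # no direct edge v->x to reduce
                # Final verification: the direct edge must be consistent in
                # size with the two-step path, otherwise it is kept.
                if abs(len_vw + len_wx - nbrs[x]) <= fuzz:
                    redundant.add((v, x))
    return redundant
```

With edges `a -> b` (100), `b -> c` (100) and `a -> c` (200), the edge `a -> c` is reported as redundant. If `a -> c` instead had length 500, the topology would be the same, but the size check would keep the edge, which is the situation the note describes.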

Does this issue affect hifiasm as well? If so, is there a workaround?

Thanks for any input. Larry.

chhylp123 commented 3 years ago

Sorry I missed your issue. Hifiasm has fixed it.

larryns commented 3 years ago

Thanks for the response. Can you tell me how hifiasm handles the problem of contained reads being discarded?

chhylp123 commented 3 years ago

Sorry I missed your issue again :( When doing assembly, hifiasm first ignores contained reads, and then patches contained reads back to unitigs or contigs. It is tricky...

larryns commented 3 years ago

Hi, no problem, thanks for the info. Could you elaborate a little more please?

So the contained reads are ignored during the string graph build? But contained reads are used for consensus correction?

When you say the contained reads are patched back to unitigs, can you elaborate on what you mean? Are the contained reads used for haplotype resolution?

Thank you for taking the time to answer my questions. I'm trying to understand how the algorithm works so I can feed the best data for it. I don't see the documentation of these details written anywhere, so I appreciate you taking the time to explain the details to me.

chhylp123 commented 3 years ago

  1. So the contained reads are ignored during the string graph build? Yes.
  2. When you say the contained reads are patched back to unitigs, can you elaborate on what you mean? We are using the contained reads to fill the gaps between contigs/unitigs after getting the initial assembly without contained reads.

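The first step above (setting contained reads aside before the graph is built) can be sketched in Python as follows. This is only an illustration and does not reflect hifiasm's actual code. The `Overlap` record, the `max_hang` tolerance, and both function names are hypothetical. The sketch uses the common definition that a read is contained when an overlap covers it end to end while the other read extends past it. The later gap-filling step is not shown, because the thread does not describe how it works.

```python
from typing import NamedTuple


class Overlap(NamedTuple):
    """Hypothetical overlap record with 0-based, end-exclusive coordinates."""
    query: str
    q_len: int
    q_start: int
    q_end: int
    target: str
    t_len: int
    t_start: int
    t_end: int


def contained_reads(overlaps, max_hang=0):
    """Return IDs of reads covered end to end by an overlap with another read."""
    contained = set()
    for o in overlaps:
        q_full = o.q_start <= max_hang and o.q_len - o.q_end <= max_hang
        t_full = o.t_start <= max_hang and o.t_len - o.t_end <= max_hang
        if q_full and not t_full:
            contained.add(o.query)
        elif t_full and not q_full:
            contained.add(o.target)
        elif q_full and t_full:
            # Both reads are fully covered (near-duplicates): set aside one of
            # them deterministically. This tie-break is an assumption.
            contained.add(max(o.query, o.target))
    return contained


def split_reads(read_ids, overlaps):
    """Split reads into those used for the graph and those set aside."""
    contained = contained_reads(overlaps)
    graph_reads = [r for r in read_ids if r not in contained]
    set_aside = [r for r in read_ids if r in contained]
    return graph_reads, set_aside
```

For example, if `r2` (500 bp) aligns over its full length to the interior of `r1` (1000 bp), `split_reads` keeps `r1` for the graph and puts `r2` in the set-aside list, where it stays available for a later patching step.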
larryns commented 3 years ago

Ah okay, I think I follow. Thanks for the explanation.