KolmogorovLab / hapdup

Pipeline to convert a haploid assembly into diploid

noob questions dual vs phased assembly #45

Open JWDebler opened 1 month ago

JWDebler commented 1 month ago

Hi, I'm new to diploid assemblies and have a few questions about what the different output files represent.

I have a Flye assembly with 233 contigs. After running hapdup I have two dual assemblies, one with 128 and one with 135 contigs, as well as two phased assemblies, one with 711 and one with 722 contigs.
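
Contig counts like these can be checked with a few lines of Python. The following is only a minimal sketch, assuming uncompressed FASTA files passed on the command line; the helper names `contig_lengths` and `n50` are illustrative and not part of hapdup or Flye. It reports the number of contigs and the N50 for each file, which makes it easier to compare the contiguity of the input, dual, and phased assemblies.

```python
import sys


def contig_lengths(fasta_path):
    """Return the length of every sequence in an uncompressed FASTA file."""
    lengths = []
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A header line starts a new contig.
                lengths.append(0)
            elif line and lengths:
                # Sequence lines add to the most recent contig.
                lengths[-1] += len(line)
    return lengths


def n50(lengths):
    """Length of the contig at which the running total reaches half the assembly size."""
    if not lengths:
        return 0
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


if __name__ == "__main__":
    # Hypothetical usage: python contig_stats.py assembly.fasta dual_1.fasta phased_1.fasta
    for path in sys.argv[1:]:
        lengths = contig_lengths(path)
        print(f"{path}\t{len(lengths)} contigs\tN50={n50(lengths)}")
```

Contig count alone can be misleading, since a few long contigs and many short ones give the same count as evenly sized contigs, which is why the sketch also prints N50.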

From what I've read, in the phased assemblies each contig represents either a haplotype block that could be resolved to one haplotype or a block that is the same between the two haplotypes.

What I don't really understand is what the dual assemblies represent, especially since the README mentions "Dual assembly will have the same contiguity as the original input, but may contain phase switches." In my case the number of contigs almost halved for the two dual assemblies. From that README sentence I thought I would get assemblies with the same structure as my input, in which the haplotype blocks are connected by the blocks that are the same between the two haplotypes, and that this is why they should have the same number of contigs, with the caveat that the blocks before and after a shared block might actually come from different haplotypes (a haplotype switch).

Cheers, Johannes

mikolmogorov commented 1 month ago

Here is a good explanation of a dual assembly: https://lh3.github.io/2021/10/10/introducing-dual-assembly. It references the hifiasm paper, which has more explanations and figures.

Misha