chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

An assembly error #110

Open zhangrengang opened 3 years ago

zhangrengang commented 3 years ago

[Figure: graph-1]

This is a real example from the unitig graph (*.hic.p_utg.gfa) that appears to be an error. The overlap between A and B is only 408 bp, yet *.hic.hap1.p_ctg.gfa and *.hic.hap2.p_ctg.gfa resolve the paths A-B and A-C-B, respectively. The result is correct in *.hic.p_ctg.gfa, which contains A-C-B. I used default parameters in Hi-C mode.
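A structure like the one described above can be looked for directly in the GFA. The following is a minimal sketch, not part of hifiasm: it assumes standard GFA1 `L` lines with simple `<n>M` overlaps, and the function names and the 1000 bp cutoff are hypothetical. It flags a link A-B with a short overlap when a longer route A-C-B also exists. Only the 408 bp overlap comes from this report; the other overlap lengths in the example are made up.

```python
from collections import defaultdict


def parse_links(gfa_text):
    """Collect GFA1 L lines as {(from, orient): {(to, orient): overlap_bp}}."""
    links = defaultdict(dict)
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6 or fields[0] != "L":
            continue
        src = (fields[1], fields[2])
        dst = (fields[3], fields[4])
        cigar = fields[5]
        # Assumption: the overlap is a plain "<n>M" CIGAR; anything else counts as 0.
        if cigar.endswith("M") and cigar[:-1].isdigit():
            overlap = int(cigar[:-1])
        else:
            overlap = 0
        links[src][dst] = overlap
    return links


def short_bypass_links(links, max_overlap=1000):
    """Return (src, mid, dst, overlap) for each short link src->dst that
    skips a node mid while src->mid->dst is also present in the graph."""
    hits = []
    for src, outs in links.items():
        for dst, overlap in outs.items():
            if overlap >= max_overlap:
                continue
            for mid in outs:
                if mid != dst and dst in links.get(mid, {}):
                    hits.append((src, mid, dst, overlap))
    return hits


# Toy graph: 408M is the overlap reported here, the other two are hypothetical.
example_gfa = "\n".join([
    "L\tA\t+\tB\t+\t408M",
    "L\tA\t+\tC\t+\t15000M",
    "L\tC\t+\tB\t+\t12000M",
])

for hit in short_bypass_links(parse_links(example_gfa)):
    print(hit)
```

A link reported this way is only a candidate for inspection. Whether it is false still has to be checked against coverage depth or read alignments.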

chhylp123 commented 3 years ago

Thanks a lot. My guess is that hifiasm treats it as a bubble when it is actually not one.

chhylp123 commented 3 years ago

Are A, B and C homozygous unitigs or not?

zhangrengang commented 3 years ago

No, they are not homozygous: A, B and C each contain small heterozygous bubbles (see the graph). It seems that -x and -y had no effect. I think a short overlap like this should be re-checked against reads that span it.

chhylp123 commented 3 years ago

If A and B are present in both haplotypes, does that mean hifiasm generates a false duplication of A and B?

zhangrengang commented 3 years ago

It is a false deletion of C in one of the haplotypes. Let the two haplotypes be A1 and A2 for A, B1 and B2 for B, and C1 and C2 for C. Hi-C should resolve them into A1-C1-B1 and A2-C2-B2, but hifiasm actually outputs A1-C1-B1 and A2-B2. The short overlap (408 bp) between A and B misleads hifiasm into treating the two alleles at the C locus as C1 and a complete deletion of C. The short overlap is actually false, as confirmed by coverage depth and alignment to the reference. Why do -x and -y not drop such a short overlap?
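The expectation behind this question can be shown with a generic ratio filter. This sketch assumes -x and -y act as overlap drop ratios, as hifiasm's help text describes them. It is not hifiasm's implementation, which did not remove the link in this case. The function name, the ratio of 0.2 and the 15000 bp overlap are hypothetical; only the 408 bp value comes from this report.

```python
def drop_short_overlaps(out_overlaps, drop_ratio):
    """Keep only overlaps at least drop_ratio times the longest one.

    out_overlaps maps a target unitig name to the overlap length (bp)
    of the link leaving one source unitig.
    """
    if not out_overlaps:
        return {}
    longest = max(out_overlaps.values())
    return {
        target: length
        for target, length in out_overlaps.items()
        if length >= drop_ratio * longest
    }


# Links leaving A: 408 bp to B (from this report), 15000 bp to C (hypothetical).
links_from_a = {"B": 408, "C": 15000}
kept = drop_short_overlaps(links_from_a, drop_ratio=0.2)
print(kept)
```

Under these assumed numbers the cutoff is 0.2 * 15000 = 3000 bp, so a filter of this kind would discard the 408 bp link to B and keep only the link to C.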