Open zhangrengang opened 3 years ago
Thanks a lot. I guess the reason is that hifiasm thinks it is a bubble but it is actually not...
Are A, B and C homozygous unitigs or not?
No, they are not homozygous, as there are some small heterozygous bubbles in A, B and C (see the graph). It seems that `-x` and `-y` did not work. I think the short overlap should be re-checked using reads that span it.
If A and B are in both haplotypes, does that mean hifiasm generates a false duplication of A and B?
It is a false deletion of C in one of the haplotypes. Assume the two haplotypes are A1 and A2 for A, B1 and B2 for B, and C1 and C2 for C. Hi-C should resolve them into A1-C1-B1 and A2-C2-B2, but hifiasm actually outputs A1-C1-B1 and A2-B2, because a short overlap (408 bp) between A and B misleads hifiasm into treating the two haplotypes as C1 and a complete deletion of the C locus. The short overlap is actually false, as confirmed by coverage depth and by alignment to the reference. Why do `-x` and `-y` not drop such a short overlap?
This is a real example in the unitig graph (`*.hic.p_utg.gfa`) that seems to be an error: the overlap between A and B is only 408 bp, but the paths A-B and A-C-B are resolved in `*.hic.hap1.p_ctg.gfa` and `*.hic.hap2.p_ctg.gfa`, respectively. Nevertheless, it is OK in `*.hic.p_ctg.gfa` (A-C-B). I used default parameters in Hi-C mode.
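
For anyone who wants to look for similar cases in their own unitig graph, here is a minimal sketch that lists the links in a GFA file whose overlap is suspiciously short. It only assumes the standard GFA 1 `L` record layout (from, orientation, to, orientation, CIGAR overlap) and simple `<n>M` overlaps. The function name `short_overlap_links`, the 1000 bp cutoff, and the example records are illustrative assumptions; apart from the 408 bp A-B overlap, the values are made up and are not taken from my assembly.

```python
import re

def short_overlap_links(gfa_lines, max_overlap=1000):
    """Yield (from, from_orient, to, to_orient, overlap_bp) for every GFA 1
    'L' record whose overlap is shorter than max_overlap.

    max_overlap is an arbitrary illustrative cutoff, not a hifiasm default.
    """
    for line in gfa_lines:
        if not line.startswith("L\t"):
            continue  # only link records carry overlaps
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # malformed link record
        match = re.fullmatch(r"(\d+)M", fields[5])
        if match is None:
            continue  # skip '*' or more complex CIGAR strings
        overlap = int(match.group(1))
        if overlap < max_overlap:
            yield (fields[1], fields[2], fields[3], fields[4], overlap)

# Hypothetical records mimicking the A/B/C case; only 408 bp comes from the issue.
example = [
    "S\tA\t*\tLN:i:500000",
    "S\tB\t*\tLN:i:450000",
    "S\tC\t*\tLN:i:120000",
    "L\tA\t+\tB\t+\t408M",
    "L\tA\t+\tC\t+\t15230M",
    "L\tC\t+\tB\t+\t14872M",
]

for link in short_overlap_links(example):
    print(*link, sep="\t")
```

On a real file, the same function can be called with an open file handle, e.g. `short_overlap_links(open("asm.hic.p_utg.gfa"))`, where the file name is a placeholder. This only flags candidates; whether a short overlap is false still has to be checked against coverage depth or spanning reads.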