chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

Output interpretation with HiFi+ONT+HiC with inbred samples + `-l0` #627

Open AndreaGuarracino opened 6 months ago

AndreaGuarracino commented 6 months ago

How should I interpret hifiasm's outputs when the input is HiFi+ONT+HiC from an inbred sample and I run with `-l0`?

With HiFi and with HiFi+ONT, `-l0` gives a healthy primary assembly and a shorter, fragmented alternate assembly. This is expected for a homozygous inbred sample.

With HiFi+ONT+HiC, `-l0` gives a primary assembly that is a bit worse than the one made with HiFi+ONT, no alternate assembly, and two haplotype-resolved assemblies (`*.hic.hap[1|2].p_ctg.*`). One of the haplotype-resolved assemblies looks better than both primary assemblies (HiFi+ONT and HiFi+ONT+HiC); the other is shorter and worse than both.
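To make "better", "shorter", and "worse" concrete when comparing these outputs, a minimal sketch like the one below can report contig count, total length, and N50 from the segment (`S`) lines of a GFA file. It is not part of hifiasm; the function names are made up for illustration, and it assumes each `S` line carries either the sequence itself or an `LN:i:` length tag.

```python
def contig_lengths(gfa_lines):
    """Collect contig lengths from the S (segment) lines of a GFA file."""
    lengths = []
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue  # skip header, link, and read-alignment lines
        fields = line.rstrip("\n").split("\t")
        seq = fields[2]
        if seq != "*":
            lengths.append(len(seq))
        else:
            # Sequence omitted: fall back to the LN:i:<length> tag.
            for tag in fields[3:]:
                if tag.startswith("LN:i:"):
                    lengths.append(int(tag[5:]))
                    break
    return lengths


def n50(lengths):
    """Length of the contig at which the running total reaches half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def summarize(gfa_lines):
    """Return contig count, total bases, and N50 for one assembly."""
    lengths = contig_lengths(gfa_lines)
    return {"contigs": len(lengths), "total_bp": sum(lengths), "n50": n50(lengths)}
```

Usage would look like `summarize(open("asm.hic.hap1.p_ctg.gfa"))`, repeated for hap2 and for the primary assembly, where `asm` is a hypothetical output prefix. These numbers describe only contiguity and size, not correctness or completeness.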

baozg commented 6 months ago

If you use HiC for phasing, `-l0` shouldn't work anymore (not purge dups or phasing). The result should be similar to `-l3` with purge haplotypes. In my experience, HiC phasing of inbred samples generates two similar haplotypes, except in the long heterozygous regions. As for the contiguity, another explanation may be that hifiasm tries to put all unresolved short contigs into haplotype 1: https://github.com/chhylp123/hifiasm/issues/623