chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License
547 stars, 87 forks

output of partial phasing - some unitig can be missing in one haploid? #670

Closed: yfukasawa closed this issue 4 months ago

yfukasawa commented 4 months ago

Hello,

Thank you very much for maintaining and developing this great software!

I understand that recent versions of hifiasm output a partially phased assembly by default when only HiFi reads are given. I'm using ver. 0.19.5-r587.

In a diploid genome I'm working on, one chromosome has a problem with missing unitigs. hap2 has a fairly contiguous contig for this chromosome, almost T2T; however, the hap1 contigs for the same chromosome seem to be missing some unitigs. About 7 Mbp is missing in hap1, while hap2 has no such problem.

The gap sequences in hap1 map back to p_utg*, and many of them seem to be homozygous unitigs. Some examples are below. The colored unitigs are missing in hap1; they have relatively high supporting read counts (from the rd:i: tag), e.g. >40 in a 60x dataset, and forked edges at both ends, so they look homozygous.

[Image: missing_unitigs_graph]
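For readers who want to repeat the read-depth check described above, here is a minimal sketch in Python. It assumes the unitig graph is a GFA file whose segment (`S`) lines carry an `rd:i:` tag, as mentioned in the post. The function names, the example unitig names, and the cutoff of 40 are illustrative assumptions, not part of hifiasm.

```python
def unitig_read_depths(gfa_lines):
    """Map unitig name -> read depth taken from the rd:i: tag on GFA S lines."""
    depths = {}
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        # Only segment lines describe unitigs; skip links and other records.
        if len(fields) < 3 or fields[0] != "S":
            continue
        # Optional tags follow the name and sequence columns.
        for tag in fields[3:]:
            if tag.startswith("rd:i:"):
                depths[fields[1]] = int(tag[len("rd:i:"):])
                break
    return depths


def high_depth_unitigs(depths, min_depth):
    """Return the names of unitigs whose read depth exceeds min_depth, sorted."""
    return sorted(name for name, rd in depths.items() if rd > min_depth)


# Hypothetical GFA records, for illustration only.
example = [
    "S\tutg000001l\tACGT\tLN:i:4\trd:i:58",
    "S\tutg000002l\tACGT\tLN:i:4\trd:i:29",
    "L\tutg000001l\t+\tutg000002l\t-\t0M",
]
depths = unitig_read_depths(example)
print(high_depth_unitigs(depths, 40))
```

On a real graph, the lines would come from the opened p_utg GFA file instead of the `example` list.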

Is there an option or version I could try for this case? Or is this an expected result of partial phasing? Since I am not observing this issue on the other chromosomes, the behavior seems to be specific to this chromosome.

chhylp123 commented 4 months ago

Hi @yfukasawa, I suspect that in this case hifiasm incorrectly treats the colored unitigs as heterozygous rather than homozygous. You could try making the homozygous coverage threshold a little smaller, e.g. 45 or 50. Then hifiasm may treat these unitigs as homozygous.
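As a rough sketch of that suggestion, the Python snippet below builds one hifiasm command line per candidate threshold. It assumes the threshold in question is set with hifiasm's `--hom-cov` option. The read file name, output prefixes, and thread count are hypothetical placeholders.

```python
import shlex


def hifiasm_command(reads, prefix, hom_cov, threads=32):
    """Build a hifiasm argument list with an explicit homozygous coverage."""
    return [
        "hifiasm",
        "-o", prefix,                 # output prefix
        "-t", str(threads),           # number of threads
        "--hom-cov", str(hom_cov),    # assumed to be the threshold meant above
        reads,
    ]


# Print one command per candidate threshold; file names are placeholders.
for cov in (45, 50):
    cmd = hifiasm_command("reads.fq.gz", f"asm.homcov{cov}", cov)
    print(shlex.join(cmd))
```

Using a separate output prefix per threshold keeps the resulting assemblies side by side for comparison.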

yfukasawa commented 4 months ago

Hello @chhylp123

Thank you for your reply and advice. I will certainly give it a try with some smaller coverage thresholds.

YF

yfukasawa commented 4 months ago

Hello @chhylp123

A smaller homozygous coverage threshold works fine for those regions. Thanks!