chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

Different Hifiasm versions yield large differences in haplotype assembly sizes #443

Open gdaviduu opened 1 year ago

gdaviduu commented 1 year ago

Dear Haoyu,

I have run both one of the newest versions (0.19.3-r572) and the last fully stable version (0.16.1-r375) of Hifiasm with HiC phasing, using default parameters. I have assembled a female bird genome with an estimated size of 1.35 Gb. Females are the heterogametic sex, with the Z chromosome estimated to be ~75 Mb in size and the W ~20 Mb. If no haplotype switching occurred on the sex chromosomes, I would expect one haplotype assembly to be ~75 to ~100 Mb longer than the other.

Using Hifiasm version 0.16.1-r375, I obtained a difference of ~4.9 Mb between the two haplotype assemblies: 1,281,348,692 bp (hap1) and 1,276,444,075 bp (hap2). Using Hifiasm version 0.19.3-r572, the difference was ~160 Mb: 1,364,205,692 bp (hap1) and 1,204,002,576 bp (hap2).
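For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the differences from the sizes quoted above and compares them with the ~75 to ~100 Mb range I expected. The names `sizes`, `EXPECTED_MB`, and `hap_difference_mb` are only illustrative and are not part of Hifiasm; the expected range is my own rough estimate, not a measured value.

```python
# Haplotype assembly sizes in bp, copied from the figures quoted in this post.
sizes = {
    "0.16.1-r375": (1_281_348_692, 1_276_444_075),
    "0.19.3-r572": (1_364_205_692, 1_204_002_576),
}

# Assumption: the expected hap1/hap2 size difference (in Mb) is the rough
# ~75 to ~100 Mb range estimated in this post, not a measured value.
EXPECTED_MB = (75.0, 100.0)


def hap_difference_mb(hap1_bp, hap2_bp):
    """Return the absolute size difference between two assemblies in Mb."""
    return abs(hap1_bp - hap2_bp) / 1e6


for version, (hap1_bp, hap2_bp) in sizes.items():
    diff_mb = hap_difference_mb(hap1_bp, hap2_bp)
    total_gb = (hap1_bp + hap2_bp) / 1e9
    in_range = EXPECTED_MB[0] <= diff_mb <= EXPECTED_MB[1]
    print(
        f"{version}: difference = {diff_mb:.1f} Mb, "
        f"combined size = {total_gb:.3f} Gb, "
        f"within expected range: {in_range}"
    )
```

The combined size is printed alongside the difference because it shows whether the two versions assembled a similar total amount of sequence and mainly differ in how it was split between hap1 and hap2.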

Which version of Hifiasm would you recommend, given these results? The latter difference is much larger than expected, given that the Merqury plot below seems to suggest that no additional purging with purge_dups is needed:

(image: Merqury plot)

I would truly appreciate your kind help!

Thank you, Gabriel

chhylp123 commented 1 year ago

I'm not 100% sure, as HiC phasing might not be as stable as trio binning. But I guess that in both assemblies most contigs have been assigned to the right haplotype, while a few contigs might be misassigned. So it would be better to do scaffolding and manually move any misassigned contigs to the right haplotype. The scaffolding should be able to fix the remaining issues.