chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

larger assembly size than kmer estimation genome size #621

Closed leon945945 closed 8 months ago

leon945945 commented 8 months ago

Hi, I estimated the genome size from HiFi data. The estimated genome size is 328 Mb with 1.02% heterozygosity (plot).

I assembled the primary and phased genomes with hifiasm using Hi-C data. With the default -s 0.55, the primary assembly is 409 Mb and the two phased haplotypes are 389 Mb and 366 Mb. All of them are larger than the estimated genome size.

Then I adjusted the parameter to -s 0.3. The primary assembly decreased to 396 Mb, and the two phased haplotypes decreased to 377 Mb and 356 Mb. They are still larger than the estimated size.

Could you please give me some suggestions on how to reduce the assembly size? Thanks.

chhylp123 commented 8 months ago

@leon945945 Sorry for the late reply. The genome size estimated from k-mers might be smaller than the real genome size, since k-mer-based methods may underestimate repetitive regions. In addition, it would be better to discard contigs that are too short. Removing these small contigs may make both haplotypes smaller.
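
For readers who want to try the contig filtering suggested above, here is a minimal Python sketch. It is not part of hifiasm: the function names `read_fasta` and `filter_contigs` and the `min_len` cutoff are illustrative assumptions, and it assumes the contigs have already been converted from hifiasm's GFA output to FASTA. No particular length cutoff is recommended in this thread, so choose one that suits your data.

```python
import io
from typing import Dict, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from a FASTA text stream."""
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token as the contig name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_contigs(handle: TextIO, min_len: int) -> Tuple[Dict[str, str], int]:
    """Keep contigs with length >= min_len; also return the bases dropped.

    min_len is a hypothetical, user-chosen cutoff, not a hifiasm default.
    """
    kept: Dict[str, str] = {}
    dropped_bp = 0
    for name, seq in read_fasta(handle):
        if len(seq) >= min_len:
            kept[name] = seq
        else:
            dropped_bp += len(seq)
    return kept, dropped_bp


if __name__ == "__main__":
    # Tiny in-memory example; replace with open("contigs.fa") for real data.
    demo = io.StringIO(">ctg1\nACGTACGT\n>ctg2\nAC\n")
    kept, dropped_bp = filter_contigs(demo, min_len=4)
    total_kept = sum(len(s) for s in kept.values())
    print(f"kept {len(kept)} contigs ({total_kept} bp), dropped {dropped_bp} bp")
```

Comparing the total kept length before and after filtering shows how much of the excess over the k-mer estimate comes from small contigs rather than from the main assembly.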