chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

Question about abnormally large HiFi-only assembly output for a heterozygous insect. #261

Open DO-T opened 2 years ago

DO-T commented 2 years ago

Hi! We are trying to assemble a diploid insect genome in HiFi-only mode with hifiasm (0.15), but the hap1 and hap2 assemblies are each about three times the genome size we estimated from a k-mer survey of Illumina short reads.

We first tried the "-s" option at 0.4, 0.3, 0.35, 0.2, and 0.1; however, the outputs were still at least twice our estimate.

We then noticed the "--hom-cov" option, so we reran the assembly with "-s" at 0.55, 0.35, and 0.1 and "--hom-cov 52", the value we obtained from the k-mer distribution of our HiFi reads. Unfortunately, no matter how we adjusted these options, the final results were still two to three times our expectation.
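
For reference, a parameter sweep like the one described above could be scripted roughly as follows. This is only a minimal Python sketch, not the commands that were actually run: the read file name, the output prefix, and the thread count are placeholders, and the helper `hifiasm_commands` is a hypothetical name. Only `-o`, `-t`, `-s`, and `--hom-cov` are hifiasm options; the script builds the command lines and prints them without executing anything.

```python
import shlex


def hifiasm_commands(reads, prefix, s_values, hom_cov=None, threads=32):
    """Build one hifiasm command line per -s value (nothing is executed).

    reads, prefix and threads are placeholders chosen for illustration.
    """
    commands = []
    for s in s_values:
        # Give each run its own output prefix so results are not overwritten.
        out_prefix = f"{prefix}.s{s}"
        args = ["hifiasm", "-o", out_prefix, "-t", str(threads), "-s", str(s)]
        if hom_cov is not None:
            # Homozygous read coverage, e.g. taken from a k-mer histogram.
            args += ["--hom-cov", str(hom_cov)]
        args.append(reads)
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    # Hypothetical input file and prefix; -s values and --hom-cov as in the post.
    for cmd in hifiasm_commands("reads.hifi.fq.gz", "insect",
                                [0.55, 0.35, 0.1], hom_cov=52):
        print(cmd)
```

Writing each run to a separate prefix makes it easier to compare the resulting hap1/hap2 sizes side by side.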

The genome of this insect is about 600 Mb, with a heterozygosity of about 4.2%, according to the k-mer survey of Illumina short reads.

From the outputs, we chose the -s 0.55,0.3,0.1 output to investigate why hap1 and hap2 are two to three times too large. We used MUMmer to examine the synteny between the assembly and a related species. In hap1, parts of two contigs map to the same homeologous chromosome of the relative, and hap2 shows the same pattern.

How can I remove these redundant contigs, or are there any hifiasm options you would recommend? Looking forward to your reply!

chhylp123 commented 2 years ago

Could you please try the latest version (0.16.1)? Version 0.15 might have some issues with partially phased assemblies if the heterozygosity rate is extremely high.

DO-T commented 2 years ago

Sorry for the late reply. Over the past few days I assembled this insect genome with hifiasm (0.16) using -s at 0.55, 0.3, and 0.1 and --hom-cov 52, yet the results still exceeded our estimate by at least one-fold.

chhylp123 commented 2 years ago

I see. Then you could try purge_dups: https://github.com/dfguan/purge_dups.
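
One practical note: hifiasm writes its contigs as GFA, while purge_dups expects a FASTA assembly, so a conversion step is needed first. The following is a minimal Python sketch of that conversion, assuming the GFA stores sequences inline on its `S` lines; the function name `gfa_to_fasta` and the file names in the comment are hypothetical. It also returns the total number of bases, which can be compared against the ~600 Mb estimate mentioned in this thread.

```python
def gfa_to_fasta(gfa_path, fasta_path, width=80):
    """Write the S (segment) records of a GFA file as FASTA.

    Returns the total number of bases written, so the assembly size
    can be checked against an independent genome-size estimate.
    """
    total = 0
    with open(gfa_path) as gfa, open(fasta_path, "w") as fasta:
        for line in gfa:
            # Segment lines look like: S <name> <sequence> [optional tags]
            if not line.startswith("S\t"):
                continue
            fields = line.rstrip("\n").split("\t")
            name, seq = fields[1], fields[2]
            if seq == "*":
                # GFA allows '*' when the sequence is not stored inline.
                continue
            total += len(seq)
            fasta.write(f">{name}\n")
            # Wrap sequence lines at a fixed width for readability.
            for i in range(0, len(seq), width):
                fasta.write(seq[i:i + width] + "\n")
    return total


# Example call (hypothetical file names):
# size = gfa_to_fasta("insect.hap1.p_ctg.gfa", "insect.hap1.p_ctg.fa")
# print(f"hap1 is {size / 600e6:.2f}x the expected 600 Mb")
```

The resulting FASTA can then be passed to purge_dups following the steps in its README.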