chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

haps are smaller than expected #165

Open m-jahani opened 3 years ago

m-jahani commented 3 years ago

Hi, I assembled a diploid plant genome using the default Hi-C mode in hifiasm (0.15.5-r350). The expected genome size is 811M (based on flow cytometry). The results look good, but I would like to push the quality as far as I can.

Here is the result that I got:


Information for *asm.hic.hap1.p_ctg.gfa
total contigs length: 789715320
as % of genome: 96.54 %
N50: 5443612
BUSCO: C:96.9%[S:94.1%,D:2.8%],F:0.3%,M:2.8%,n:2326

Information for *asm.hic.hap2.p_ctg.gfa
total contigs length: 776385689
as % of genome: 94.91 %
N50: 4514560
BUSCO: C:97.6%[S:95.1%,D:2.5%],F:0.3%,M:2.1%,n:2326

Information for *asm.hic.p_ctg.gfa
total contigs length: 844829462
as % of genome: 103.28 %
N50: 12490608
BUSCO: C:98.0%[S:91.9%,D:6.1%],F:0.3%,M:1.7%,n:2326
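
For anyone wanting to reproduce the "total contigs length" and N50 figures above, here is a minimal Python sketch that reads segment (S) lines from a GFA. It assumes the sequence sits in the third column, falling back to an `LN:i:` tag when the sequence is `*`. The toy records and the function names are made up for illustration; they are not part of hifiasm.

```python
def contig_lengths(gfa_lines):
    """Return the length of every segment (S-line) in a GFA."""
    lengths = []
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue  # skip links, annotations and other record types
        fields = line.rstrip("\n").split("\t")
        seq = fields[2]
        if seq != "*":
            lengths.append(len(seq))
            continue
        # Sequence omitted: look for an LN:i:<length> tag instead.
        for tag in fields[3:]:
            if tag.startswith("LN:i:"):
                lengths.append(int(tag[5:]))
                break
    return lengths


def n50(lengths):
    """Length of the contig at which the running total reaches half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Toy input (hypothetical contig names), not real hifiasm output.
toy_gfa = [
    "S\tctg1\tACGTACGTAC",
    "S\tctg2\tACGT",
    "S\tctg3\t*\tLN:i:6",
    "L\tctg1\t+\tctg2\t+\t0M",
]
toy_lengths = contig_lengths(toy_gfa)
print(sum(toy_lengths), n50(toy_lengths))
```

On a real assembly the same functions could be fed an open file handle, for example `contig_lengths(open("asm.hic.hap1.p_ctg.gfa"))`, where the file name is only a placeholder.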

log file:

hifiasm.log


Is it possible to improve my assembly size by tweaking settings? hap1 and hap2 are 789715320 and 776385689 bp, respectively, but the expected genome size is 811000000.

hap1 and hap2 also differ in size. Is there any way to balance the two haplotypes?

Thanks
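
To put the question in numbers, the following Python sketch computes how far each haplotype falls short of the flow-cytometry estimate and how much the two haplotypes differ. It uses only the figures quoted in the comment above; the variable names are illustrative.

```python
EXPECTED_GENOME_SIZE = 811_000_000  # flow cytometry estimate quoted above

hap_sizes = {
    "hap1": 789_715_320,
    "hap2": 776_385_689,
}

# Shortfall of each haplotype relative to the expected size, in bases.
shortfall = {name: EXPECTED_GENOME_SIZE - size for name, size in hap_sizes.items()}

# Size difference between the two haplotypes, in bases.
hap_difference = abs(hap_sizes["hap1"] - hap_sizes["hap2"])

for name, missing in shortfall.items():
    print(f"{name}: {missing / 1e6:.1f} Mb below the expected size")
print(f"hap1 vs hap2: {hap_difference / 1e6:.1f} Mb apart")
```

That works out to roughly 21 Mb and 35 Mb below the estimate, with about 13 Mb between the two haplotypes.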

chhylp123 commented 3 years ago

May I ask what's the size of the *hic.p_ctg.gfa*? Does this sample have sex chromosomes?

m-jahani commented 3 years ago

The size of *asm.hic.p_ctg.gfa is 844829462. Yes, it does have sex chromosomes. The target genome is a female plant with XX sex chromosomes.

chhylp123 commented 3 years ago

Personally, I think your assemblies are already pretty good. It is very hard to make the two haplotypes equal in size because of centromeric regions. As for the smaller size, I can't tell whether hifiasm really missed some regions or whether the two haplotypes are genuinely that small. Could you generate a Hi-C heatmap or perform contig-to-contig alignment between the two haplotypes? Either approach may tell you whether hifiasm missed some regions (although I don't think hifiasm would lose 20 Mb of contigs for each haplotype).
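
One way to act on the contig-to-contig suggestion is to align one haplotype against the other with a whole-genome aligner that writes PAF (minimap2 with its `asm5` preset is a common choice, though that is an assumption here, not something prescribed in this thread) and then summarise how much of each query contig is covered. The Python sketch below does only the summary step. It relies on the standard PAF column layout (query name, query length, query start, query end, ..., mapping quality in column 12); the contig names in the toy input are invented.

```python
from collections import defaultdict


def covered_bases(paf_lines, min_mapq=0):
    """Map each query contig to (bases covered by alignments, contig length)."""
    intervals = defaultdict(list)
    query_lengths = {}
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        if int(fields[11]) < min_mapq:
            continue  # drop low-confidence alignments
        name = fields[0]
        query_lengths[name] = int(fields[1])
        intervals[name].append((int(fields[2]), int(fields[3])))

    summary = {}
    for name, spans in intervals.items():
        spans.sort()
        covered = 0
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlapping or adjacent alignment: extend the current block.
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
        summary[name] = (covered, query_lengths[name])
    return summary


# Toy PAF records (hypothetical contig names), not real alignments.
toy_paf = [
    "h1tg1\t1000\t0\t400\t+\th2tg1\t1200\t0\t400\t390\t400\t60",
    "h1tg1\t1000\t300\t700\t+\th2tg1\t1200\t350\t750\t395\t400\t60",
    "h1tg1\t1000\t900\t1000\t-\th2tg2\t500\t0\t100\t99\t100\t60",
]
coverage = covered_bases(toy_paf)
for contig, (covered, length) in coverage.items():
    print(contig, covered, length)
```

Contigs with no alignment at all never appear in a PAF file, so they would have to be found by comparing this summary against the full contig list. Long stretches of one haplotype with no counterpart in the other are the regions worth inspecting.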

m-jahani commented 3 years ago

Thanks for your reply. I will try a Hi-C heatmap and/or contig-to-contig alignment.

Another question: when I decrease the -s parameter to 48, the haplotype sizes are much closer (more balanced), and the BUSCO results are better too:


Information for *asm.hic.hap1.p_ctg.gfa with -s48
total contigs length: 780788341
BUSCO: C:96.4%[S:93.8%,D:2.6%],F:0.4%,M:3.2%,n:2326

Information for *asm.hic.hap2.p_ctg.gfa with -s48
total contigs length: 781757043
BUSCO: C:97.8%[S:95.2%,D:2.6%],F:0.3%,M:1.9%,n:2326


Do you recommend using -s48? Wouldn't that change other aspects of assembly quality?

Thanks

chhylp123 commented 3 years ago

Are you using -s0.48 or -s48?

m-jahani commented 3 years ago

My bad, I meant --hom-cov 48. Would either -s or --hom-cov work in my case?

chhylp123 commented 3 years ago

Yeah, --hom-cov should be set to the homozygous coverage peak. You can try different values for -s to see whether the results improve. Hifiasm is pretty fast once the bin files have been generated.
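
A sweep over -s values along the lines suggested above could be scripted roughly as follows. This Python sketch only builds and prints the command lines; it does not run hifiasm. The read and Hi-C file names, the output prefix, the thread count and the particular -s values are all placeholders chosen for illustration, and keeping the same `-o` prefix across runs is assumed to be what lets hifiasm pick up its existing bin files.

```python
import shlex


def hifiasm_command(reads, hic_r1, hic_r2, prefix, s_value, hom_cov=None, threads=32):
    """Build one hifiasm Hi-C mode command line as a list of arguments."""
    cmd = [
        "hifiasm",
        "-o", prefix,          # same prefix on every run (assumption: bin files are reused)
        "-t", str(threads),
        "--h1", hic_r1,
        "--h2", hic_r2,
        "-s", str(s_value),    # similarity threshold being swept
    ]
    if hom_cov is not None:
        cmd += ["--hom-cov", str(hom_cov)]  # homozygous coverage peak
    cmd.append(reads)
    return cmd


# Placeholder inputs and sweep values; adjust to the real data set.
commands = [
    hifiasm_command("reads.fq.gz", "hic_R1.fq.gz", "hic_R2.fq.gz", "asm", s, hom_cov=48)
    for s in (0.45, 0.55, 0.65)
]
for cmd in commands:
    print(shlex.join(cmd))
```

Because every run writes to the same prefix, the GFA outputs from one run would need to be moved aside before starting the next, so that each -s value can be compared on total length and BUSCO afterwards.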