chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

Hi-C mode kat spectra-cn validation fails #98

Open chklopp opened 3 years ago

chklopp commented 3 years ago

I've assembled a 1.3 Gb genome with hifiasm (version 0.14 e6e6dbf), merged both haplotypes, and compared the k-mer content of the assembly to the read k-mers with kat spectra-cn. Because heterozygosity is low (0.5%), I was expecting to see the major Gaussian (homozygous k-mers, centered at 25X) in purple and the smaller Gaussian to its left (heterozygous k-mers, centered at 12X) in red. This is not what the graph shows: http://genoweb.toulouse.inra.fr/~klopp/tmp/hifiasm_0_14_Mrh_both_hap_new-main.mx.spectra-cn.png

There are unexpected 3X and 4X k-mer sets, indicating that some parts of the genome are overrepresented. The central red curve also shows that a large set of homozygous k-mers is missing a copy in the merged assembly.
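
For readers following the reasoning, here is a minimal sketch of the expectation being tested. It assumes the homozygous read peak sits at 25X (so each haplotype copy contributes about 12.5X), and that red marks one assembly copy and purple two, as the description above implies. The function names are hypothetical and are not part of kat or hifiasm.

```python
def expected_copies(read_multiplicity, hom_peak=25.0):
    """Expected copy number of a k-mer in a merged hap1+hap2 assembly.

    Assumption: each haplotype copy contributes hom_peak / 2 of read
    coverage, so a k-mer seen ~12X is on one haplotype and ~25X on both.
    """
    per_copy = hom_peak / 2.0
    return max(0, round(read_multiplicity / per_copy))


def diagnose(read_multiplicity, assembly_copies, hom_peak=25.0):
    """Compare the observed assembly copy number with the expectation."""
    expected = expected_copies(read_multiplicity, hom_peak)
    if assembly_copies > expected:
        return "over-represented"
    if assembly_copies < expected:
        return "missing copy"
    return "as expected"


# Heterozygous k-mer (~12X in reads) present once: fine.
print(diagnose(12, 1))
# Homozygous k-mer (~25X in reads) present once: the central red curve.
print(diagnose(25, 1))
# Homozygous k-mer present four times: the unexpected 4X set.
print(diagnose(25, 4))
```

Under these assumptions, a balanced pair of haplotypes should put nearly all 25X k-mers in the two-copy class and nearly all 12X k-mers in the one-copy class.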

lh3 commented 3 years ago

> merged both haplotypes

Don't merge p_ctg and a_ctg except for dup purging.

chklopp commented 3 years ago

I did not. I merged hap1 and hap2.
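
As an illustration of what merging hap1 and hap2 can look like in practice, the sketch below pulls the segment (`S`) lines out of two GFA files and writes them as a single FASTA stream. It relies only on the GFA convention that an `S` line holds the segment name and sequence in its second and third tab-separated fields. The file names in the usage comment are hypothetical, and the sketch assumes contig names are unique across both files.

```python
import sys


def gfa_segments(lines):
    """Yield (name, sequence) for each GFA segment ('S') line.

    Segments whose sequence is the '*' placeholder are skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[0] == "S" and fields[2] != "*":
            yield fields[1], fields[2]


def merge_haplotypes(gfa_paths, out=sys.stdout):
    """Write every segment from the given GFA files to `out` as FASTA."""
    for path in gfa_paths:
        with open(path) as handle:
            for name, seq in gfa_segments(handle):
                out.write(f">{name}\n{seq}\n")


# Example call (hypothetical file names):
# with open("both_haps.fa", "w") as fasta:
#     merge_haplotypes(["hap1.gfa", "hap2.gfa"], fasta)
```

The resulting FASTA is what would then be compared against the read k-mers.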

lh3 commented 3 years ago

Do you have Hi-C or parental data? If Hi-C, you may want to wait for the next release.

chklopp commented 3 years ago

It is Hi-C...I will wait...

chhylp123 commented 3 years ago

Please try the new version: https://github.com/chhylp123/hifiasm/releases/tag/0.15. Both the Hi-C and non-Hi-C modules should now be able to output two balanced haplotypes.