chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

Abnormal Kmer Distribution and Lower N50 #155

Open biozhangzhou opened 3 years ago

biozhangzhou commented 3 years ago

Dear authors: here is my k-mer distribution from the most recent hifiasm v0.14.5 with default parameters. I don't know why this happens. Also, the N50 is not good either. Could you please give me some tips?

[Image: Screen Shot 2021-07-18 at 5 28 41 PM]

chhylp123 commented 3 years ago

Could you please show the low-coverage part of the plot?

biozhangzhou commented 3 years ago

[Image: Screen Shot 2021-07-18 at 8 48 19 PM]

biozhangzhou commented 3 years ago

[Image: Screen Shot 2021-07-18 at 8 48 59 PM]

chhylp123 commented 3 years ago

The plot does not look bad. Are you using 0.15.4? What exactly are the issues with the assemblies?

biozhangzhou commented 3 years ago

Not bad? :) How can the heterozygous peak be higher than the homozygous peak? My N50 is poor: it only reaches 10 Mb with ~55x HiFi, and this species could get much higher.

chhylp123 commented 3 years ago

This is quite normal when the heterozygosity rate is high. How did you get a much higher N50? Do you also have CLR or ONT assemblies? And which type of hifiasm assembly are you using?
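To illustrate the point about peak heights: in a diploid k-mer spectrum, heterozygous k-mers sit at roughly half the depth of homozygous ones, and the heterozygous peak grows taller as heterozygosity increases. The following is a minimal sketch, not hifiasm's own logic; the function names `find_peaks` and `het_hom_peaks`, the `min_depth` cutoff, and the toy histogram are all hypothetical and for illustration only.

```python
def find_peaks(hist, min_depth=5):
    """Return (depth, count) local maxima of a k-mer histogram.

    hist maps k-mer depth (multiplicity) to the number of distinct k-mers.
    Depths below min_depth are skipped to ignore the error k-mer tail
    (the cutoff is an assumption, not a hifiasm parameter).
    """
    depths = sorted(d for d in hist if d >= min_depth)
    peaks = []
    for i in range(1, len(depths) - 1):
        prev_c = hist[depths[i - 1]]
        cur_c = hist[depths[i]]
        next_c = hist[depths[i + 1]]
        if cur_c > prev_c and cur_c >= next_c:
            peaks.append((depths[i], cur_c))
    return peaks


def het_hom_peaks(hist, min_depth=5):
    """Return ((het_depth, het_count), (hom_depth, hom_count)) or None.

    Takes the two tallest peaks and treats the one at lower depth as
    heterozygous and the one at higher depth as homozygous.
    """
    tallest = sorted(find_peaks(hist, min_depth), key=lambda p: p[1], reverse=True)[:2]
    if len(tallest) < 2:
        return None
    het, hom = sorted(tallest)  # tuples sort by depth first
    return het, hom


# Synthetic toy histogram (not real data): the peak near depth 25 is taller
# than the one near depth 50, as expected for a highly heterozygous genome.
toy_hist = {5: 120, 10: 300, 15: 700, 20: 950, 25: 1000, 30: 800, 35: 500,
            40: 450, 45: 550, 50: 600, 55: 520, 60: 300, 65: 100}
het, hom = het_hom_peaks(toy_hist)
print("het peak:", het, "hom peak:", hom, "depth ratio:", hom[0] / het[0])
```

A depth ratio close to 2 between the two peaks is consistent with a diploid sample, regardless of which peak is taller.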

biozhangzhou commented 3 years ago

I can get a very good N50 using ONT. I have tried the default heterozygous assembly mode and the Hi-C haplotype mode, and they give similar N50s. I have also checked the Bionano hybrid results: there are many contigs that lie very close to each other. So I don't know why there are breaks, and I would like to know whether there are any parameters to avoid them.

chhylp123 commented 3 years ago

What's the N50 of bp.p_ctg.gfa, i.e. the primary assembly? And how good is the ONT assembly? Please note that the Hi-C haplotype mode generates haplotype-resolved assemblies, while the ONT assembly should be collapsed, I guess?

biozhangzhou commented 3 years ago

The N50 values mentioned above are for the primary assembly.

chhylp123 commented 3 years ago

So the N50s of both the primary assembly and the phased assemblies are around 10 Mb?

biozhangzhou commented 3 years ago

Sorry. The N50 is 14 Mb for the primary assembly, 9 Mb for hap1, and 7 Mb for hap2.
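For readers who want to reproduce such numbers directly from the GFA output, here is a minimal sketch that computes N50 from the segment (`S`) lines of a GFA file. It assumes standard GFA1 `S` lines (name in column 2, sequence in column 3, optional `LN:i:` tag when the sequence is `*`); the function names and the file name in the usage comment are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty).

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def gfa_segment_lengths(lines):
    """Yield sequence lengths from the S (segment) lines of a GFA file."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != "S":
            continue  # skip non-segment lines
        seq = fields[2]
        if seq != "*":
            yield len(seq)
            continue
        # Sequence omitted: fall back to the optional LN:i:<length> tag.
        for tag in fields[3:]:
            if tag.startswith("LN:i:"):
                yield int(tag[5:])
                break


# Tiny in-memory example instead of a real assembly file.
example = [
    "S\tctg1\tACGTACGTAC\n",
    "S\tctg2\t*\tLN:i:6\n",
    "S\tctg3\tACGT\n",
    "A\tctg1\t0\t+\tread1\t0\t10\n",
]
example_lengths = list(gfa_segment_lengths(example))
print(example_lengths, n50(example_lengths))

# Usage on a real file (path is hypothetical):
# with open("asm.bp.p_ctg.gfa") as fh:
#     print(n50(gfa_segment_lengths(fh)))
```

Running the same function on the primary and on each haplotype GFA gives directly comparable values, keeping in mind that phased and collapsed assemblies are not expected to have the same contiguity.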

chhylp123 commented 3 years ago

At least from the k-mer plot, I guess the HiFi data look OK. To double-check, you can also assemble the data with HiCanu and IPA and compare the results. Newer versions of hifiasm tend to generate a primary assembly with a smaller N50, but the assembly should always be better because it collapses fewer segdups than previous versions. If you really care about the N50, you can use hifiasm-0.14.2 (r315), but I personally think it is not worth doing that.

As for the ONT assembly, my concern is still about segdups and repeats. I don't know how much better it is in terms of N50, but it may collapse many more segdups than the HiFi assembly. I do recommend that you check that.

Since you also have Hi-C, I guess Hi-C phased assemblies are always preferred. The differences between phased and unphased assemblies are explained at https://lh3.github.io/2021/04/17/concepts-in-phased-assemblies. Generally, I guess the ONT assembly collapses both segdups and haplotypes, so it is not that useful for now.

biozhangzhou commented 3 years ago

Thanks!!!