badplantgeek opened this issue 1 year ago
Could you please also show the size of hap2?
Hi Haoyu
Thanks for your quick reply. I don't have the hap2 size for all runs, but I have it for a few. See below.
| Mode | Run | hifiasm parameters | hap1 size (bp) | hap2 size (bp) |
|---|---|---|---|---|
| primary | l1 | `-s 0.2` | 2658241726 | 5493102674 |
| primary | l1 | `--n-hap 4` | 4241695685 | 6832817032 |
| primary | l1 | `-s 0.1` | 2814782205 | 5254045602 |
| primary | l1 | `-D 10` | 4483659495 | 6838537694 |
| primary | l2 | `-s 0.1` | 6422772254 | 2152964394 |
| primary | l3 | `-s 0.1 --hg-size 2.8g` | 5216611048 | 2415842962 |
| primary | l3 | `-s 0.2 --hg-size 2.8g` | 5382089003 | 2243043410 |
| primary | l3 | `-s 0.3 --hg-size 2.8g` | 2117668233 | 5543717930 |
Together, the two haplotype assemblies are much larger than 2 × 2.8g. How do you know the estimated genome size? I am wondering if there is contamination, as the k-mer plot of your data is also a little weird. A good plot should look like this: https://github.com/chhylp123/hifiasm/issues/10#issuecomment-616213684.
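To make the size comparison concrete, here is a small Python sketch that sums hap1 and hap2 for each run in the table and compares the total with an expected diploid size of 2 × 2.8g. The sizes are copied from the table; the 2.8g haploid size is the estimate quoted in this thread, and the function name `size_summary` is only illustrative.

```python
# Sizes in bp copied from the table: (run, hifiasm parameters, hap1, hap2)
RUNS = [
    ("l1", "-s 0.2", 2658241726, 5493102674),
    ("l1", "--n-hap 4", 4241695685, 6832817032),
    ("l1", "-s 0.1", 2814782205, 5254045602),
    ("l1", "-D 10", 4483659495, 6838537694),
    ("l2", "-s 0.1", 6422772254, 2152964394),
    ("l3", "-s 0.1 --hg-size 2.8g", 5216611048, 2415842962),
    ("l3", "-s 0.2 --hg-size 2.8g", 5382089003, 2243043410),
    ("l3", "-s 0.3 --hg-size 2.8g", 2117668233, 5543717930),
]

HAPLOID_SIZE = 2.8e9  # haploid genome size estimate quoted in the thread
EXPECTED_DIPLOID = 2 * HAPLOID_SIZE


def size_summary(runs, expected=EXPECTED_DIPLOID):
    """Return (run, params, total size in Gb, total / expected) for each row."""
    rows = []
    for run, params, hap1, hap2 in runs:
        total = hap1 + hap2
        rows.append((run, params, total / 1e9, total / expected))
    return rows


if __name__ == "__main__":
    for run, params, total_gb, ratio in size_summary(RUNS):
        print(f"{run:3} {params:24} {total_gb:6.2f} Gb  {ratio:.2f}x expected")
```

Every run in the table comes out well above the expected 5.6g total, with the `--n-hap 4` and `-D 10` runs roughly double.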
I agree that the k-mer plot is not ideal. We estimated the genome size with flow cytometry, so I am pretty certain it is around 2.8g.
If there is contamination, I would imagine it is from the same species, perhaps from two different individuals. Since the heterozygosity is high in this species, hifiasm is assembling contigs separately that are in reality the same region of the genome. Is there a parameter that I can use to "relax this merging"? I looked into purge_dups and thought about relaxing the -a parameter. Any thoughts? Any suggestion would be much appreciated!
Well, we haven't seen this case before. Perhaps run purge_dups to get a clean reference, and then map hap1/hap2 to it for debugging? It is not easy for hifiasm itself to handle this issue automatically.
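One way the suggested debugging step could be sketched, assuming hap1 and hap2 have been aligned to the purge_dups output with minimap2 (for example `minimap2 -x asm5 purged.fa hap1.fa hap2.fa > haps_vs_purged.paf`; the file names are hypothetical), which writes PAF. The idea is that in a clean diploid result, each region of the haploid reference should be covered about twice (once by hap1, once by hap2), so stretches covered three or more times point to extra copies. The function name `excess_depth_fraction` and the depth cutoff of 2 are assumptions for illustration, not part of hifiasm or purge_dups.

```python
import sys
from collections import defaultdict


def excess_depth_fraction(paf_lines, max_expected_depth=2, min_mapq=0):
    """For each target contig in PAF input, return the fraction of its length
    covered by more than `max_expected_depth` query alignments.

    PAF columns used (0-based): 5 tname, 6 tlen, 7 tstart, 8 tend, 11 mapq.
    Secondary alignments (minimap2 tag tp:A:S) are skipped.
    """
    events = defaultdict(list)  # target name -> [(position, +1 start / -1 end)]
    target_len = {}
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        tname, tlen = fields[5], int(fields[6])
        tstart, tend, mapq = int(fields[7]), int(fields[8]), int(fields[11])
        target_len[tname] = tlen
        if mapq < min_mapq or "tp:A:S" in fields[12:]:
            continue
        events[tname].append((tstart, 1))
        events[tname].append((tend, -1))

    result = {}
    for tname, tlen in target_len.items():
        depth, prev, excess = 0, 0, 0
        # Sorting puts ends (-1) before starts (+1) at the same position,
        # which matches PAF's half-open [start, end) coordinates.
        for pos, delta in sorted(events[tname]):
            if depth > max_expected_depth:
                excess += pos - prev
            depth += delta
            prev = pos
        result[tname] = excess / tlen
    return result


if __name__ == "__main__":
    # Usage (hypothetical file name): python excess_depth.py haps_vs_purged.paf
    with open(sys.argv[1]) as handle:
        fractions = excess_depth_fraction(handle)
    for tname, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
        if frac > 0:
            print(f"{tname}\t{frac:.3f}")
```

A sweep over start/end events is used instead of a per-base depth array so that multi-megabase contigs do not need one counter per base.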
Hi. I have been trying to assemble the genome of a plant species with a high rate of heterozygosity. The genome size is around 2.8g, and I have about 40X coverage of HiFi reads plus Hi-C data.
I have tried over 30 combinations of hifiasm parameters, but I can't increase contiguity and reduce the number of contigs while keeping the correct genome size. Here is the k-mer plot output from hifiasm: l3_run.txt.
The table below shows the parameters I have tested so far. Is there anything I am missing? Any suggestions on how I could improve this assembly? Any help would be much appreciated!
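As a rough cross-check on the k-mer plot, the sketch below estimates where the heterozygous and homozygous peaks of a diploid k-mer histogram would be expected at the 40X HiFi coverage described in this post, and finds local maxima in a histogram for comparison. It assumes the 40X is relative to the haploid genome size, a mean HiFi read length of 15 kb, and k = 51 (assumed here to match hifiasm's default k-mer length). It also assumes the histogram has already been extracted from the hifiasm log into a list indexed by multiplicity. The function names are hypothetical.

```python
def expected_kmer_peaks(base_coverage, read_len=15_000, k=51):
    """Approximate (heterozygous, homozygous) peak multiplicities for a diploid
    genome, converting base coverage C to k-mer coverage C * (L - k + 1) / L.
    read_len and k are assumptions, not values taken from the thread."""
    kmer_cov = base_coverage * (read_len - k + 1) / read_len
    return kmer_cov / 2, kmer_cov


def local_maxima(histogram, min_multiplicity=5):
    """Multiplicities i where histogram[i] is strictly larger than both neighbours.
    Multiplicities below min_multiplicity are skipped to ignore the error peak."""
    start = max(1, min_multiplicity)
    return [
        i
        for i in range(start, len(histogram) - 1)
        if histogram[i] > histogram[i - 1] and histogram[i] > histogram[i + 1]
    ]


if __name__ == "__main__":
    het, hom = expected_kmer_peaks(40)  # 40X HiFi coverage, as stated in the post
    print(f"expected heterozygous peak ~{het:.1f}x, homozygous peak ~{hom:.1f}x")
```

For long HiFi reads the (L - k + 1) / L factor is close to 1, so the expected peaks sit near half the base coverage and near the full base coverage.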