chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

Too many contigs and confusing genome size #299

Open Lillian-21 opened 2 years ago

Lillian-21 commented 2 years ago

Hi,

Thanks for your nice work on hifiasm!

I have a plant sample that we believe is haploid (this is only a guess based on previous research and our experience). The genome size estimated from the WGS data is ~2.1 Gb when using the first k-mer peak, while the size estimated from the HiFi data by GenomeScope 2.0 with ploidy 2 is ~500 Mb. The k-mer frequency results are as follows.

WGS_kmer GenomeScope_kmer

So I am confused about the genome size and ploidy. Could you give me some advice?
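For readers wondering why the two estimates can differ so much: a k-mer-based size estimate is roughly the total number of counted k-mers divided by the depth of the peak taken as single-copy coverage, so the choice of peak matters a lot. Below is a minimal Python sketch of that arithmetic. The function name and the toy histogram are illustrative assumptions, not values from this sample, and real tools such as GenomeScope fit a model rather than dividing by a single peak.

```python
def estimate_genome_size(histogram, peak_depth, min_depth=1):
    """Rough estimate: total k-mer count divided by the chosen peak depth.

    histogram maps k-mer depth -> number of distinct k-mers at that depth.
    Depths below min_depth are skipped as likely sequencing errors.
    """
    total_kmers = sum(
        depth * count
        for depth, count in histogram.items()
        if depth >= min_depth
    )
    return total_kmers / peak_depth


# Toy histogram (hypothetical numbers): peaks at depths 20, 40 and 60.
toy_histogram = {20: 300, 40: 150, 60: 200}

# Treating the first peak as single-copy coverage.
size_from_first_peak = estimate_genome_size(toy_histogram, peak_depth=20)
# Treating the third peak as single-copy coverage.
size_from_third_peak = estimate_genome_size(toy_histogram, peak_depth=60)
```

In this toy case the first-peak estimate is three times the third-peak estimate, which is the kind of gap that can appear when different peaks are treated as the haploid coverage.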

I have tested many times with different parameters, including with and without "-l0", with and without Hi-C data, and with modified -s and -D values, but all the results are fragmented, incomplete and highly duplicated. The following are my assembly results (*p_ctg.fa) and log files.

Assembly results

hifasm.sh.e531543.txt hifasm.sh.e1188977.txt

How can I improve my assembly? And should I use "-l0" or not? Looking forward to your reply. Best wishes!
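When comparing runs with different parameters, it may help to make "fragmented" concrete by reporting contig count, total length and N50 for each *p_ctg.fa. The following is a minimal Python sketch under the assumption that the contigs are in plain (uncompressed) FASTA; the helper names and the file name in the usage comment are hypothetical.

```python
def read_contig_lengths(fasta_path):
    """Return the length of every sequence in an uncompressed FASTA file."""
    lengths = []
    current = 0
    seen_header = False
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new record starts: store the previous one, if any.
                if seen_header:
                    lengths.append(current)
                seen_header = True
                current = 0
            elif seen_header:
                current += len(line)
    if seen_header:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


# Usage (placeholder file name):
# lengths = read_contig_lengths("asm.bp.p_ctg.fa")
# print(len(lengths), sum(lengths), n50(lengths))
```

A total length well above the expected haploid size combined with a low N50 would be consistent with the duplication described above, though it does not by itself show whether the duplicates are real.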

chhylp123 commented 2 years ago

I guess this one is not a haploid genome. I recommend you check a few duplicate genes to see if they are real.

Lillian-21 commented 2 years ago

Thanks for your reply.

Yes, we also had doubts about the ploidy. Because the contigs are too fragmented, I could only plot a synteny comparison using two gene sets. The results show obvious duplication.

MCscan

zxgsy520 commented 2 years ago

This is likely to be polyploid, and the three peaks have an obvious fold relationship.
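The "fold relationship" can be checked numerically by dividing each peak depth by the lowest one and seeing whether the ratios sit close to whole numbers. Here is a minimal Python sketch; the peak depths and the tolerance are hypothetical and are not read from the plots in this thread.

```python
def peak_fold_ratios(peak_depths):
    """Return each peak depth divided by the lowest peak depth, in ascending order."""
    base = min(peak_depths)
    return [depth / base for depth in sorted(peak_depths)]


def looks_like_integer_folds(peak_depths, tolerance=0.15):
    """True if every ratio to the lowest peak is within tolerance of an integer."""
    return all(
        abs(ratio - round(ratio)) <= tolerance
        for ratio in peak_fold_ratios(peak_depths)
    )


# Hypothetical peak depths chosen for illustration only.
example_peaks = [21, 41, 62]
ratios = peak_fold_ratios(example_peaks)
is_fold_series = looks_like_integer_folds(example_peaks)
```

Ratios near 1, 2 and 3 would fit the polyploid interpretation, but peak positions alone cannot distinguish ploidy from large-scale duplication, so this is only a quick sanity check.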

Lillian-21 commented 2 years ago

Thanks. Yes, I now think it is a polyploid (triploid?). I plotted the haplotype structure from heterozygous k-mer pairs with Smudgeplot, and the results show clear triploidy. I also tried to anchor the hifiasm contigs to chromosomes with Hi-C reads, but the results were not ideal. Two parts are clearly visible, and I don't know what the white area between them is (unanchored sequence?).

In any case, I want to get a high-quality haploid assembly. Could any of you give me some suggestions?

(two images attached)