chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

HiFi and Nanopore behave differently in 3D DNA (due to different draft assembly sizes) #250

Open Aannaw opened 2 years ago

Aannaw commented 2 years ago

Hello, Professor. I have HiFi + Hi-C and Nanopore + Hi-C data. The HiFi draft assembly was produced with hifiasm and the Nanopore draft assembly with NextDenovo, and the two draft assemblies differ by about 800M (HiFi: 3.3G, Nanopore: 2.5G). I then used Juicer + 3D-DNA to anchor the two draft genomes into scaffolds. The anchoring results were similar for both drafts: 32 pseudo-chromosomes were anchored, totalling slightly more than 2.4G. However, the anchoring rates of the two genomes are quite different: HiFi: 2.4G/3.3G = 72%; Nanopore: 2.4G/2.5G = 96%. It seems that the low anchoring rate of the HiFi draft genome is due to the HiFi assembly containing more sequence than the NextDenovo assembly, sequence that the Hi-C data does not anchor.
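For reference, the anchoring-rate arithmetic above can be reproduced with a few lines of Python. This is only a sketch using the rounded sizes quoted in this post. The function name `anchoring_rate` and the dictionary layout are illustrative and not part of hifiasm, Juicer, or 3D-DNA. With the rounded inputs the HiFi figure comes out slightly above the 72% quoted.

```python
def anchoring_rate(anchored_bp: float, assembly_bp: float) -> float:
    """Fraction of the draft assembly placed on pseudo-chromosomes."""
    if assembly_bp <= 0:
        raise ValueError("assembly size must be positive")
    return anchored_bp / assembly_bp


# Rounded sizes from the post: (anchored bases, draft assembly bases).
drafts = {
    "HiFi (hifiasm)": (2.4e9, 3.3e9),
    "Nanopore (NextDenovo)": (2.4e9, 2.5e9),
}

for name, (anchored, total) in drafts.items():
    unanchored = total - anchored
    print(f"{name}: {anchoring_rate(anchored, total):.1%} anchored, "
          f"{unanchored / 1e9:.1f}G unanchored")
```

The point of the comparison is that the numerator is nearly the same in both cases, so the difference in rate comes almost entirely from the larger denominator of the HiFi draft.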

I have 8 genomes: 2 are HiFi and 6 are Nanopore. The results are consistent with the above. I can now conclude that the total chromosome size of my species is 2.4~2.5G, because the results for all eight genomes are the same.

The extra 800M in the HiFi draft assembly is also the part that cannot be anchored to the chromosomes. In the heat map, this part shows no interaction signal.

What is the extra 800M?

[Image: HiFi + HiC heatmap (hifiasm)]
[Image: Nanopore + HiC heatmap (nextdenovo)]

lh3 commented 2 years ago

NextDenovo is known to produce smaller assemblies. It gets a 2.9Gb human CHM13 assembly, but the real size should be 3.05Gb. Hifiasm gets a 3.04Gb assembly instead, much closer to the truth. NextDenovo will have more problems with more repetitive genomes.

What is the extra 800M?

Mostly centromeric repeats and some segmental duplications.
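A rough way to see how much sequence sits outside the pseudo-chromosomes in a scaffolded FASTA is sketched below. It is not from this thread and not part of any of the tools mentioned. It assumes, as a simplification, that the `n_chrom` longest scaffolds are the pseudo-chromosomes (32 in this issue) and that everything else is unanchored. The helper names `fasta_lengths` and `split_by_rank` are hypothetical, and the toy records stand in for a real file.

```python
from typing import Dict, Iterable, Tuple


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence name: length} from FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def split_by_rank(lengths: Dict[str, int], n_chrom: int) -> Tuple[int, int]:
    """Assumption: the n_chrom longest scaffolds are the pseudo-chromosomes.

    Returns (bases in the n_chrom longest scaffolds, bases in all others).
    """
    ordered = sorted(lengths.values(), reverse=True)
    return sum(ordered[:n_chrom]), sum(ordered[n_chrom:])


# Toy input: two "chromosomes" and two short leftover sequences.
toy = [">chr1", "ACGTACGTAC", ">chr2", "ACGTACGT",
       ">debris1", "ACG", ">debris2", "AC"]
anchored, unanchored = split_by_rank(fasta_lengths(toy), n_chrom=2)
print(anchored, unanchored)
```

For a real assembly, the lines could come from an open file handle, for example `fasta_lengths(open(path))` with `path` pointing at the scaffolded FASTA. Whether the leftover sequences are repeats would still need to be checked separately, for instance with a repeat annotation.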

Aannaw commented 2 years ago

Hi, how should we deal with the extra 800M of unanchored sequence to improve the anchoring rate (for HiFi)? Can we remove it for subsequent analysis?

Is the extra 800M generated because HiFi sequencing is more precise than Nanopore?

Given the nature of CCS sequencing, I suspect it will produce more repeats. But I haven't seen a comparison of how HiFi and Nanopore behave on the same genome.