chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

Larger size of genome assembly than expected #102

Open Karimi-81 opened 3 years ago

Karimi-81 commented 3 years ago

Hi there,

I used HiFi reads (~20x) to assemble the genome of an animal species. The output assembly is of high quality, with an N50 of 30 Mb. However, the assembly size (2.6 Gb) is a bit larger than what we estimated from short reads (2.4 Gb). The previous assembly of this species was also 2.4 Gb, so I wonder whether the size of my assembly is correct and whether I need to take any additional step to correct it.

I also used Hi-C data along with the CCS reads to generate two separate haplotype assemblies, but the results are strange: each haplotype is ~3.2 Gb. Do you have any idea what the reason behind this is?

Finally, despite the high quality of the HiFi assembly (N50 of 30 Mb), when I used the Hi-C data (with 3D-DNA) to build a chromosome-level assembly, the final output was more fragmented: the number of scaffolds increased and the scaffolds were shorter. Is this related to long-read technology? Do you have any suggestions for scaffolding assemblies generated from long reads?

lh3 commented 3 years ago

v0.15 was released yesterday. Try that. Its Hi-C module is much improved. As to the assembly size, hifiasm tends to be more accurate than older assemblers. The real genome size of a human female is 3.05 Gb. Hifiasm gets this number, but older assemblers often reach a size of 2.8-2.9 Gb. Also, don't trust GenomeScope. It is not reliable.
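
For anyone comparing assembly sizes as in this thread, total length and N50 can be computed directly from the contig lengths. The following is only a minimal sketch, assuming a GFA file in which each contig is an `S` line with the sequence in the third tab-separated field (or `*` plus an `LN:i:` tag). The function names and the toy input are hypothetical and are not part of hifiasm.

```python
def contig_lengths_from_gfa(lines):
    """Yield the length of every segment (S) line in an iterable of GFA lines."""
    for line in lines:
        if not line.startswith("S\t"):
            continue  # skip non-segment records
        fields = line.rstrip("\n").split("\t")
        seq = fields[2]
        if seq != "*":
            yield len(seq)
            continue
        # Sequence omitted: fall back to the optional LN:i:<length> tag.
        for tag in fields[3:]:
            if tag.startswith("LN:i:"):
                yield int(tag[5:])
                break


def assembly_stats(lengths):
    """Return (total_length, N50) for a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # N50: length of the contig at which the running sum reaches half the total.
        if running * 2 >= total:
            return total, length
    return total, 0  # empty input


# Toy input (made up for illustration): three contigs of 10, 6 and 4 bp.
example_gfa = [
    "S\tctg1\tACGTACGTAC\n",
    "S\tctg2\t*\tLN:i:6\n",
    "A\tctg1\t0\t+\tread1\t0\t10\n",
    "S\tctg3\tACGT\n",
]
total, n50 = assembly_stats(contig_lengths_from_gfa(example_gfa))
print(total, n50)
```

On a real assembly, the list would be replaced by an open file handle on the contig GFA. Running the same function on the primary assembly and on each haplotype makes the size comparison against the expected genome size explicit.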