chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

UL assembly genome size is shorter #447

Open uniquelemon opened 1 year ago

uniquelemon commented 1 year ago

Hello,

I have tried genome assembly with different parameters using hifiasm 0.19.5-r587.

With the parameters --h1 ${HiC_1} --h2 ${HiC_2} --hom-cov 45 -n 11 -D 10 --hg-size 1.6g, the output genome sizes are 1.662 Gb and 1.678 Gb, respectively.

With the parameters --h1 ${HiC_1} --h2 ${HiC_2} --hom-cov 45 -n 11 --ul-tip 30 -D 10 --hg-size 1.6g, the output genome sizes are 1.581 Gb and 1.597 Gb, respectively.

I wonder why the genome assembled using UL is so much smaller, and why the genome size gets smaller as --ul-tip gets higher.
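For scale, the gap between the two runs can be worked out directly from the four sizes reported above. This is only arithmetic on those numbers; the variable and function names are illustrative.

```python
# Assembly sizes in Gb as reported in this issue (first and second output).
run_default_ul_tip = (1.662, 1.678)   # run without --ul-tip 30
run_ul_tip_30 = (1.581, 1.597)        # run with --ul-tip 30


def size_drop(before_gb, after_gb):
    """Return (absolute drop in Mb, relative drop in percent)."""
    drop_gb = before_gb - after_gb
    return round(drop_gb * 1000, 1), round(100 * drop_gb / before_gb, 2)


drops = [size_drop(b, a) for b, a in zip(run_default_ul_tip, run_ul_tip_30)]
for index, (drop_mb, drop_pct) in enumerate(drops, start=1):
    print(f"assembly {index}: {drop_mb} Mb smaller ({drop_pct}%)")
```

Both assemblies lose about 81 Mb, a little under 5% of their size.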

To explain: this species is an invertebrate. Because of the high sequencing error rate for this species' genome, the parameter settings are relatively strict. The UL reads are CLR sequencing data longer than 10 kb.

Thank you very much

chhylp123 commented 1 year ago

Regarding "the higher the --ul-tip, the smaller the genome size acquired": hifiasm discards tips that are shorter than --ul-tip to make the final contigs more contiguous. If that value is too large, too many bases are removed, resulting in a smaller assembly.

Regarding "why the genome size assembled using UL is so much smaller": in addition to a too-large --ul-tip, another possibility is that hifiasm assembles through more difficult regions, so there are significantly fewer small contigs in the final assembly.
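One way to check which explanation applies is to compare contig counts and the bases held in small contigs between the two runs. The sketch below is a minimal, hedged example: it assumes the contigs are available as GFA `S` lines (name in column 2, sequence in column 3, optional `LN:i:` tag), and the 100 kb `small_cutoff` is an arbitrary illustrative threshold, not a hifiasm setting.

```python
def contig_lengths_from_gfa(lines):
    """Collect segment lengths from GFA 'S' lines."""
    lengths = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "S" or len(fields) < 3:
            continue  # skip non-segment records
        sequence = fields[2]
        if sequence != "*":
            lengths.append(len(sequence))
            continue
        # Sequence omitted: fall back to the LN:i:<length> tag if present.
        for tag in fields[3:]:
            if tag.startswith("LN:i:"):
                lengths.append(int(tag[5:]))
                break
    return lengths


def summarize(lengths, small_cutoff=100_000):
    """Contig count, total size, N50, and what sits in small contigs."""
    total = sum(lengths)
    running, n50 = 0, 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            n50 = length
            break
    small = [length for length in lengths if length < small_cutoff]
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "n50": n50,
        "small_contigs": len(small),
        "small_bp": sum(small),
    }


# Toy input standing in for a real GFA file.
toy_gfa = [
    "S\tctg1\tACGTACGTAC\n",
    "S\tctg2\tACGT\n",
    "S\tctg3\t*\tLN:i:6\n",
    "A\tctg1\t0\t+\tread1\t0\t10\n",
]
toy_summary = summarize(contig_lengths_from_gfa(toy_gfa), small_cutoff=5)
print(toy_summary)
```

For real data, pass an open file handle for each run's contig GFA to `contig_lengths_from_gfa` and compare the two summaries. If the run with the larger --ul-tip has far fewer small contigs and the missing bases roughly match `small_bp`, the size difference is mostly removed or merged small contigs.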

uniquelemon commented 1 year ago

Thank you for your answer. I still have a question: if I first filter out the UL reads that can be completely aligned to the hifiasm primary assembly contigs, and use only the remaining UL reads for assembly, will that affect the assembly result?
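The filtering step described here could be sketched as follows. This is an illustration under assumptions, not a hifiasm feature: it assumes the UL reads were first aligned to the primary contigs with an aligner that writes PAF (query name, length, start and end in columns 1 to 4), and it treats a read as fully aligned when a single alignment spans at least `min_query_fraction` of the read. The 0.95 default and both function names are hypothetical.

```python
def fully_aligned_reads(paf_lines, min_query_fraction=0.95):
    """Names of reads with one alignment spanning most of the read."""
    covered = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        name = fields[0]
        query_len = int(fields[1])
        query_start, query_end = int(fields[2]), int(fields[3])
        if query_len > 0 and (query_end - query_start) / query_len >= min_query_fraction:
            covered.add(name)
    return covered


def remaining_reads(all_read_names, paf_lines, min_query_fraction=0.95):
    """Reads that are NOT fully aligned, in their original order."""
    covered = fully_aligned_reads(paf_lines, min_query_fraction)
    return [name for name in all_read_names if name not in covered]


# Toy PAF records: read1 aligns end to end, read2 only over half its length.
toy_paf = [
    "read1\t1000\t0\t990\t+\tctg1\t50000\t100\t1090\t950\t990\t60\n",
    "read2\t2000\t0\t1000\t-\tctg2\t80000\t500\t1500\t900\t1000\t60\n",
]
kept = remaining_reads(["read1", "read2", "read3"], toy_paf)
print(kept)
```

Reads with no alignment at all (`read3` in the toy data) stay in the remaining set.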

chhylp123 commented 1 year ago

It depends on the coverage. If the coverage of the remaining UL reads is too low, the UL integration might not work.
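A rough way to estimate the coverage left after filtering is total remaining UL bases divided by the haploid genome size. The sketch below assumes the 1.6g value from the --hg-size setting earlier in this thread. `parse_size` is a hypothetical helper for k/m/g suffixes, not hifiasm's own parser, and no threshold for "too low" is implied.

```python
SIZE_SUFFIXES = {"k": 10**3, "m": 10**6, "g": 10**9}


def parse_size(text):
    """Turn a size string such as '1.6g' into a number of bases."""
    text = text.strip().lower()
    if text and text[-1] in SIZE_SUFFIXES:
        return round(float(text[:-1]) * SIZE_SUFFIXES[text[-1]])
    return int(text)


def ul_coverage(read_lengths, genome_size_bp):
    """Average depth: total read bases divided by genome size."""
    return sum(read_lengths) / genome_size_bp


genome_size = parse_size("1.6g")
# Toy example: 320,000 remaining reads of 50 kb each.
toy_read_lengths = [50_000] * 320_000
coverage = ul_coverage(toy_read_lengths, genome_size)
print(f"remaining UL coverage: {coverage:.1f}x")
```

Running this on the read lengths before and after filtering shows how much UL depth the filtering removes.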