chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

Two haplotypes assembled from HIC data show a significant difference in size #426

Open weirdo-onlooker opened 1 year ago

weirdo-onlooker commented 1 year ago

I ran the following command with hifiasm 0.19.2:

hifiasm -o OUTPUT.prefix -t 44 --s-base -1 --h1 HIC.R1.fastq.gz --h2 HIC.R2.fastq.gz input.HIFI.fasta

The two assembled haplotypes differ greatly in size:

.hic.hap1.p_ctg.gfa 730M
.hic.hap2.p_ctg.gfa 2.5G

Their combined size is about the same as that of the two haplotypes assembled from the HiFi data alone, but those two haplotypes are similar in size to each other:

.bp.hap1.p_ctg.gfa 1.7G
.bp.hap2.p_ctg.gfa 1.8G

What's the reason?
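
The sizes listed above are GFA file sizes, and a GFA file can contain records other than contig sequence, so file size may not track assembled length exactly. As a rough cross-check, the sketch below sums contig lengths over the S (segment) lines of a GFA stream. It assumes the standard GFA layout (tab-separated fields, sequence in the third column, optional LN:i: length tag). The function name and the toy input are illustrative and not part of hifiasm.

```python
import io


def gfa_total_length(lines):
    """Sum contig lengths over the S (segment) lines of a GFA stream."""
    total = 0
    for line in lines:
        # Only segment records carry contig sequence; skip everything else.
        if not line.startswith("S\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        seq = fields[2]
        length = None
        # Prefer the explicit LN:i: tag when present.
        for tag in fields[3:]:
            if tag.startswith("LN:i:"):
                length = int(tag[5:])
                break
        if length is None:
            # "*" means the sequence was omitted; count it as 0 without a tag.
            length = 0 if seq == "*" else len(seq)
        total += length
    return total


# Toy GFA (made-up records) to show the behaviour.
toy_gfa = io.StringIO(
    "S\tctg1\tACGT\tLN:i:4\n"
    "A\tctg1\t0\t+\tread1\t0\t4\n"
    "S\tctg2\t*\tLN:i:10\n"
    "S\tctg3\tACGTAC\n"
)
print(gfa_total_length(toy_gfa))
```

To compare real assemblies, the same function can be given an open file handle for each hap1/hap2 GFA in place of the toy input.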

chhylp123 commented 1 year ago

My guess is that the hom coverage threshold inferred by hifiasm is wrong. See: https://hifiasm.readthedocs.io/en/latest/faq.html#why-one-hi-c-integrated-assembly-is-larger-than-another-one.

weirdo-onlooker commented 1 year ago

After reviewing my log file, I found that the hom_peak is 38 and the hom_cov is 37. What does this mean? This is my log file: hhh_hifiasm_hic_hifi_assembly_20230320.log

chhylp123 commented 1 year ago

What is the coverage of your sample relative to the haploid genome size? hom_peak should be equal to both hom_cov and the coverage. If your coverage is about right, you could also try setting smaller values for -s and --s-base, which help identify more divergent homologous regions.
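
The coverage check described above can be sketched as follows. This is a minimal illustration, not hifiasm's own logic: the function names, the 25% tolerance, and the read and genome size figures are all hypothetical. Only hom_cov = 37 comes from this thread.

```python
def estimated_coverage(total_read_bases, haploid_genome_size):
    """Expected read depth: total HiFi bases divided by haploid genome size."""
    return total_read_bases / haploid_genome_size


def hom_cov_looks_plausible(hom_cov, coverage, tolerance=0.25):
    """True if hom_cov is within a relative tolerance of the coverage.

    The tolerance is an arbitrary illustrative choice, not a hifiasm setting.
    """
    return abs(hom_cov - coverage) <= tolerance * coverage


# Hypothetical inputs: 38 Gb of HiFi reads and a 1 Gb haploid genome.
coverage = estimated_coverage(38_000_000_000, 1_000_000_000)
print(coverage)
print(hom_cov_looks_plausible(37, coverage))

# If the same sample had twice as much data, a hom_cov of 37 would sit near
# half the coverage and deserve a closer look.
print(hom_cov_looks_plausible(37, estimated_coverage(74_000_000_000, 1_000_000_000)))
```

With real data, the total read bases would come from the HiFi input and the haploid genome size from an independent estimate for the species.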
