chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

High Duplication in my Assembly...l3 (purge-dup) doesn't help much #663

Open rolibudhwar opened 3 weeks ago

rolibudhwar commented 3 weeks ago

Hi, I am running hifiasm with the following parameters:

hifiasm -o ./Hifiasm3_l3/Gon --primary -t20 --purge-max -l3 -s 0.35 --h1 ./HIC/Raw_Data/filtered_Insect_S118_L004_R1_001.fastq.gz --h2 ./HIC/Raw_Data/filtered_Insect_S118_L004_R2_001.fastq.gz 0_Raw-Data/HIFI-Raw/m84119_NLG_CON_M.bc2021.fastq.gz

I expect some duplication because this is a mixed pool of the same insect. I am using -l3 for aggressive purging along with a lower -s value because my assembly is about 4 times larger than the expected size.

Even with these settings, the assembly is still large at ~700 Mb (expected 150-200 Mb), and hap2 is half the size of hap1. My BUSCO scores also show high duplication:

C:99.9%[S:16.4%,D:83.5%],F:0.0%,M:0.1%,n:1367,E:1.0%
1366 Complete BUSCOs (C) (of which 13 contain internal stop codons)
224 Complete and single-copy BUSCOs (S)
1142 Complete and duplicated BUSCOs (D)
0 Fragmented BUSCOs (F)
1 Missing BUSCOs (M)
1367 Total BUSCO groups searched

Please advise on how to proceed. Thanks.
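
As a sanity check on the BUSCO summary above, the percentages can be recomputed from the raw counts. The sketch below is only illustrative arithmetic in Python; the function name `busco_percentages` and the dictionary keys are made up for this example and are not part of BUSCO or hifiasm.

```python
# BUSCO counts copied from the report in this post.
COUNTS = {"single": 224, "duplicated": 1142, "fragmented": 0, "missing": 1}
TOTAL = 1367  # total BUSCO groups searched


def busco_percentages(counts, total):
    """Recompute the one-line BUSCO summary percentages from raw counts.

    Hypothetical helper for illustration only.
    """
    complete = counts["single"] + counts["duplicated"]

    def pct(n, denom=total):
        return round(100 * n / denom, 1)

    return {
        "C": pct(complete),
        "S": pct(counts["single"]),
        "D": pct(counts["duplicated"]),
        "F": pct(counts["fragmented"]),
        "M": pct(counts["missing"]),
        # Share of complete BUSCOs that are present in more than one copy.
        "D_of_complete": pct(counts["duplicated"], complete),
    }


if __name__ == "__main__":
    for key, value in busco_percentages(COUNTS, TOTAL).items():
        print(f"{key}: {value}%")
```

The recomputed values agree with the reported summary line: almost every complete BUSCO is duplicated, which is consistent with an assembly several times larger than the expected genome size.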

chhylp123 commented 2 weeks ago

If this is a diploid sample with a 200 Mb haploid genome, the final assembly should be at most 400 Mb. However, hifiasm produced a 700 Mb assembly, so I suspect some other issue is involved.
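
The size argument above is simple arithmetic and can be written out as a small Python sketch. The function name `size_check` and its parameters are hypothetical, and the haploid sizes are the 150-200 Mb estimates quoted in this thread, not measured values.

```python
def size_check(assembly_mb, haploid_mb, ploidy=2):
    """Compare an assembly size with the upper bound implied by ploidy.

    Hypothetical helper: the upper bound assumes every haplotype is
    assembled separately and completely, with no collapsed regions.
    """
    upper_bound_mb = haploid_mb * ploidy
    return {
        "upper_bound_mb": upper_bound_mb,
        "excess_mb": assembly_mb - upper_bound_mb,
        "fold_of_haploid": round(assembly_mb / haploid_mb, 2),
    }


if __name__ == "__main__":
    # Both ends of the expected haploid range mentioned in the issue.
    for haploid in (150, 200):
        print(haploid, size_check(700, haploid))
```

With a 200 Mb haploid genome, a 700 Mb assembly is 300 Mb above the diploid bound and 3.5 times the haploid size, so two haplotypes alone do not account for it.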