dfguan / purge_dups

haplotypic duplication identification tool
MIT License

less effective #117

Open naturalstay opened 2 years ago

naturalstay commented 2 years ago

Hello Dr. Guan, I used hifiasm to assemble a plant genome of about 2.5 Gb with an estimated heterozygosity of 0.9%. The BUSCO result is C:98.7%[S:93.4%,D:5.3%],F:0.6%,M:0.7%,n:1614. I got the following warning when running the command calcuts PB.stat > cutoffs 2> calcults.log:

[W::calcuts] mean is not significantly different with peak, please recheck the cutoffs

Below is the image generated by hist_plot.py. [image]

The cutoffs file shows 5 11 17 22 36 66. The warning suggested that I might need to set the cutoffs manually, so I ran calcuts -l 4 -m 20 -u 70 PB.stat. After that, the BUSCO result is C:97.4%[S:92.1%,D:5.3%],F:0.6%,M:2.0%,n:1614. D did not change, but C decreased slightly. I don't know why this is. Do you have any advice? Thanks. Looking forward to your reply.
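
To make the change between the two BUSCO results explicit, here is a minimal sketch that parses the two summary strings quoted in this post and prints the per-category difference. The helper names parse_busco and busco_delta are hypothetical and are not part of purge_dups or BUSCO. The gene-count estimate is only approximate, because BUSCO percentages are rounded to one decimal place.

```python
import re

# Hypothetical helpers for illustration only; not part of purge_dups or BUSCO.
BUSCO_PATTERN = (
    r"C:([\d.]+)%\[S:([\d.]+)%,D:([\d.]+)%\],"
    r"F:([\d.]+)%,M:([\d.]+)%,n:(\d+)"
)


def parse_busco(summary):
    """Parse a one-line BUSCO summary into percentages plus the total n."""
    match = re.fullmatch(BUSCO_PATTERN, summary.strip())
    if match is None:
        raise ValueError(f"unrecognised BUSCO summary: {summary!r}")
    c, s, d, f, m = (float(x) for x in match.groups()[:5])
    return {"C": c, "S": s, "D": d, "F": f, "M": m, "n": int(match.group(6))}


def busco_delta(before, after):
    """Return after - before for each category, in percentage points."""
    return {k: round(after[k] - before[k], 1) for k in ("C", "S", "D", "F", "M")}


# The two summaries quoted in this post.
before = parse_busco("C:98.7%[S:93.4%,D:5.3%],F:0.6%,M:0.7%,n:1614")
after = parse_busco("C:97.4%[S:92.1%,D:5.3%],F:0.6%,M:2.0%,n:1614")

delta = busco_delta(before, after)
print(delta)

# Approximate number of BUSCO genes that moved from complete to missing.
genes_lost = round(-delta["C"] / 100 * before["n"])
print(genes_lost)
```

With these numbers, the whole drop in C comes from S and is matched by the rise in M, while D stays the same. That corresponds to roughly 21 of the 1614 BUSCO genes going from single-copy to missing.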