dfguan / purge_dups

haplotypic duplication identification tool
MIT License

Optimizing parameters for better purge results #106

Open genmor opened 2 years ago

genmor commented 2 years ago

Hi, I have a HiCanu assembly of an animal for which we expect a final genome size of ~1.3 Gb. The initial HiCanu assembly is approximately 1.9 Gb (BUSCO C:97.7%[S:36.9%,D:60.8%],F:1.0%,M:1.3%,n:3285), and one pass of purge_dups (using the runner script with default parameters) only cuts this down to about 1.7 Gb (BUSCO C:97.1%[S:54.2%,D:42.9%],F:1.2%,M:1.7%,n:3285). Which parameters should I modify to get better purging?

For context, when I ran the exact same analysis on a closely related species, the initial HiCanu output was 2.2 Gb (BUSCO C:97.5%[S:16.0%,D:81.5%],F:1.2%,M:1.3%,n:3285), and purge_dups (again using the runner script and default parameters) cut it down to 1.2 Gb (BUSCO C:96.6%[S:94.1%,D:2.5%],F:1.4%,M:2.0%,n:3285).
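To make the gap between the two runs easier to see, here is a minimal Python sketch that parses the BUSCO short summaries quoted above and reports how far the duplicated fraction (D) dropped after purging. The function names `parse_busco` and `duplication_drop` and the dictionary labels are made up for this illustration; they are not part of purge_dups or BUSCO.

```python
import re

# BUSCO short summaries copied from this post: (before purging, after purging).
# The dictionary keys are labels chosen for this sketch only.
RUNS = {
    "this species": (
        "C:97.7%[S:36.9%,D:60.8%],F:1.0%,M:1.3%,n:3285",
        "C:97.1%[S:54.2%,D:42.9%],F:1.2%,M:1.7%,n:3285",
    ),
    "related species": (
        "C:97.5%[S:16.0%,D:81.5%],F:1.2%,M:1.3%,n:3285",
        "C:96.6%[S:94.1%,D:2.5%],F:1.4%,M:2.0%,n:3285",
    ),
}


def parse_busco(summary):
    """Return the percentage fields (C, S, D, F, M) of a BUSCO short summary."""
    return {key: float(val) for key, val in re.findall(r"([CSDFM]):([\d.]+)%", summary)}


def duplication_drop(before, after):
    """Percentage points by which duplicated BUSCOs (D) fell after purging."""
    return round(parse_busco(before)["D"] - parse_busco(after)["D"], 1)


if __name__ == "__main__":
    for label, (before, after) in RUNS.items():
        print(f"{label}: D dropped by {duplication_drop(before, after)} points")
```

With these numbers, D falls by 17.9 percentage points in this assembly versus 79.0 in the related species, while completeness (C) stays within about one point in both cases.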

Thanks for any insight on this matter.