dfguan / purge_dups

haplotypic duplication identification tool
MIT License

Overpurging the haplotigs in Step 4 #43

Open bmansfeld opened 4 years ago

bmansfeld commented 4 years ago

Hey Dengfeng, first let me say thanks for developing purge_dups! It's a great piece of software that really seems to do its job well. We've been working on purging a Falcon-Unzip assembly of a heterozygous plant species and are using merqury (https://github.com/marbl/merqury) to assess the purging process. See our successful purging of the primary contigs in the panel below:

[image: merqury plots]

We tried following Step 4, in which we aligned our NGS reads against the newly purged haplotigs plus the original cns_h_ctg.fa from Unzip. We then ran purge_dups on this new set of haplotigs to purge the remaining duplication (visible in the middle left panel as the blue line at ~180x).
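Step 4 starts by pooling the haplotigs that purge_dups removed from the primary set with the original Unzip haplotigs (cns_h_ctg.fa) into a single FASTA, which is then used as the reference for read alignment. A minimal Python sketch of that merge, assuming a plain concatenation is all that is needed, is shown below. It refuses duplicate record IDs, because a name clash would make it ambiguous which copy an alignment refers to. It keeps only the first token of each header and writes each sequence on one line. The function names are hypothetical helpers, not part of purge_dups.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines."""
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token of the header as the ID.
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if name is not None:
        yield name, "".join(chunks)


def merge_fasta(*sources: Iterable[str]) -> str:
    """Concatenate several FASTA sources, refusing duplicate record IDs.

    Header descriptions are dropped and each sequence is written unwrapped.
    """
    seen = set()
    out = []
    for source in sources:
        for name, seq in parse_fasta(source):
            if name in seen:
                raise ValueError(f"duplicate record ID: {name!r}")
            seen.add(name)
            out.append(f">{name}\n{seq}\n")
    return "".join(out)
```

Usage, with hypothetical file names: open the purged-haplotig FASTA and `cns_h_ctg.fa`, pass both open file handles to `merge_fasta`, and write the returned string to a new file such as `merged_haplotigs.fa` before aligning reads to it.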

However, as you can see above (bottom left panel), when we run the algorithm again the haplotigs are heavily over-purged: we lose about 200 of the 350 Mb. The histogram and cutoffs look pretty good, so I'm not sure what went wrong.

[image]

Thanks in advance, Ben

dfguan commented 4 years ago

Hey Ben, very sorry for the late reply. It seems to me that there are low-depth contigs, caused by the highly repetitive plant genome combined with short reads. It would be better to use long reads for purging. Best, Dengfeng.
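To illustrate why low-depth contigs can be over-purged: purge_dups classifies contigs using read-depth cutoffs, and contigs whose bases mostly fall below the low cutoff can be flagged as junk rather than kept as haplotigs. One plausible way this arises is that short reads often cannot be placed uniquely in repetitive regions, leaving those regions with little or no depth. The sketch below applies a simple per-contig rule to hypothetical depth values. The rule and the 0.8 threshold are assumptions for illustration only, not purge_dups' actual classification logic or defaults.

```python
from typing import Sequence


def low_depth_fraction(depths: Sequence[int], low_cutoff: int) -> float:
    """Fraction of positions whose read depth falls below low_cutoff."""
    if not depths:
        return 0.0
    return sum(1 for d in depths if d < low_cutoff) / len(depths)


def looks_low_coverage(depths: Sequence[int], low_cutoff: int,
                       max_low_fraction: float = 0.8) -> bool:
    """Flag a contig if most of its bases sit below the low-depth cutoff.

    max_low_fraction is a hypothetical threshold chosen for illustration,
    not a purge_dups default.
    """
    return low_depth_fraction(depths, low_cutoff) >= max_low_fraction


# Hypothetical per-base depths: a repeat-rich haplotig where most short reads
# were not placed (depth ~2x), versus a uniformly covered haplotig (~90x).
repeat_rich = [2] * 900 + [95] * 100
well_covered = [90] * 1000
print(looks_low_coverage(repeat_rich, low_cutoff=10))   # True: 90% of bases below 10x
print(looks_low_coverage(well_covered, low_cutoff=10))  # False
```

Under a rule like this, even a genuine haplotig is discarded once short-read depth collapses over most of its length, which matches the suggestion to compute depth from long reads instead.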