dfguan / purge_dups

haplotypic duplication identification tool
MIT License

cutoff meanings & histogram #93

Open charlesfeigin opened 3 years ago

charlesfeigin commented 3 years ago

Hi,

I am trying to use purge_dups on a recent Nanopore-based mammal genome assembly. The assembly is very complete (93% complete BUSCOs), but it also contains a large number of complete duplicated BUSCOs (11%, which would be a very unusual pattern in this lineage) and is larger than expected for its lineage (which otherwise has a very conserved genome size).

I ran through the default pipeline, but very little purging was achieved. From looking through the 'issues', it seems that I should manually change my cutoffs. However, I cannot find a clear description of what each of the 6 numbers in the cutoff file means, or of how information in the histogram can be used to change each of these cutoffs.

I've attached my histogram here (image: PB cov). Any assistance would be greatly appreciated.

qianqianqian0717 commented 2 years ago

Hello, I have a very similar problem: my insect genome, assembled with Nanopore data, also has high heterozygosity, and the assembly is much larger than expected. However, when I classify contigs, the "dups_bed" file is empty. Have you solved this problem, and if so, how? Best

charlesfeigin commented 2 years ago

@qianqianqian0717 unfortunately I haven't gotten any help on this question....

charlesfeigin commented 1 year ago

Again, this would be really helpful.