dfguan / purge_dups

haplotypic duplication identification tool
MIT License
202 stars, 19 forks

polyploidy cutoffs #62

Open Lillian-21 opened 3 years ago

Lillian-21 commented 3 years ago

Hi Dr. Guan,

I am running purge_dups on a plant genome with more than 30% duplication. The estimated genome size is 375 Mb, and the assembly size is 450 Mb. Attached are coverage plots from Nanopore reads. Is it polyploid (maybe tetraploid)? I am confused about the six numbers in the "cutoffs" file. Can you give some advice on choosing the cutoffs?

image

dfguan commented 3 years ago

Yeah, this looks like a tetraploid assembly to me. Sorry, but purge_dups can only handle diploid assemblies.

As for the six numbers: the first is the cutoff for junk contigs, i.e. contigs with very low coverage (l). The second and third numbers were intended for haploid coverage but are not used now. The fourth number is the diploid cutoff (d), and the last number is the cutoff for repetitive sequences (h).

Contigs with an average coverage less than d but greater than l are considered haplotigs and may be removed if they have paired sequences. Contigs with an average coverage greater than d but less than h are considered diploid sequences. Contigs with an average coverage above h are considered highly repetitive sequences and will be removed from the primary contig set, although the user can choose to keep them.

Best, Dengfeng.
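For readers trying to follow the rules above, here is a minimal Python sketch of the coverage classification as described in this comment. It is not purge_dups code: the function names, the label strings, and the example cutoff values are hypothetical, and the handling of coverage exactly equal to a cutoff is an assumption, since the comment does not say which side it falls on.

```python
def parse_cutoffs(line):
    """Pick l, d, h out of a line holding the six cutoff numbers.

    Per the comment above: first number = l, fourth = d, last = h.
    Whitespace-separated input is an assumption of this sketch.
    """
    nums = [int(x) for x in line.split()]
    if len(nums) != 6:
        raise ValueError("expected six numbers, got %d" % len(nums))
    return nums[0], nums[3], nums[5]


def classify_contig(mean_cov, low, dip, high):
    """Label a contig by its average coverage (illustrative labels).

    Ties (coverage equal to a cutoff) go to the higher class here;
    that choice is an assumption, not something stated in the thread.
    """
    if mean_cov < low:
        return "junk"        # very low coverage
    if mean_cov < dip:
        return "haplotig"    # candidate; removed only if it has a paired sequence
    if mean_cov < high:
        return "diploid"
    return "repeat"          # highly repetitive


# Made-up cutoff values, for illustration only.
low, dip, high = parse_cutoffs("5 20 20 40 80 120")
for cov in (3, 25, 60, 200):
    print(cov, classify_contig(cov, low, dip, high))
```

Note that this only covers the coverage side. As the comment says, a contig in the haplotig range is removed only if it also has a paired sequence, which this sketch does not check.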

Lillian-21 commented 3 years ago

OK, I got it. I will try other methods. Thank you for your reply.