Lillian-21 opened this issue 3 years ago.
Yeah, it looks like a tetraploid assembly to me. Sorry, but purge_dups can only handle diploid assemblies.

As for the six numbers: the first is the cutoff for junk contigs, i.e. contigs with very low coverage (call it l). The second and third were intended for haploid coverage but are not used now. The fourth is the diploid cutoff (d), and the last is the cutoff for repetitive sequences (h).

Contigs with an average coverage greater than l but less than d are considered haplotigs and may be removed if they have a paired sequence. Contigs with an average coverage greater than d but less than h are considered diploid sequences. Contigs with an average coverage above h are considered highly repetitive and are removed from the primary contig set, although users can choose to keep them.

Best,
Dengfeng
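To make the rules above concrete, here is a minimal Python sketch that reads the six numbers and labels a contig by its average coverage. It is an illustration of the explanation, not purge_dups' own code: the names `Cutoffs`, `parse_cutoffs` and `classify` are hypothetical, the field names are our own labels, the role of the fifth value is not covered in the reply above, and how exact ties at l, d or h are handled is an assumption.

```python
from typing import NamedTuple


class Cutoffs(NamedTuple):
    """Six values from a cutoffs file (field names are hypothetical labels)."""
    low: float        # 1st: below this, contigs are treated as junk (l)
    haploid_a: float  # 2nd: intended for haploid coverage, currently unused
    haploid_b: float  # 3rd: intended for haploid coverage, currently unused
    diploid: float    # 4th: haplotig/diploid boundary (d)
    fifth: float      # 5th: role not described in the reply above
    repeat: float     # 6th (last): above this, contigs are treated as repeats (h)


def parse_cutoffs(text: str) -> Cutoffs:
    """Parse whitespace-separated numbers; assumes exactly six values."""
    values = [float(tok) for tok in text.split()]
    if len(values) != 6:
        raise ValueError(f"expected 6 cutoff values, got {len(values)}")
    return Cutoffs(*values)


def classify(avg_cov: float, c: Cutoffs) -> str:
    """Label a contig by average coverage; tie handling (< vs <=) is assumed."""
    if avg_cov < c.low:
        return "junk"
    if avg_cov < c.diploid:
        return "haplotig"  # may be removed if a paired sequence is found
    if avg_cov <= c.repeat:
        return "diploid"
    return "repeat"  # removed from the primary set by default, but can be kept
```

With made-up cutoffs such as `parse_cutoffs("5 10 20 30 40 100")`, a contig at coverage 3 is labeled junk, 25 a haplotig, 50 diploid and 150 a repeat. Only the first, fourth and last values influence the label, matching the note that the second and third are currently unused.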
OK, I got it. I will try other methods. Thank you for your reply.
Hi Dr. Guan,
I am running purge_dups on a plant genome with more than 30% duplication. The estimated genome size is 375 Mb, and the assembly size is 450 Mb. Attached are coverage plots from Nanopore reads. Is it polyploid (perhaps tetraploid)? I am confused about the six numbers in the "cutoffs" file. Could you give some advice on setting the cutoffs?