dfguan / purge_dups

haplotypic duplication identification tool
MIT License
Purge_dups output vs assembly output #52

Open B10inform opened 3 years ago

B10inform commented 3 years ago

Hi,

I have a question about the purge_dups output.

[Image: two graphs, the assembly before purge_dups (left) and after purge_dups (right)]

The left graph is a genome assembled with pbipa, with an assembly size of 657Mb. The right graph is the same assembly after purge_dups, with an assembly size of 371Mb.

My genome size is ~380Mb. The left assembly is close to double my genome size, while the right assembly is close to my genome size. Do you think purge_dups is not making the assembly any better, instead worsening? Can I move forward with the right assembly?

Thanks

dfguan commented 3 years ago

What do you mean by this? From which aspect?

"Do you think purge_dups is not making the assembly any better, instead worsening"

At least it decreased your assembly size to the expected genome size. A few haplotypic regions remain. Sorry, I'm not sure how to remove the remaining haplotypic duplications.
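As a rough check on the sizes quoted in this thread, the comparison can be expressed as a ratio of assembly size to expected genome size. This is only an illustrative sketch: the helper name `size_ratio` is hypothetical and not part of purge_dups, and the numbers are the approximate values reported above.

```python
# Sizes in Mb, as quoted in the thread. size_ratio is a hypothetical helper,
# not part of purge_dups.
def size_ratio(assembly_mb: float, expected_mb: float) -> float:
    """Return the assembly size as a multiple of the expected genome size."""
    if expected_mb <= 0:
        raise ValueError("expected_mb must be positive")
    return assembly_mb / expected_mb


EXPECTED_MB = 380.0  # approximate genome size reported by the poster

before = size_ratio(657.0, EXPECTED_MB)  # pbipa assembly before purging
after = size_ratio(371.0, EXPECTED_MB)   # same assembly after purge_dups

print(f"before purge_dups: {before:.2f}x expected size")  # 1.73x
print(f"after purge_dups:  {after:.2f}x expected size")   # 0.98x
```

A ratio near 1 only says the total size matches expectation; it does not by itself show that every haplotypic duplication was removed or that no unique sequence was lost.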

B10inform commented 3 years ago

Hi dfguan,

It has decreased my assembly size to close to the expected genome size. But the "0X" coverage has increased after purge_dups. Is this the expected (normal) purge_dups output for a heterozygous genome? Can this output be used for downstream analysis?

Thanks