dfguan / purge_dups

haplotypic duplication identification tool
MIT License

Checking my purged assembly #84

Open aureliendejode opened 3 years ago

aureliendejode commented 3 years ago

Hi, I used purge_dups to clean my Canu assembly made with PacBio CLR reads. My expected genome size is 1.35 Gb and heterozygosity is 1.5% according to the GenomeScope analysis. The Canu assembly is 2.2 Gb, and after running purge_dups it is now 1.3 Gb.

The number of contigs went from 23 755 to 12 212.

The N50 went from 190 099 to 286 626.

The BUSCO scores for metazoa went from C:95.7%[S:23.7%,D:72.0%],F:1.5%,M:2.8%,n:954 to C:95.1%[S:91.3%,D:3.8%],F:1.9%,M:3.0%,n:954.

This all looks very good to me.
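For anyone wanting to reproduce the before/after comparison, here is a minimal Python sketch that parses the two BUSCO summary strings quoted above and reports the change in each category, plus the ratio of assembly size to expected genome size. The helper name `parse_busco` and the regular expression are my own illustration, not part of purge_dups or BUSCO, and the sizes are the rounded values from this post.

```python
import re

# Pattern for the one-line BUSCO summary, e.g.
# C:95.7%[S:23.7%,D:72.0%],F:1.5%,M:2.8%,n:954
BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary):
    """Hypothetical helper: turn a BUSCO summary line into a dict."""
    match = BUSCO_RE.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {summary!r}")
    fields = {k: float(v) for k, v in match.groupdict().items() if k != "n"}
    fields["n"] = int(match.group("n"))
    return fields


before = parse_busco("C:95.7%[S:23.7%,D:72.0%],F:1.5%,M:2.8%,n:954")
after = parse_busco("C:95.1%[S:91.3%,D:3.8%],F:1.9%,M:3.0%,n:954")

# Change in percentage points for each BUSCO category after purging.
delta = {k: round(after[k] - before[k], 1) for k in "CSDFM"}

# Assembly size relative to the expected genome size (all in Gb, rounded).
expected_gb = 1.35
ratio_before = round(2.2 / expected_gb, 2)
ratio_after = round(1.3 / expected_gb, 2)

for key in "CSDFM":
    print(f"{key}: {before[key]:.1f}% -> {after[key]:.1f}% ({delta[key]:+.1f})")
print(f"assembly/expected size: {ratio_before} -> {ratio_after}")
```

With these numbers, duplicated BUSCOs drop by about 68 percentage points while completeness falls by only 0.6, and the assembly goes from roughly 1.6 times the expected size to just under it.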

I was following your recommendations to validate the purged assembly. Here is the histogram with the automatically determined cutoffs. What do you think about this?

[Image: PB cov]

To further validate the purged assembly, would you recommend using KAT or KMC? I have the PacBio CLR reads and also Illumina reads (from a different individual, though).

I just ran KAT with the Illumina reads on the purged assembly, and here is my graph.

[Image: kat-spectra-cn]

aureliendejode commented 3 years ago

Hi,

@dfguan, do you have any recommendations about those purging cutoffs?

Cheers

Aurélien