PacificBiosciences / pb-CpG-tools

Collection of tools for the analysis of CpG data
BSD 3-Clause Clear License

genetic variants on CpG sites #55

Open qsonehara opened 11 months ago

qsonehara commented 11 months ago

Hi,

I'm wondering how genetic variants at CpG sites are handled by aligned_bam_to_cpg_scores. For example, when a diploid genome has a heterozygous SNP at a CpG site, how are the coverage and the modified/unmodified site counts in the output files affected?

Best, Kyuto

ctsa commented 11 months ago

First, note that if the heterozygous SNP creates a CpG that isn't present in the reference, you'll only see output for that site when --modsites-mode is set to the denovo option.

When output is generated for a heterozygous SNP site, I believe the current logic gives the non-CpG reads a methylation probability of zero and counts them toward the unmodified coverage.
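To make the effect of that behavior concrete, here is a minimal Python sketch of the counting logic as described above. It is not the pb-CpG-tools implementation: the `ReadObs` record, the `site_counts` function and the 0.5 threshold for calling a read modified are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReadObs:
    """Hypothetical per-read observation at one reference position."""
    has_cpg: bool    # does this read carry a CpG at the site?
    mod_prob: float  # 5mC probability for the read (ignored if has_cpg is False)


def site_counts(reads, threshold=0.5):
    """Return (coverage, modified, unmodified) for one site.

    Reads lacking a CpG are assigned a probability of zero, so they
    always land in the unmodified count (assumed behavior).
    """
    modified = 0
    unmodified = 0
    for r in reads:
        prob = r.mod_prob if r.has_cpg else 0.0
        if prob >= threshold:
            modified += 1
        else:
            unmodified += 1
    return modified + unmodified, modified, unmodified


# Heterozygous SNP at a fully methylated CpG: 10 reads keep the CpG
# (all methylated), 10 reads carry the alternate allele (no CpG).
reads = [ReadObs(True, 0.98)] * 10 + [ReadObs(False, 0.0)] * 10
cov, mod, unmod = site_counts(reads)
print(cov, mod, unmod, mod / cov)  # 20 10 10 0.5
```

With equal coverage of both alleles, the modified fraction comes out at 0.5 even though every CpG-bearing read is methylated.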

qsonehara commented 11 months ago

Thanks for your reply!

To my understanding, if one has a heterozygous SNP at a fully methylated CpG site, the modification probability will be evaluated as ~0.5 (assuming both alleles are covered roughly equally). Such a site needs careful interpretation, especially when the interest is in the effects of epigenetic regulation. I think it would be helpful if the number of non-CpG reads were shown for each site in the output.
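As an illustration of the suggestion, here is a minimal sketch of a per-site record carrying an extra non-CpG read count. The `SiteSummary` class, the `non_cpg_reads` field and the `summarize_site` helper are hypothetical names, not part of pb-CpG-tools; the sketch only shows how such a column would let a user compare the all-read fraction with the fraction over CpG-bearing reads.

```python
from dataclasses import dataclass


@dataclass
class SiteSummary:
    """Hypothetical per-site output record with the suggested extra column."""
    coverage: int
    modified: int
    unmodified: int
    non_cpg_reads: int  # proposed column: reads lacking a CpG at this site

    def mod_fraction(self):
        """Modified fraction over all reads at the site."""
        return self.modified / self.coverage if self.coverage else float("nan")

    def mod_fraction_cpg_only(self):
        """Modified fraction over CpG-bearing reads only."""
        cpg_reads = self.coverage - self.non_cpg_reads
        return self.modified / cpg_reads if cpg_reads else float("nan")


def summarize_site(reads, threshold=0.5):
    """reads: iterable of (has_cpg, mod_prob) tuples for one site."""
    modified = unmodified = non_cpg = 0
    for has_cpg, prob in reads:
        if not has_cpg:
            non_cpg += 1
            unmodified += 1  # assumed current behavior: counted as unmodified
        elif prob >= threshold:
            modified += 1
        else:
            unmodified += 1
    return SiteSummary(modified + unmodified, modified, unmodified, non_cpg)


# Unbalanced heterozygous site: 12 methylated CpG reads, 8 alternate-allele reads.
site = summarize_site([(True, 0.97)] * 12 + [(False, 0.0)] * 8)
print(site.non_cpg_reads, round(site.mod_fraction(), 2), site.mod_fraction_cpg_only())
# 8 0.6 1.0
```

Here the all-read fraction (0.6) mostly reflects allele balance, while the CpG-only fraction (1.0) reflects methylation of the CpG-bearing allele.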

ctsa commented 11 months ago

Thanks @qsonehara, I think this is a good suggestion for us to offer as an option. We can leave this as a feature ticket and see if it can be added in a future update.