PacificBiosciences / pb-CpG-tools

Collection of tools for the analysis of CpG data
BSD 3-Clause Clear License

avg mod score and avg unmod scores #44

Closed olaraym closed 1 year ago

olaraym commented 1 year ago

I am confused about how these scores were obtained. @pacbiodevnet, @ctsa, and @armintoepfer, can you please clarify this for me? Thanks.

ctsa commented 1 year ago

avg mod score / avg unmod score is described as the output of the count pileup mode here:

https://github.com/PacificBiosciences/pb-CpG-tools#output-modes-and-option-details

Specifically:

For a given site, the number of reads with a modification score of >0.5 and <0.5 are counted and the modification probability is given as a percentage.

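So, for example, 'avg mod score' is the average modification probability of all bases at the site with a modification probability >0.5, multiplied by 100 to output the value as a percentage. Likewise, 'avg unmod score' is the same average taken over the bases with a modification probability <0.5.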
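
To make the arithmetic concrete, here is a minimal Python sketch of the calculation described above. It is not the tool's actual implementation: the function name, the input list of per-read probabilities, and the choice to return None when no reads fall on one side of the cutoff are all assumptions made for illustration. How the tool treats a probability of exactly 0.5 is not stated here, so this sketch simply leaves such reads out of both averages.

```python
from statistics import mean


def avg_mod_unmod_scores(probs, threshold=0.5):
    """Return (avg mod score, avg unmod score) as percentages.

    probs: per-read modification probabilities (0.0 to 1.0) at one CpG site.
    Hypothetical helper; returns None for a side with no qualifying reads.
    """
    # Reads called modified: probability strictly above the cutoff.
    modified = [p for p in probs if p > threshold]
    # Reads called unmodified: probability strictly below the cutoff.
    unmodified = [p for p in probs if p < threshold]

    avg_mod = 100 * mean(modified) if modified else None
    avg_unmod = 100 * mean(unmodified) if unmodified else None
    return avg_mod, avg_unmod


# Made-up probabilities for five reads covering one site.
site_probs = [0.9, 0.8, 0.7, 0.3, 0.1]
avg_mod, avg_unmod = avg_mod_unmod_scores(site_probs)
print(f"avg mod score: {avg_mod:.1f}, avg unmod score: {avg_unmod:.1f}")
```

With these made-up values, three reads are above 0.5 and average to roughly 80%, while the two reads below 0.5 average to roughly 20%.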

olaraym commented 1 year ago

Thanks for the response @ctsa, I appreciate it. One more thing: which of these values (modification probability or avg modification score) best describes CG site methylation on a particular read? And are there any further cutoffs for declaring a site as methylated, aside from the >0.5 or <0.5 used in calculating the average modification and unmodification scores? Regards, Laide.

ctsa commented 1 year ago

Re "best describes CG site methylation on a particular read": this tool provides a consensus of multiple reads at a site, not a per-read value. The methylation of a CG site on a single read is encoded in the MM/ML tags of the BAM file. One relatively easy way to see these values for each read is to view the BAM in IGV with the 5mC coloration mode turned on.
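
For readers who want to look at the per-read values outside IGV, the sketch below shows how a single ML tag entry might be turned into a probability. It relies on the SAM tags specification, which stores each ML value as an integer from 0 to 255 representing the probability interval N/256 to (N+1)/256. The function name and the choice to report the midpoint of that interval are assumptions made for illustration, and the example values are made up; reading the tags from a real BAM would require a BAM library, which is not shown.

```python
def ml_to_probability(ml_value):
    """Convert one ML tag entry (integer 0..255) to a probability.

    The SAM tags specification maps integer N to the probability
    interval [N/256, (N+1)/256); returning the midpoint of that
    interval is an illustrative choice, not something the tool mandates.
    """
    if not 0 <= ml_value <= 255:
        raise ValueError("ML values must be in the range 0..255")
    return (ml_value + 0.5) / 256


# Made-up ML entries for three CpG positions on one read.
ml_values = [250, 12, 130]
read_probs = [ml_to_probability(v) for v in ml_values]

# Apply the same 0.5 cutoff mentioned earlier in the thread.
read_calls = ["modified" if p > 0.5 else "unmodified" for p in read_probs]
for value, prob, call in zip(ml_values, read_probs, read_calls):
    print(f"ML={value:3d} -> p={prob:.3f} ({call})")
```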

For the site classification, the primary output of both pileup modes, either model (recommended) or count, is the modification score in column 4 of the bed output, which describes the site modification probability expressed as a percentage. In the model pileup mode this value is output from a machine learning model, and in the count pileup mode this value is the proportion of bases at the site classified as modified.
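
As a rough illustration of the count pileup mode score only (the model mode uses a machine learning model and is not reproduced here), the following Python sketch computes the proportion of bases classified as modified. The function name and input format are hypothetical, and the handling of a probability of exactly 0.5 is an assumption: such bases are excluded from both classes in this sketch.

```python
def count_mode_score(probs, threshold=0.5):
    """Percentage of bases at a site classified as modified.

    probs: per-read modification probabilities (0.0 to 1.0) at one site.
    Hypothetical helper; returns None if no base can be classified.
    """
    n_mod = sum(1 for p in probs if p > threshold)
    n_unmod = sum(1 for p in probs if p < threshold)
    total = n_mod + n_unmod
    if total == 0:
        return None
    return 100 * n_mod / total


# Made-up probabilities for five reads covering one site.
example_probs = [0.9, 0.8, 0.7, 0.3, 0.1]
score = count_mode_score(example_probs)
print(f"count-mode modification score: {score:.1f}")
```

Here three of the five bases exceed 0.5, so the sketch reports 60%. Note that this is a different quantity from the avg mod score, which averages the probabilities of the modified bases and does not count them.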

I'll update the README of this repo to increase the level of detail on these topics.

ctsa commented 1 year ago

The discussion above should be updated in the project README. Closing as complete for now.