olaraym closed 1 year ago
The avg mod score and avg unmod score are described as outputs of the count pileup mode here:
https://github.com/PacificBiosciences/pb-CpG-tools#output-modes-and-option-details
Specifically:
For a given site, the number of reads with a modification score of >0.5 and <0.5 are counted and the modification probability is given as a percentage.
So, for example, 'avg mod score' is the average modification probability of all bases with a modification probability >0.5, multiplied by 100 to output the value as a percentage.
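As a rough illustration of the description above, the count-mode values for one site could be computed as in the sketch below. This is not the pb-CpG-tools implementation: the function name `count_pileup_summary` and the dictionary keys are hypothetical, calls at exactly 0.5 are assumed to be excluded from both groups, and 'avg unmod score' is assumed to be the mean modification probability of the calls below 0.5, by analogy with 'avg mod score'.

```python
def count_pileup_summary(probs):
    """Summarize per-read modification probabilities (0.0 to 1.0) at one site.

    Hypothetical helper for illustration only; not the tool's own code.
    """
    mod = [p for p in probs if p > 0.5]    # calls classified as modified
    unmod = [p for p in probs if p < 0.5]  # calls classified as unmodified
    n = len(mod) + len(unmod)              # assumption: p == 0.5 is ignored

    def pct_mean(values):
        # Mean probability expressed as a percentage, or None for an empty group.
        return 100.0 * sum(values) / len(values) if values else None

    return {
        "mod_count": len(mod),
        "unmod_count": len(unmod),
        # Proportion of classified calls that are modified, as a percentage.
        "mod_score": 100.0 * len(mod) / n if n else None,
        "avg_mod_score": pct_mean(mod),
        # Assumption: mean modification probability of the unmodified calls.
        "avg_unmod_score": pct_mean(unmod),
    }
```

For example, `count_pileup_summary([0.9, 0.8, 0.7, 0.2])` counts three modified calls and one unmodified call, so the site score is three out of four expressed as a percentage.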
Thanks for the response, @ctsa, I appreciate it. One more thing: which of these values (modification probability or avg modification score) best describes CG site methylation on a particular read, and are there any further cutoffs for declaring a site as methylated, aside from the >0.5 or <0.5 used in calculating the average modified and unmodified scores? Regards, Laide.
Re "best describes CG site methylation on a particular read": this tool provides a consensus of multiple reads at a site. The methylation of a CG site on a single read is encoded in the MM/ML tags of the BAM file. One relatively easy way to see these values for each read is to view the BAM in IGV with the 5mC coloration mode turned on.
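For readers who want the per-read values as numbers rather than colors, here is a minimal sketch of decoding the two tags once they have been extracted from a BAM record. It follows the SAM tags specification, where each MM entry lists how many bases of the given type to skip before the next call and each ML value N stands for the probability bin from N/256 to (N+1)/256. The function name `decode_5mc_calls` is hypothetical, and the sketch assumes a forward-strand read and MM entries that carry a single modification code each; reverse-strand reads would need the sequence reverse-complemented first.

```python
def decode_5mc_calls(seq, mm, ml):
    """Map read positions to 5mC probabilities from MM/ML tag values.

    seq: read sequence (forward-strand read assumed)
    mm:  MM tag string, e.g. "C+m,0,1;"
    ml:  list of ML integers (0 to 255), one per call in MM order
    """
    calls = {}
    ml_index = 0
    for entry in mm.rstrip(";").split(";"):
        if not entry:
            continue
        fields = entry.split(",")
        header = fields[0]
        skips = [int(x) for x in fields[1:]]
        base, strand = header[0], header[1]
        code = header[2:].rstrip("?.")  # drop the optional '?' or '.' flag
        # Indices in seq of every occurrence of the canonical base.
        positions = [i for i, b in enumerate(seq) if b == base]
        cursor = 0
        for skip in skips:
            cursor += skip  # skip this many bases of that type
            if base == "C" and strand == "+" and code == "m":
                # Use the midpoint of the probability bin for ML value N.
                calls[positions[cursor]] = (ml[ml_index] + 0.5) / 256
            cursor += 1
            ml_index += 1
    return calls
```

In practice the `seq`, `mm`, and `ml` values would come from a BAM library of your choice; the sketch only covers the decoding step.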
For the site classification, the primary output of both pileup modes, either `model` (recommended) or `count`, is the modification score in column 4 of the BED output, which describes the site modification probability expressed as a percentage. In the `model` pileup mode this value is output by a machine learning model, and in the `count` pileup mode this value is the proportion of bases at the site classified as modified.
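To work with that score programmatically, a sketch like the following reads column 4 from tab-separated BED lines. The function name `read_site_scores` is hypothetical, and only the first four columns are interpreted (chromosome, start, end in the usual BED layout, then the modification score); any further columns are ignored here rather than guessed at.

```python
def read_site_scores(lines):
    """Return (chrom, start, end, score) tuples from BED-formatted lines.

    Only columns 1 to 4 are parsed; column 4 is taken to be the
    site modification score expressed as a percentage.
    """
    sites = []
    for line in lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom = fields[0]
        start, end = int(fields[1]), int(fields[2])
        score = float(fields[3])  # percentage, 0 to 100
        sites.append((chrom, start, end, score))
    return sites
```

Passing an open file handle works as well as a list of strings, since the function only iterates over lines.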
I'll update the readme of this site to increase the level of detail on these topics.
The discussion above should be updated in the project README. Closing as complete for now.
I am confused as to how these scores were obtained. @pacbiodevnet, @ctsa, and @armintoepfer, can you please clarify this for me? Thanks.