Closed Dragonlongzhilin closed 3 years ago
I think you may be confusing a position probability matrix (PPM) with a position weight matrix (PWM). https://en.wikipedia.org/wiki/Position_weight_matrix
The PWM logos show that columns don't need to sum to the same value.
Closing, but feel free to comment again if this doesn't answer your question.
Thanks for such a quick reply! I want to use the Tomtom tool on the MEME site to compare some motifs, so I need to obtain the PPM. But when I input the converted PPM, there was an error about the matrix. I think it is caused by the columns of the PPM not summing to 1. I also checked the PPM definition at the link you provided and found that each column should sum to 1.
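For readers following along, here is a rough sketch of how a PPM could be written out in the MEME minimal motif text format that Tomtom accepts. It is not ArchR or MEME Suite code: the helper name `ppm_to_meme`, the `nsites = 20` placeholder, the uniform background line, the 1e-4 tolerance, and the toy matrix are all assumptions made for illustration. The input is assumed to have rows in A, C, G, T order and one column per motif position.

```r
# Hypothetical helper (not part of ArchR or MEME Suite): format one PPM
# (rows A/C/G/T, columns = motif positions) as MEME minimal motif text.
ppm_to_meme <- function(ppm, name, nsites = 20) {
  stopifnot(is.matrix(ppm), nrow(ppm) == 4)
  # Guard against the problem described above: each position should sum to 1.
  # The tolerance is an arbitrary choice for this sketch.
  stopifnot(all(abs(colSums(ppm) - 1) < 1e-4))
  header <- c(
    "MEME version 4", "",
    "ALPHABET= ACGT", "",
    "strands: + -", "",
    "Background letter frequencies",
    "A 0.25 C 0.25 G 0.25 T 0.25", ""   # assumed uniform background
  )
  motif <- c(
    sprintf("MOTIF %s", name),
    sprintf("letter-probability matrix: alength= 4 w= %d nsites= %d E= 0",
            ncol(ppm), nsites),
    # One line per position, holding the A, C, G, T probabilities.
    apply(ppm, 2, function(p) paste(sprintf("%.6f", p), collapse = " ")),
    ""
  )
  c(header, motif)
}

# Toy two-position motif, made up for the example.
toy <- matrix(c(0.70, 0.10, 0.10, 0.10,
                0.25, 0.25, 0.25, 0.25),
              nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))
meme_lines <- ppm_to_meme(toy, "toy_motif")
writeLines(meme_lines)
```

The column-sum check is placed up front so that a badly converted matrix fails in R rather than at upload time.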
Sorry, I must have misread your original post. It sounds like you are asking why the code you are using to convert a PWM to a PPM doesn't work? This isn't ArchR code (it looks like it is from ChrAccR - https://github.com/GreenleafLab/ChrAccR/blob/c074d8160d07f1c3725bf8501d033dc3b8c8a2d8/R/utils_motifs.R#L430) so I'm not sure how we can help. Are you saying that the PWM objects in ArchR aren't properly formatted?
Thank you for your reply! This code was provided by jgranja24 (https://github.com/GreenleafLab/ArchR/issues/476). Using ArchR, I found some interesting TFs that I want to analyze further, and for that I need their PPMs. However, the ArchR peakAnnotation object only contains PWMs. The cisbp database was used to annotate the motifs: addMotifAnnotations(ArchRProj = projRenal6, motifSet = "cisbp", name = "Motif")
How can I get properly formatted PPMs for these TFs?
Thanks for linking to that previous issue. That clarifies things. I'm not familiar with that code for making a PPM so you'll have to wait for @jgranja24 to weigh in. I have a feeling that Jeff is the original author of that code despite its presence in ChrAccR.
Thank you for your reply! I don't know how to contact jgranja24. Could you help me?
You'll just have to wait until he replies here.
Sorry for the delayed response. I think the code you had before just needed to use the natural log (exp) rather than log2 (2^). See below:
library(ArchR)
library(chromVARmotifs)
data("human_pwms_v1")
PWMs <- human_pwms_v1
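PWMatrixToProbMatrix <- function(x){
  # is() also accepts subclasses of PWMatrix
  if (!methods::is(x, "PWMatrix")) stop("x must be a TFBSTools::PWMatrix object")
  # Undo the natural-log odds: p = exp(w) * background (background normalized to sum to 1)
  (exp(as(x, "matrix"))) * TFBSTools::bg(x)/sum(TFBSTools::bg(x))
}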
ProbMatrices <- lapply(PWMs, PWMatrixToProbMatrix)
lapply(ProbMatrices, colSums) %>% range
#[1] 0.9999996 1.0000004
# Maybe we can just tidy this up a tiny bit by rescaling each column to sum to exactly 1
PWMatrixToProbMatrix <- function(x){
  if (!methods::is(x, "PWMatrix")) stop("x must be a TFBSTools::PWMatrix object")
  m <- (exp(as(x, "matrix"))) * TFBSTools::bg(x)/sum(TFBSTools::bg(x))
  # Divide every column by its own sum to remove rounding error
  m <- t(t(m)/colSums(m))
  m
}
ProbMatrices <- lapply(PWMs, PWMatrixToProbMatrix)
lapply(ProbMatrices, colSums) %>% range
#[1] 1 1
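Why the base matters can be checked in base R without any motif database. The sketch below assumes the stored weights are natural-log odds, log(p / background), which is what the fix above implies. The toy matrix and the uniform background are made up for the example.

```r
# Toy PPM: rows A/C/G/T, one column per motif position (values invented).
ppm <- matrix(
  c(0.70, 0.10, 0.10, 0.10,
    0.25, 0.25, 0.25, 0.25,
    0.05, 0.05, 0.10, 0.80),
  nrow = 4,
  dimnames = list(c("A", "C", "G", "T"), NULL)
)
bg <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)  # assumed uniform background

# Assumption: weights are natural-log odds. bg is recycled down each column.
pwm <- log(ppm / bg)

back_exp  <- exp(pwm) * bg  # inverse using the matching base
back_pow2 <- 2^pwm * bg     # inverse using the wrong base

colSums(back_exp)   # every position sums to 1
colSums(back_pow2)  # skewed positions fall below 1
```

With the wrong base each entry becomes (p / bg)^log(2) * bg instead of p, so a perfectly uniform position still sums to 1 while more informative positions drift away from it. That matches the pattern in the HOXC5 output in the original post, where the near-uniform columns 8 and 9 sum to almost exactly 1.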
This looks solved to me. Closing, but feel free to comment again here if you need additional help.
I used the cisbp database to do motif enrichment analysis. I then checked the PWMatrix objects and converted them to PPMs with the following function:
PWMatrixToProbMatrix <- function(x){
  if (class(x) != "PWMatrix") stop("x must be a TFBSTools::PWMatrix object")
  (2^as(x, "matrix"))*TFBSTools::bg(x)/sum(TFBSTools::bg(x))
}
I found that the columns do not sum to 1, and I don't know why. For example:
$HOXC5
       [,1]       [,2]       [,3]       [,4]       [,5]      [,6]      [,7]
A 0.2412167 0.20878063 0.50660475 0.57631481 0.08873236 0.1795074 0.3352352
C 0.2614673 0.13468995 0.09202777 0.07315842 0.18141736 0.1830915 0.1412218
G 0.2068719 0.09726351 0.17220854 0.06727825 0.13364746 0.3236295 0.3069469
T 0.2873535 0.48002729 0.13457435 0.11910541 0.50287123 0.2983620 0.1932740
       [,8]      [,9]
A 0.2442228 0.2295669
C 0.2647379 0.2715331
G 0.2345961 0.2295669
T 0.2559793 0.2678980