GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License

Why are the column sums of the TF PPM matrix not equal to 1? #509

Closed Dragonlongzhilin closed 3 years ago

Dragonlongzhilin commented 3 years ago

I used the cisbp database to do motif enrichment analysis. I then took the PWMatrix objects and converted them to PPMs with the following function:

PWMatrixToProbMatrix <- function(x){
    if (class(x) != "PWMatrix") stop("x must be a TFBSTools::PWMatrix object")
    (2^as(x, "matrix"))*TFBSTools::bg(x)/sum(TFBSTools::bg(x))
}

I found that the columns of the result do not sum to 1, and I don't know why. For example:

$HOXC5
       [,1]       [,2]       [,3]       [,4]       [,5]      [,6]      [,7]
A 0.2412167 0.20878063 0.50660475 0.57631481 0.08873236 0.1795074 0.3352352
C 0.2614673 0.13468995 0.09202777 0.07315842 0.18141736 0.1830915 0.1412218
G 0.2068719 0.09726351 0.17220854 0.06727825 0.13364746 0.3236295 0.3069469
T 0.2873535 0.48002729 0.13457435 0.11910541 0.50287123 0.2983620 0.1932740
       [,8]      [,9]
A 0.2442228 0.2295669
C 0.2647379 0.2715331
G 0.2345961 0.2295669
T 0.2559793 0.2678980

rcorces commented 3 years ago

I think you may be confusing a position probability matrix (PPM) with a position weight matrix (PWM). https://en.wikipedia.org/wiki/Position_weight_matrix

The PWM logos show that columns don't need to have equal sums:

image

Closing, but feel free to comment again if this doesn't answer your question.

Dragonlongzhilin commented 3 years ago

Thanks for such a quick reply! I want to use the Tomtom tool on the MEME site to compare some motifs, so I need the PPM. But when I submitted the converted PPM, I got an error about the matrix:

[image]

I think it is caused by the column sums of the PPM not being 1. I also checked the PPM definition at the link you provided, and there the sum of each column is 1:

[image: Snipaste_2021-01-25_09-13-03]

rcorces commented 3 years ago

Sorry - I must have misread your original post. It sounds like you are asking why the code you are using to convert a PWM to a PPM doesn't work? This isn't ArchR code (it looks like it is from ChrAccR - https://github.com/GreenleafLab/ChrAccR/blob/c074d8160d07f1c3725bf8501d033dc3b8c8a2d8/R/utils_motifs.R#L430), so I'm not sure how we can help. Are you saying that the PWM objects in ArchR aren't properly formatted?

Dragonlongzhilin commented 3 years ago

Thank you for your reply! This code was provided by jgranja24 (https://github.com/GreenleafLab/ArchR/issues/476). Using ArchR, I found some interesting TFs that I want to analyze further, and for that I need their PPMs. However, the ArchR peakAnnotation object stores PWMs. The cisbp database was used to annotate the motifs:

addMotifAnnotations(ArchRProj = projRenal6, motifSet = "cisbp", name = "Motif")

How can I get properly formatted PPMs for these TFs?

rcorces commented 3 years ago

Thanks for linking to that previous issue. That clarifies things. I'm not familiar with that code for making a PPM so you'll have to wait for @jgranja24 to weigh in. I have a feeling that Jeff is the original author of that code despite its presence in ChrAccR.

Dragonlongzhilin commented 3 years ago

Thank you for your reply! I don't know how to contact jgranja24. Could you help me?

rcorces commented 3 years ago

You'll just have to wait until he replies here.

jgranja24 commented 3 years ago

Sorry for the delayed response. I think the code you had before just needed to use the natural log rather than log2. See below:


library(ArchR)
library(chromVARmotifs)

data("human_pwms_v1")

PWMs <- human_pwms_v1

PWMatrixToProbMatrix <- function(x){
    if (!is(x, "PWMatrix")) stop("x must be a TFBSTools::PWMatrix object")
    # The PWM entries are natural-log odds, so invert with exp() and rescale by the background
    (exp(as(x, "matrix"))) * TFBSTools::bg(x)/sum(TFBSTools::bg(x))
}

ProbMatrices <- lapply(PWMs, PWMatrixToProbMatrix)
lapply(ProbMatrices, colSums) %>% range
#[1] 0.9999996 1.0000004

# Maybe we can tidy this up a bit by renormalizing each column,
# which removes the small floating-point deviation from 1

PWMatrixToProbMatrix <- function(x){
    if (!is(x, "PWMatrix")) stop("x must be a TFBSTools::PWMatrix object")
    m <- (exp(as(x, "matrix"))) * TFBSTools::bg(x)/sum(TFBSTools::bg(x))
    # Divide every column by its own sum
    m <- t(t(m)/colSums(m))
    m
}

ProbMatrices <- lapply(PWMs, PWMatrixToProbMatrix)
lapply(ProbMatrices, colSums) %>% range
#[1] 1 1
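
The natural-log point can be checked without any motif database. The sketch below is base R only and uses a made-up toy PPM and a uniform background (both are assumptions for illustration, not values from cisbp). If a PWM entry is defined as log(p / bg), then exp(PWM) * bg recovers p exactly, while 2^PWM * bg gives columns that drift away from 1, which matches the behaviour reported at the top of this thread.

```r
# Toy 4 x 3 position probability matrix (made-up values); each column sums to 1.
ppm <- matrix(
  c(0.70, 0.10, 0.10, 0.10,
    0.25, 0.25, 0.25, 0.25,
    0.05, 0.05, 0.10, 0.80),
  nrow = 4,
  dimnames = list(c("A", "C", "G", "T"), NULL)
)

# Assumed uniform background frequencies.
bg <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)

# Assumed PWM definition: natural-log odds against the background.
# The length-4 vector is recycled down the rows, so each row is divided by its own bg value.
pwm <- log(ppm / bg)

# Inverting with exp() recovers the PPM; inverting with 2^ does not.
ppm_exp  <- exp(pwm) * bg
ppm_pow2 <- 2^pwm * bg

colSums(ppm_exp)   # all columns sum to 1 (up to floating-point error)
colSums(ppm_pow2)  # only the uniform column still sums to 1
```

Columns that are close to the background are barely affected by the wrong base, which is why some columns can look almost correct even with the log2 version.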
rcorces commented 3 years ago

This looks solved to me. Closing, but feel free to comment again here if you need additional help.
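
For the Tomtom use case mentioned earlier in the thread, the corrected PPM still has to be written in a format that the MEME Suite accepts. The following is a minimal sketch, assuming the MEME minimal motif format with a uniform background; `ppmToMeme` is a hypothetical helper written for this example and is not part of ArchR, ChrAccR, or TFBSTools. It also refuses any matrix whose columns do not sum to 1, which is the condition Tomtom appeared to complain about.

```r
# Hypothetical helper: format one PPM (rows A, C, G, T; columns = positions)
# as lines of text in the MEME minimal motif format.
ppmToMeme <- function(ppm, name, bg = rep(0.25, 4)) {
  stopifnot(is.matrix(ppm), nrow(ppm) == 4)
  # Every column must be a probability distribution over A, C, G, T.
  stopifnot(all(abs(colSums(ppm) - 1) < 1e-6))
  header <- c(
    "MEME version 4",
    "",
    "ALPHABET= ACGT",
    "",
    "strands: + -",
    "",
    "Background letter frequencies",
    sprintf("A %.3f C %.3f G %.3f T %.3f", bg[1], bg[2], bg[3], bg[4]),
    "",
    sprintf("MOTIF %s", name),
    sprintf("letter-probability matrix: alength= 4 w= %d", ncol(ppm))
  )
  # MEME expects one row per position, so the matrix is transposed first.
  rows <- apply(t(ppm), 1, function(p) paste(sprintf("%.6f", p), collapse = " "))
  c(header, rows, "")
}

# Toy two-position motif with made-up values.
toy <- matrix(
  c(0.70, 0.10, 0.10, 0.10,
    0.25, 0.25, 0.25, 0.25),
  nrow = 4,
  dimnames = list(c("A", "C", "G", "T"), NULL)
)

meme_lines <- ppmToMeme(toy, "toy_motif")
cat(meme_lines, sep = "\n")
```

To produce a file for upload, the lines can be saved with `writeLines(meme_lines, "motifs.meme")`. With real data, the input would be one element of the `ProbMatrices` list computed above, and the background should be replaced by the one stored in the PWMatrix object if it is not uniform.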