Closed lcolladotor closed 3 years ago
https://rdrr.io/cran/matrixStats/man/rowSds.html
This is a bit more complicated since it we'll have to do for every feature level like in #8. So RPKMs for genes and exons, RP80M (or was it RP10M?) for exon-exon junctions and TPMs for transcripts.
It's probably worth running levene test or another test of differential variation comparing an equal number of controls from both brain regions. Maybe on log rpkm scale and then on the cleaningY output of full model (sans Dx)?
On Thu, Apr 8, 2021, 2:34 PM Leonardo Collado-Torres < @.***> wrote:
https://rdrr.io/cran/matrixStats/man/rowSds.html
This is a bit more complicated since it we'll have to do for every feature level like in #8 https://github.com/LieberInstitute/zandiHyde_bipolar_rnaseq/issues/8. So RPKMs for genes and exons, RP80M (or was it RP10M?) for exon-exon junctions and TPMs for transcripts.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/LieberInstitute/zandiHyde_bipolar_rnaseq/issues/13#issuecomment-816048295, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEYY2ENSJHXSKKVQ4P3DBLTHXZLLANCNFSM42TMKJSQ .
I like the idea of regressing out all the variability from the terms (covariates) we adjust for in the models, keeping only the Dx.
I didn't remember the levene test https://en.wikipedia.org/wiki/Levene%27s_test, so that's a good idea =). Looks like car::leveneTest()
can do the job http://www.sthda.com/english/wiki/compare-multiple-sample-variances-in-r.
So, for every gene, extract expr
from cleaningY(log2(rpkm + 1), mod = joint_model, P = 3)
where we keep the Dx
and Region
effects.
then use:
library(car)
leveneTest(expr ~ Region * Dx, data = data_for_one_gene)
## Extract the F-value and P-value from the leveneTest() output
also keep the results from the Region and Dx only runs
leveneTest(expr ~ Region, data = data_for_one_gene)
leveneTest(expr ~ Dx, data = data_for_one_gene)
FYI I just finished editing my previous comment
/users/ajaffe/Lieber/lieber_functions_aj.R #1938 leveneFast()
On Tue, Apr 20, 2021 at 2:11 PM Leonardo Collado-Torres < @.***> wrote:
FYI I just finished editing my previous comment
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/LieberInstitute/zandiHyde_bipolar_rnaseq/issues/13#issuecomment-823493615, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEYY2ALFPZ7Z3WNSQVLR2DTJW7VHANCNFSM42TMKJSQ .
OHHH, thanks!
leveneFast <- function(dat, groups) {
require(matrixStats)
N = length(groups)
k = length(unique(groups))
gIndexes = split(seq(along=groups),groups)
gLens = as.numeric(sapply(gIndexes,length))
zij = dat
for(i in seq(along=gIndexes)) {
ind = gIndexes[[i]]
zij[,ind] = abs(dat[,ind] - rowMedians(dat[,ind]))
}
zdotdot = rowMeans(zij)
zidot = sapply(gIndexes, function(x) rowMeans(zij[,x]))
num = rowSums(gLens*(zidot-zdotdot)^2)
out = zidot
for(i in seq(along=gIndexes)) {
ind = gIndexes[[i]]
out[,i] = rowSums((zij[,ind] - zidot[,i])^2)
}
denom = rowSums(out)
W = ((N-k)/(k-1))*(num/denom)
pval = pf(W, df1 = k-1, df2 = N-k, lower.tail = FALSE)
return(data.frame(cbind(W,pval)))
}
Louise, I see that groups
would need to be a factor of length equal to the number of columns of dat
(so 1 value per sample). So for Dx * Region
it can be done with groups = factor(paste0(Dx, "_", Region))
or something like that. dat
would be the output from cleaningY()
. You'll need to have matrixStats
installed too.
Here I think that the simplest plot to make is a scatter plot (like in #8) between the
rowSds(RPKM)
computed for each region.genefilter
andmatrixStats
implement that function.