Open idavydov opened 3 years ago
Interestingly, valType="p.greater" always returns either 1 or 0. Not sure if that's correct here.
OK, I think I got it. nBg = nTotal - nInds, and with neg all genes are included, so nBg is zero.
Then:
res = u1 / nInds / nBg;
which leads to Inf. OK, maybe fair enough for "r" and "f".
But for p-values this leads to overly optimistic or pessimistic estimates. Basically, p-values will always be 0 or 1.
This probably means that p-values are "too extreme" in all the cases.
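The division by a zero nBg described above can be reproduced with plain R arithmetic. This is only a sketch: the numbers are made up, and only the formula res = u1 / nInds / nBg comes from this thread. It also shows one possible way a NaN (rather than Inf) could appear, under the assumption that u1 is zero.

```r
# Hypothetical values; only the formula itself is taken from the thread.
nTotal <- 100
nInds  <- 100             # the signature covers every gene
nBg    <- nTotal - nInds  # background size is 0

u1  <- 2500               # made-up positive rank-sum statistic
res <- u1 / nInds / nBg   # positive number divided by zero
print(res)

# Assumption: if u1 happened to be 0, the same formula yields 0 / 0.
res_zero <- 0 / nInds / nBg
print(res_zero)
```

In R, a positive number divided by zero is Inf and 0 / 0 is NaN, so both outcomes mentioned in this issue are compatible with an empty background.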
Maybe in this case the correct way would be to compute two different p-values and use something like Fisher's method to aggregate them?
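A minimal sketch of that aggregation idea, assuming two p-values (one for the pos genes, one for the neg genes) have already been computed somehow. fisher_combine is a hypothetical helper, not part of BioQC, and the input p-values are made up. Fisher's method assumes the p-values are independent, which may not hold here.

```r
# Hypothetical helper (not part of BioQC): Fisher's method for combining p-values.
# Under the null, -2 * sum(log(p)) follows a chi-squared distribution
# with 2 * k degrees of freedom, where k is the number of p-values.
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# Made-up p-values for illustration only.
p_pos <- 0.04
p_neg <- 0.20
combined <- fisher_combine(c(p_pos, p_neg))
print(combined)
```

Note that a p-value of exactly 0 drives the combined value to 0 as well, so this would not by itself fix inputs that are already degenerate.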
Here's an example. With a totally random signature covering 95 genes out of 100, we always get a skewed p-value distribution:
set.seed(42)
ngenes <- 100
nsamples <- 1000
m <- matrix(rnorm(ngenes * nsamples), ncol=nsamples)
rownames(m) <- paste0("g", seq_len(nrow(m)))
colnames(m) <- paste0("s", seq_len(ncol(m)))
# using all the genes except for the last five
big_signature <- sample(c("pos", "neg"), ngenes - 5, replace = TRUE)
random_sign <- BioQC::SignedGenesets(list(
  list(
    name = "random signature",
    pos = paste0("g", which(big_signature == "pos")),
    neg = paste0("g", which(big_signature == "neg"))
  )
))
hist(BioQC::wmwTest(m, random_sign, valType="p.less"))
Created on 2021-08-12 by the reprex package (v2.0.1)
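For comparison, here is a baseline sketch of what a roughly well-behaved p-value distribution looks like on random data. It uses base R's wilcox.test rather than BioQC, and an unsigned set of 20 genes against the remaining 80, so the background is not empty. The sizes and the ref_ names are assumptions chosen for illustration.

```r
# Baseline sketch with base R only (not BioQC); all sizes are illustrative.
set.seed(1)
ref_ngenes   <- 100
ref_nsamples <- 200
ref_m <- matrix(rnorm(ref_ngenes * ref_nsamples), ncol = ref_nsamples)

# A random, unsigned gene set: the first 20 genes versus the other 80.
in_set <- seq_len(ref_ngenes) <= 20

# One one-sided Wilcoxon-Mann-Whitney p-value per sample (column).
ref_p <- apply(ref_m, 2, function(x) {
  wilcox.test(x[in_set], x[!in_set], alternative = "less")$p.value
})

hist(ref_p)          # expected to look approximately flat under the null
print(range(ref_p))  # values stay strictly between 0 and 1
```

With a non-empty background the p-values spread over the unit interval instead of piling up at 0 and 1.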
I think one way to resolve this problem is the following:
Hi @Accio,
I noticed that wmwTest() on SignedGenesets returns NaNs for valTypes "r" and "f". Not sure if that is intentional.
Created on 2021-08-12 by the reprex package (v2.0.1)