Closed swvanderlaan closed 6 years ago
What’s the maximum p value? This usually happens when you have p values not between 0 and 1 (for example, your p values could be between 0 and .5)
Oh. That could easily be the case. I'm doing an expression quantitative trait loci (eQTL) analysis of many genes - and for some genes the results could be highly significant, such that most p-values are approaching 0, and few if any are larger than 0.5.
Is this error similar in origin as the following?
Error in smooth.spline(lambda, pi0, df = smooth.df) : missing or infinite values in inputs are not allowed
What is you advice? Perhaps in such cases, I should not apply this particular form of FDR correction as results are highly significant?
Thanks!
It's hard to give useful advice without analyzing the data but the q-value procedure only works for well behaved p-values (see http://varianceexplained.org/statistics/interpreting-pvalue-histogram/) .... If a majority of your p-values are highly significant, maybe the best strategy is to fix pi0=1
(this will apply the BH procedure and is conservative):
q.out <- qvalue(pvalues, pi0 = 1)
Here's a demo:
pvalues <- runif(1000, min = 0.0001, max = 0.01)
q.out <- qvalue(pvalues, pi0 = 1)
Cheers
I understand. Thanks for the advice!
This issue seems to occur even if all P values are valid numbers between 0 and 1. I read on another error that it seems to occur for "truncated" P-value distributions. Is qvalue making some assumptions about P-value distribution? Can it be applied to any P values vector regardless of which test generated them (e.g. Bayesian t-test, permutations test, hypergeometric test rather than the classic Student's t-test)?
Can it be applied to any P values vector regardless of which test generated them: Yes. The link above explains in more detail about how your p-value histogram should look for qvalue to work.
Can you show a reproducible example?
Hi, so I was scrutinizing the faulty P values vector, and I may have an explanation. Most of my P values are 0. The deeper reason is I thought I was mapping proteins to KEGG pathway terms, but I was actually mapping to KEGG terms which cover just closely related proteins - like all proteoforms of a particular enzyme. Since this is a proteomics experiment, and I am dealing with protein groups (which cover essentially the same thing - related proteins) and basing my counts on all proteins in each group, not just the leading protein, for most terms the count in my observed proteome was the same or very close to that for my parent proteome: as soon as I identify peptides for a protein linked to a term, all related protein IDs are listed in the protein group, hence I capture all instances of the term in the parent database. Need to fix this code...
Hi @ajbass,
I tried what you recommended above to get the Hochberg procedure. Just to doublecheck I also used p.adjust
. I assumed that
qvalue(pvalues, pi0 = 1)$qvalue
p.adjust(pvalues, method = "hochberg")
yield the same results but they differ. Do you know why and whether one method is more reliable than the other?
To use BH procedure in p.adjust
, you need to set method = "BH"
:
library(qvalue)
pvals <- runif(1000)
qvals <- qvalue(pvals, pi0 = 1)
qvals2 <- p.adjust(pvals, method = "BH")
all.equal(qvals$qvalues, qvals2)
Hi,
I'm using your package, but I get the error below. The smallest p-value is
e-80
. How can I correct this error?Thanks!