StoreyLab / qvalue

R package to estimate q-values and false discovery rate quantities.
112 stars 36 forks source link

pi0est returns error if there is no p-value in the top lambda interval #17

Closed jh-206 closed 5 years ago

jh-206 commented 5 years ago

The function breaks on lines 29-30:

pi0 <- cumsum(tabulate(findInterval(p, vec = lambda))[ind])/(length(p) * 
      (1 - lambda[ind]))

tabulate returns a vector of length equal to the largest bin number of lambda that has a value. Therefore, if no value appears in the highest interval of lambda, which by default would be over .95, then tabulate returns a shorter vector than ind. NA's are then generated by extracting elements with [ind].

Example, modifying the code example from the qvalue function documentation:

data(hedenfalk)
p <- hedenfalk$p
qvalue(p[-which(p > .95)])

Error in smooth.spline(lambda, pi0, df = smooth.df) : missing or infinite values in inputs are not allowed

ajbass commented 5 years ago

If there are not many p-values then a reliable estimate of pi0 is very difficult and you should set pi0=1 in the q-value function. If there are a lot of p-values then I can only imagine this happening if (1) pi0 is nearly 0 or (2) p-value truncation is happening. For (1), you can try changing the lambda argument. For (2), we developed qvalue_trunc for truncated p-values:

qvalue_truncp(p[-which(p > .95)])

In a future update, I will add an informative error message.

jh-206 commented 5 years ago

This doesn't have to do with how many p-values there are, as this will still throw the error:

qvalue(runif(10e5, 0, .94999))

I guess the issue is that a Uniform(0,1) distribution is assumed. I don't have a theoretical background in this area, so I'll leave it up to you as to how to deal with situations where that distributional assumption is violated

ajbass commented 5 years ago

Here's a good post on this topic from a former lab member:

http://varianceexplained.org/statistics/interpreting-pvalue-histogram/