extend prop.test() to handle multiple methods for computing confidence intervals?

rpruim commented 9 years ago

Here are the args for stats::prop.test():

function (x, n, p = NULL, alternative = c("two.sided", "less", "greater"), 
     conf.level = 0.95, correct = TRUE)

We could add method (or ci.method) with various methods. Possibilities include:

Wald
Agresti-Coull
+ 4 (Agressti-Coul with 1.96 = 2.0 -- throw error if conf.level is not close to .95)
Wilson Score (currently used by stats::prop.test())
binomial (basically pass things off to binom.test)
a Bayesian interval?

Some (all?) of these are done in the binom package, but I'm not a fan of the naming scheme (lots of functions named binom.foo where foo is a method or task. Wald and +4 are implemented in fastR with similarly unfortunate names.

These are familiar enough and simple enough that we could just reimplement.

rpruim commented 9 years ago

Thinking about this a little bit more, here are some other options to the design:

base this off of binom.test() rather than prop.test()

prop.test() is more general and handles other types of tests for which these intervals are not appropriate. binom.test() is only designed for the right use case.
instead of adding functionality to binom.test(), we could write a function (or modify confint.htest()) to extract the type of interval we want.

I think all the necessary information is stored in the "htest" object that is returned by binom.test(). The advantage of this approach is that it is easy to write and doesn't increase the distance between stats::binom.test() and mosaic::binom.test().

rpruim commented 9 years ago

I've created a function that inserts a Wald interval into the object returned from binom.test(). It should be easy to replicate for other methods. Now just need to decide on the API -- extractors or an argument to binom.test()?

> binom.test(30, 40) %>% wald()

    Exact binomial test (with Wald CI)

data:  x and n
number of successes = 30, number of trials = 40, p-value = 0.002221
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.6158104 0.8841896
sample estimates:
probability of success 
                  0.75 

> binom.test(30, 40)

    Exact binomial test

data:  x and n
number of successes = 30, number of trials = 40, p-value = 0.002221
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.5880380 0.8730852
sample estimates:
probability of success 
                  0.75

rpruim commented 9 years ago

Agresti-Coull and Plus 4 are implemented as well:

> binom.test(30, 40) %>% plus4

    Exact binomial test (with Plus 4 CI)

data:  x and n
number of successes = 30, number of trials = 40, p-value = 0.002221
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.5956792 0.8588663
sample estimates:
probability of success 
                  0.75 

> binom.test(30, 40) %>% agresti

    Exact binomial test (with Agresti-Coull CI)

data:  x and n
number of successes = 30, number of trials = 40, p-value = 0.002221
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.5963877 0.8598015
sample estimates:
probability of success 
                  0.75

rpruim commented 9 years ago

I've brought all of these into one function called update_ci() and added more functionality (1-sided tests). Next step is to decide whether to add a ci.method argument to binom.test(). I might be able to fix some sticky non-standard evaluation stuff that has always bothered me at the same time.

> HT <- binom.test( 30, 50, alt="gr", conf.level=.90)
> update_ci(HT, "score") 

    Exact binomial test (with Score CI)

data:  x and n
number of successes = 30, number of trials = 50, p-value = 0.1013
alternative hypothesis: true probability of success is greater than 0.5
90 percent confidence interval:
 0.4994722 1.0000000
sample estimates:
probability of success 
                   0.6 

> update_ci(HT, "wald") 

    Exact binomial test (with Wald CI)

data:  x and n
number of successes = 30, number of trials = 50, p-value = 0.1013
alternative hypothesis: true probability of success is greater than 0.5
90 percent confidence interval:
 0.5112115 1.0000000
sample estimates:
probability of success 
                   0.6 

> update_ci(HT, "agresti") 

    Exact binomial test (with Agresti-Coull CI)

data:  x and n
number of successes = 30, number of trials = 50, p-value = 0.1013
alternative hypothesis: true probability of success is greater than 0.5
90 percent confidence interval:
 0.5093406 1.0000000
sample estimates:
probability of success 
                   0.6 

> update_ci(HT, "plus4") 

    Exact binomial test (with Plus 4 CI)

data:  x and n
number of successes = 30, number of trials = 50, p-value = 0.1013
alternative hypothesis: true probability of success is greater than 0.5
90 percent confidence interval:
 0.5069023 1.0000000
sample estimates:
probability of success 
                   0.6

rpruim commented 9 years ago

We should test my code against some other references to make sure I didn't glum up any of the formulas. Perhaps we can compare against the tools in the binom package, or @nicholasjhorton can run a comparison in SAS.

rpruim commented 9 years ago

I've now folded this all into binom.test:

> do.call(rbind, 
      Map( function(m) interval(binom.test(~sex, data=HELPrct, ci.method=m)), 
      m=c("Score", "W", "agresti", "pl")) )
        probability of success     lower     upper level
Score                0.2362031 0.1978173 0.2780728  0.95
W                    0.2362031 0.1970892 0.2753170  0.95
agresti              0.2362031 0.1993466 0.2774960  0.95
pl                   0.2362031 0.1994390 0.2775850  0.95

ProjectMOSAIC / mosaic

extend prop.test() to handle multiple methods for computing confidence intervals? #449