AndriSignorell / DescTools

Tools for Descriptive Statistics and Exploratory Data Analysis
http://andrisignorell.github.io/DescTools/

BinomCI() function: confidence intervals bounds are correct, the estimate is not. #106

Closed julienbio99 closed 1 year ago

julienbio99 commented 1 year ago

Hi,

I have used the BinomCI() function of the DescTools package as follows to estimate the 95% CI limits with the method of Agresti & Coull (1998) for a proportion of 20 occurrences over 253 trials:

```r
library(DescTools)
BinomCI(x = 20, n = 253, conf.level = 0.95, method = "agresti-coull")
#             est     lwr.ci    upr.ci
# [1,] 0.08534732 0.05117782 0.1195168
```

Note that the estimate (est) is 0.08534732, whereas it should be 20/253 = 0.07905138.

If I use the binom package instead with its binom.confint() function and the same method ("ac" for "agresti-coull"), I get the correct estimate with 95% CI limits identical to those obtained with the BinomCI() function:

```r
library(binom)
binom.confint(20, 253, conf.level = 0.95, method = "ac")
#          method  x   n       mean      lower     upper
# 1 agresti-coull 20 253 0.07905138 0.05117782 0.1195168
```

Is there a reason why the estimate obtained with BinomCI() differs from the actual proportion we're looking at?

Thanks

And by the way, the DescTools package is very useful, thanks for maintaining it.

Cheers,

Julien

GegznaV commented 1 year ago

@julienbio99, thank you for reporting the issue. I can reproduce it too. Did you try other methods for the CI computation: do they show the same incorrect-estimate issue? The default method gives the correct estimate.


@julienbio99, could you please format your code as code blocks in the future, e.g.,

```r
# Your code here
```

This will make the message easier to read.
More information [here](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks).
julienbio99 commented 1 year ago

@GegznaV,

I just did with nearly all the available methods, including the default one (Wilson), and they all yield the correct estimate. It seems to be an issue that only relates to method="agresti-coull".

And sorry for the codes, it's my first time on GitHub.

Regards,

Julien

GegznaV commented 1 year ago

> I just did with nearly all the available methods, including the default one (Wilson), and they all yield the correct estimate. It seems to be an issue that only relates to method="agresti-coull".

Thanks for checking.

> it's my first time on GitHub.

You did very well: reproducible examples were included. These are the most important thing. Thank you.

julienbio99 commented 1 year ago

Excellent.

I guess this minor bug will be corrected in the next version of DescTools.

Cheers,

Julien

GegznaV commented 1 year ago

We will have to wait for Andri to respond on whether this is a bug or some kind of correction. I looked into the calculations, and it was not clear to me: there may be some statistical theory behind them.

julienbio99 commented 1 year ago

Hi @GegznaV,

After a look at:

Agresti & Coull (1998)'s paper: https://users.stat.ufl.edu/~aa/articles/agresti_coull_1998.pdf

Brown et al. (2001)'s paper: https://repository.upenn.edu/cgi/viewcontent.cgi?article=1440&context=statistics_papers

and

Wikipedia regarding the method of Agresti & Coull (scroll down a bit the webpage): https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval

It seems that "p-hat" is somehow approximated by using a "centre-point adjustment" ("p-tilde") for the calculation of the CI, following the original approach of Wilson (1927). I'm not sure, however, that this approximation of "p-hat" should also be used as the central estimate afterward. Maybe it should serve only for the calculation of the CI.
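A quick base-R check (no package needed, assuming the centre-point formula from the papers above) confirms this reading: the "est" value reported by BinomCI() matches p-tilde exactly, while x/n gives the raw proportion.

```r
# Reproduce the Agresti-Coull centre ("p-tilde") for x = 20, n = 253
x <- 20; n <- 253
kappa <- qnorm(1 - 0.05/2)                  # ~1.96 for a 95% CI
p_tilde <- (x + kappa^2/2) / (n + kappa^2)  # centre-point adjustment
p_hat   <- x / n                            # raw proportion
round(c(p_tilde = p_tilde, p_hat = p_hat), 8)
# p_tilde = 0.08534732, p_hat = 0.07905138
```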

I will let you, Andri and the others have a closer look into this and then possibly adjust BinomCI() - if required.

As Brown et al. (2001) recommend the Agresti & Coull (1998)'s method over some other approaches when n > 40, I've personally decided to use this one.

Thanks for the follow-up.

Regards,

Julien

GegznaV commented 1 year ago

If the approximated estimate is used, it should at least be mentioned in the documentation that the result differs from that of the other methods.

julienbio99 commented 1 year ago

Agreed!

Cheers,

Julien

DrJerryTAO commented 1 year ago

Are there any updates on this issue? The internal calculation of BinomCI(method = "agresti-coull") is still

```r
x.tilde <- x + kappa^2/2
n.tilde <- n + kappa^2
p.tilde <- x.tilde/n.tilde
q.tilde <- 1 - p.tilde
est <- p.tilde
term2 <- kappa * sqrt(p.tilde * q.tilde)/sqrt(n.tilde)
CI.lower <- max(0, p.tilde - term2)
CI.upper <- min(1, p.tilde + term2)
```

I tend to agree with @julienbio99 that line 67 might need a change to `est <- x/n`.
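Running the snippet above stand-alone (with kappa = qnorm(0.975), i.e. a 95% CI) reproduces the reported interval and makes the issue visible: `est` comes out as p.tilde rather than x/n.

```r
# Stand-alone version of the internal Agresti-Coull calculation
x <- 20; n <- 253
kappa <- qnorm(0.975)
x.tilde <- x + kappa^2/2
n.tilde <- n + kappa^2
p.tilde <- x.tilde / n.tilde
q.tilde <- 1 - p.tilde
term2 <- kappa * sqrt(p.tilde * q.tilde) / sqrt(n.tilde)
c(est = p.tilde, lwr.ci = max(0, p.tilde - term2), upr.ci = min(1, p.tilde + term2))
# est = p.tilde = 0.08534732; the proposed change would report x/n = 0.07905138 instead
```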

AndriSignorell commented 1 year ago

I'm drowning in work right now. Could someone of you please consult the literature?

Agresti A. and Coull B.A. (1998) Approximate is better than "exact" for interval estimation of binomial proportions. American Statistician, 52, pp. 119-126. Newcombe, R. G. (1998) Two-sided confidence intervals for the single proportion: comparison of seven methods, Statistics in Medicine, 17:857-872 https://pubmed.ncbi.nlm.nih.gov/16206245/

julienbio99 commented 1 year ago

The paper by Brown et al. (2001) indicates on page 108, section 3.1.2: "Both the Agresti-Coull and the Wilson interval are centered on the same value, p-tilde". My understanding is that the transformation of "p-hat" to "p-tilde" is required for the calculation of the CI limits, but shouldn't change the (observed) value of "p" being reported. Indeed, using the Wilson method in DescTools does not change the estimate of "p".

I have also come across examples in SAS comparing different methods, and the values reported for "p" were all the same, including those from the Agresti-Coull method. Only the CIs differed, as expected.

When considering too that the binom package reports "p-hat" and thus not "p-tilde" for the Agresti-Coull method, I believe that the ratio between the observed number of occurrences and that of trials should be reported as is (i.e., untransformed).

Hope this may help.

DrJerryTAO commented 1 year ago

Thanks @AndriSignorell for the references. Agresti & Coull (1998, p. 124) call their estimator the "adjusted Wald interval" and argue that "One can use the adjusted Wald interval without regarding its midpoint p = (X + 2)/(n + 4) as the preferred point estimate of p. However, this rather strong shrinkage toward .5 might often provide a more appealing estimate than p." They also note that Wilson (1927) mentioned this shrinkage estimator as a reasonable alternative to the sample proportion X/n or the Laplace estimator (X + 1)/(n + 2): "As the distribution of chances of an observation is asymmetric, it is perhaps unfair to take the central value as the best estimate of the true probability; but this is what is actually done in practice... Those who make the usual allowance of 2σ for drawing an inference would use" (X + 2)/(n + 4). And they suggest that "statisticians refer to p = (X + 2)/(n + 4) as the Wilson point estimator of p and refer to the score confidence interval for p as the Wilson method." Here 2 is an approximation for (qnorm(0.975)^2)/2 in a 95% CI. So @julienbio99, there might be a good reason to keep this "discrepancy."

Accordingly, the BinomCI() help page documents that "the point estimate for the Agresti-Coull method is slightly larger than for other methods because of the way this interval is calculated." Would it be a good idea to also use this "shrinkage estimator" as the point estimate associated with the Wilson CI?

I don't see that Newcombe (1998) addressed any point-estimate formulae.
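The "2" in (X + 2)/(n + 4) is just kappa^2/2 rounded for the 95% case; a quick check (assuming the thread's example of x = 20, n = 253) shows how close the rule of thumb is to the exact centre:

```r
# Compare the exact Agresti-Coull centre with the (X + 2)/(n + 4) rule of thumb
x <- 20; n <- 253
kappa <- qnorm(0.975)
exact <- (x + kappa^2/2) / (n + kappa^2)  # kappa^2/2 ~ 1.92
rule  <- (x + 2) / (n + 4)                # "add two successes and two failures"
round(c(exact = exact, rule = rule), 5)
# exact = 0.08535, rule = 0.08560
```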

julienbio99 commented 1 year ago

Thanks @DrJerryTAO for this follow-up, which sheds some light on this.

As such, p.tilde will always "tend toward 0.5": if p.hat < 0.5, p.tilde will be slightly higher, and more so the farther p.hat is from 0.5. The opposite happens when p.hat > 0.5: p.tilde will be lower.

Brown et al. (2001) recommend the Agresti-Coull method for situations in which n > 40. Let's use n = 100 trials while varying X (the number of successful events) and see how p.hat is adjusted to p.tilde (p.hat -> p.tilde) with conf.level = 0.95, expressed as percentages:

```
 1% -> 2.81%
 5% -> 6.66%
49% -> 49.04%
50% -> 50.00%   ## only situation where p.hat = p.tilde
51% -> 50.96%
75% -> 74.08%
95% -> 93.33%
99% -> 97.19%
```
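This mapping can be reproduced in a few lines of base R (n = 100, conf.level = 0.95):

```r
# p.hat -> p.tilde shrinkage toward 0.5 for n = 100 trials
x <- c(1, 5, 49, 50, 51, 75, 95, 99)  # successes out of n = 100
kappa <- qnorm(0.975)
p_tilde <- (x + kappa^2/2) / (100 + kappa^2)
data.frame(p.hat = x / 100, p.tilde = round(p_tilde, 4))
```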

I'm a fisheries biologist. Say I'm interested in estimating the probability of observing developed gonads for a given female fish length. I end up with a sample of 100 female fish of that length, 3 of which have developed gonads, so p.hat = 3.00% but p.tilde = 4.74%. I have thus "analytically" inflated this probability by 58% (from 3.00% to 4.74%). Biologically, that is a (way too) large increase that probably does not reflect what I'm trying to describe with the data at hand (the probability could just as well have been lower), and I know for sure that 3 fish out of 100 were "real successes". On the other hand, the uncertainty with n = 100 is quite large, [0.65%; 8.83%] according to Agresti & Coull (1998)'s method (and other methods), but it is likely to have decent nominal coverage, so I would certainly use it.

As a possible solution, maybe DescTools could provide both p.hat and p.tilde for this method, so the user can decide which one to use. Personally, I'm going to stick with p.hat.

AndriSignorell commented 1 year ago

Thanks to all for the research! So we have two options: either stay with the current solution or change the output.
Although the majority is not always right, in this case I'd bow to convention. I have consulted the results tables in SAS and Stata, and neither reports p tilde as the point estimator. With this in mind, I suggest reporting the usual point estimator for the Agresti-Coull interval as well and appending p tilde as an attribute.

What are your opinions?
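A minimal sketch of that output shape (hypothetical, not the shipped implementation): report x/n in `est` and attach p tilde as an attribute, leaving the three-column structure untouched.

```r
# Hypothetical result shape: usual estimator + p.tilde as an attribute
x <- 20; n <- 253
kappa <- qnorm(0.975)
n.tilde <- n + kappa^2
p.tilde <- (x + kappa^2/2) / n.tilde
term2   <- kappa * sqrt(p.tilde * (1 - p.tilde) / n.tilde)
res <- c(est    = x / n,                      # usual point estimator
         lwr.ci = max(0, p.tilde - term2),    # limits still centred on p.tilde
         upr.ci = min(1, p.tilde + term2))
attr(res, "p.tilde") <- p.tilde               # retrievable via attr(res, "p.tilde")
res
```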

julienbio99 commented 1 year ago

Hi @AndriSignorell,

What you are proposing appears to me as the best option.

In my mind, we're looking for the best uncertainty estimate for p.hat, and using p.tilde for the computation of the CI limits according to Agresti & Coull (1998) seems to offer pretty good nominal coverage. So I would use p.tilde for that only. Providing the value of p.tilde for those who would like to use it for one statistical need or another is a good idea. Personally, I won't use it!

GegznaV commented 1 year ago

@AndriSignorell, I think a note in the documentation of the Agresti-Coull method could be added about what proportion estimates are returned.

Alternatively, a parameter, e.g., p_tilde (with values TRUE/FALSE, defaulting to FALSE; or a possible value "agresti-coull"), could determine which type of proportion estimate is returned.

NOTE: I would not appreciate it if different output structures appeared for different CI methods, e.g., if some methods had output columns est, lwr.ci, upr.ci and others had an additional est_tilde. This would make programming with DescTools more difficult, as such irregularities would have to be worked around each time. So either always 3 or always 4 columns.

AndriSignorell commented 1 year ago

From my point of view, this issue has been fixed in DescTools 0.99.49 with Vilmantas' solution. If you agree, please close the issue.

julienbio99 commented 1 year ago

I agree. I'm closing the issue (I didn't know I had to close it... sorry).