IndrajeetPatil / ggstatsplot

Enhancing {ggplot2} plots with statistical analysis 📊📣
https://indrajeetpatil.github.io/ggstatsplot/
GNU General Public License v3.0
2.03k stars 190 forks source link

Revisiting #169 #231

Closed ibecav closed 5 years ago

ibecav commented 5 years ago

I made it a new issue because it's a nuance rather than a complete problem. The current solution of pulling from DescTools::CramerV works but does not allow for specifying the the probabilities the user wants in the GoF test. It only does equal splits, e.g. for mtcars$am it only tests the hypothesis the split is 50/50. That's not supporting the ratio parameter in ggpiestats. So I took the code and cleaned, formatted, stripped off unnecessary components and added some basic documentation. Not sure where you'll want to name it, park it or even whether you'll want to adopt it so rather than a PR here's the code and a reprex...

#' @title Custom function to compute Cramer's V and confidence intervals for
#'   a chi squared goodness of fit test using non central parameters.
#' @name CramerVV
#' @description A customized version of CramerV from DescTools to allow
#'   for probabilities to be specified. Called from `helpers_ggpiestats_subtitle`
#'   after the appropriate `x` has been made into a `table`
#' @author Chuck Powell
#'
#' @param x a table of counts
#' @param conf.level confidence level of the interval around V
#' @param method string defining the method to calculate confidence intervals
#'   for Cramer's V. Either "ncchisq" (using noncentral chisquare), or
#'   the adjusted version "ncchisqadj" with degrees of freedom added to both
#'   the nuemrator and denominator.  Default is "ncchisq".
#' @param p (passed to `chisq.test`) a vector of probabilities of the same
#'   length of x. p is rescaled (if necessary and it is possible) to sum to 1.
#'   An error is given if any entry of p is negative. Default is equal counts
#'   per category (factor level)
#' @value a numeric vector with 3 elements for the estimate, the lower and
#'   the upper confidence interval
#'
#' @examples
#'
#' # to get reproducible results from bootstrapping
#' set.seed(123)
#' library(ggstatsplot)
#'
#' # simple function call with the defaults
#' x <- table(mtcars$am)
#' CramerVV(x)
#'
#' # raise confidence intervals specify proportions
#' CramerVV(x, conf.level = .999, p = c(.6, .4))
#'
#' # notice order matters and it follows the levels of `x`
#' # in this case am = 0 is before am = 1, so...
#' # CramerVV(x, conf.level = .9, p = c(4, 6))
#'
#'
CramerVV <- function (x,
                      conf.level = .95,
                      method = c("ncchisq",
                                 "ncchisqadj"),
                      p = rep(1/length(x), length(x))
                      ) {

  # Make sure we were input a table
  if (!class(x) %in% c("table")) {
    # turn off pairwise comparisons
    stop("Input x must be a table")
  }

  # ------------------------- sub functions ----------------------------

  lochi <- function(chival, df, conf) {
    ulim <- 1 - (1 - conf)/2
    lc <- c(0.001, chival/2, chival)
    while (pchisq(chival, df, lc[1]) < ulim) {
      if (pchisq(chival, df) < ulim)
        return(c(0, pchisq(chival, df)))
      lc <- c(lc[1]/4, lc[1], lc[3])
    }
    diff <- 1
    while (diff > 1e-05) {
      if (pchisq(chival, df, lc[2]) < ulim)
        lc <- c(lc[1], (lc[1] + lc[2])/2, lc[2])
      else lc <- c(lc[2], (lc[2] + lc[3])/2, lc[3])
      diff <- abs(pchisq(chival, df, lc[2]) - ulim)
      ucdf <- pchisq(chival, df, lc[2])
    }
    c(lc[2], ucdf)
  }

  hichi <- function(chival, df, conf) {
    uc <- c(chival, 2 * chival, 3 * chival)
    llim <- (1 - conf)/2
    while (pchisq(chival, df, uc[1]) < llim) {
      uc <- c(uc[1]/4, uc[1], uc[3])
    }
    while (pchisq(chival, df, uc[3]) > llim) {
      uc <- c(uc[1], uc[3], uc[3] + chival)
    }
    diff <- 1
    while (diff > 1e-05) {
      if (pchisq(chival, df, uc[2]) < llim)
        uc <- c(uc[1], (uc[1] + uc[2])/2, uc[2])
      else uc <- c(uc[2], (uc[2] + uc[3])/2, uc[3])
      diff <- abs(pchisq(chival, df, uc[2]) - llim)
      lcdf <- pchisq(chival, df, uc[2])
    }
    c(uc[2], lcdf)
  }

  # --------------- initial chisq and V estimates ---------------------------

  chisq.hat <- suppressWarnings(
    chisq.test(x,
               correct = FALSE,
               p = p,
               rescale.p = TRUE)$statistic)

  df <- prod(dim(x) - 1)
  n <- sum(x)
  v <- as.numeric(sqrt(chisq.hat/(n * (min(dim(x)) - 1))))

  # --------------- adjusted or unadjusted ------------------------

  switch(
    match.arg(method),
    ncchisq = {
      ci <- c(lochi(chisq.hat, df, conf.level)[1],
              hichi(chisq.hat, df, conf.level)[1])
      ci <- unname(sqrt((ci) / (n * (min(dim(x)) - 1))))
    },
    ncchisqadj = {
      ci <- c(lochi(chisq.hat, df, conf.level)[1] + df,
              hichi(chisq.hat, df, conf.level)[1] + df)
      ci <- unname(sqrt((ci) / (n * (min(dim(x)) - 1))))
    }
  )

  results <- c(
    `Cramer V` = v,
    lwr.ci = max(0, ci[1]),
    upr.ci = min(1, ci[2])
  )

  # --------------- return results ---------------------------

  return(results)
}
source("/Users/cpowell/Documents/RProjects/ggstatsplot_aux/newcramerv.R")
x <- table(mtcars$cyl)
DescTools::CramerV(x, conf.level = .95)
#>  Cramer V    lwr.ci    upr.ci 
#> 0.1900863 0.0000000 0.4065023
CramerVV(x)
#>  Cramer V    lwr.ci    upr.ci 
#> 0.1900863 0.0000000 0.4065023
# there's no way in DescTools::CramerV to specify probs
# here's a small shift from the actuals
CramerVV(x, p = c(13, 6, 12))
#>  Cramer V    lwr.ci    upr.ci 
#> 0.1083399 0.0000000 0.3072906
# here's a big shift and specify ci
CramerVV(x, conf.level = .99 , p = c(.2, .7, .1))
#>  Cramer V    lwr.ci    upr.ci 
#> 0.8869166 0.5538061 1.0000000
# I actually count't find anything on the adjustment
# it's not Yates.  I dropped Fisher completely since we
# know it's definitely non-central
CramerVV(x, 
         conf.level = .99, 
         method = "ncchisqadj" , 
         p = c(.2, .7, .1)
         )
#>  Cramer V    lwr.ci    upr.ci 
#> 0.8869166 0.5813357 1.0000000

Created on 2019-06-07 by the reprex package (v0.3.0)

IndrajeetPatil commented 5 years ago

This is great work, Chuck! I always forget that we support ratio argument in ggpiestats and ggbarstats functions.

Can you make a PR?

P.S. Maybe the argument p can be renamed to ratio for consistency?

ibecav commented 5 years ago

Sure likely Monday what do you want to call the function and what helper file?

Sent from my mobile please forgive my brevity

On Jun 8, 2019, at 09:21, Indrajeet Patil notifications@github.com wrote:

This is great work, Chuck! I always forget that we support ratio argument in ggpiestats and ggbarstats functions.

Can you make a PR?

P.S. Maybe the argument p can be renamed to ratio for consistency?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

IndrajeetPatil commented 5 years ago

Let's call this something like cramer_v_ci?

This will go in the following file: https://github.com/IndrajeetPatil/ggstatsplot/blob/master/R/helpers_ggcatstats.R

The tests can be added to a separate file test_cramer_v_ci.

ibecav commented 5 years ago

Submitting PR