Thie1e / cutpointr

Optimal cutpoints in R: determining and validating optimal cutpoints in binary classification
https://cran.r-project.org/package=cutpointr

95% confidence intervals instead of getting limits at 5% and 95% in summary of cutpointr #36

Closed: Thie1e closed this issue 3 years ago

Thie1e commented 3 years ago

I was asked how to get 95% confidence intervals, since the summary output only shows the 5% and 95% quantiles of the bootstrap distribution (which together span only 90% of it). For future reference, here is a solution that calculates the 2.5% and 97.5% bootstrap quantiles directly.

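library(cutpointr)
library(tidyverse)

# Estimate the cutpoint per gender and bootstrap it 1000 times
# (stratified by class) to obtain the bootstrap distributions
cp1 <- cutpointr(suicide$dsi, suicide$suicide, suicide$gender,
                 boot_runs = 1000, boot_stratify = TRUE,
                 na.rm = TRUE)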
#> Assuming the positive class is yes
#> Assuming the positive class has higher x values
#> Running bootstrap...

# 95% percentile interval of the out-of-bag accuracy (2.5% and 97.5% quantiles)
boot_ci(cp1, acc, in_bag = FALSE, alpha = 0.05) %>% 
  mutate(variable = "acc")
#> # A tibble: 4 x 4
#>   subgroup quantile values variable
#>   <chr>       <dbl>  <dbl> <chr>   
#> 1 female      0.025  0.779 acc     
#> 2 female      0.975  0.931 acc     
#> 3 male        0.025  0.630 acc     
#> 4 male        0.975  0.926 acc

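# 95% percentile intervals for all bootstrapped metrics at once
cp1 %>% 
  select(subgroup, boot) %>%
  unnest(boot) %>% 
  # drop the columns from TP_b to roc_curve_oob
  # (confusion-matrix counts and the nested ROC curves)
  dplyr::select(-(TP_b:roc_curve_oob)) %>% 
  pivot_longer(-subgroup) %>% 
  group_by(subgroup, name) %>% 
  summarise(lower_ci = quantile(value, 0.025, na.rm = TRUE),
            upper_ci = quantile(value, 0.975, na.rm = TRUE))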
#> `summarise()` regrouping output by 'subgroup' (override with `.groups` argument)
#> # A tibble: 26 x 4
#> # Groups:   subgroup [2]
#>    subgroup name             lower_ci upper_ci
#>    <chr>    <chr>               <dbl>    <dbl>
#>  1 female   acc_b               0.798    0.936
#>  2 female   acc_oob             0.779    0.931
#>  3 female   AUC_b               0.893    0.979
#>  4 female   AUC_oob             0.882    0.988
#>  5 female   cohens_kappa_b      0.324    0.625
#>  6 female   cohens_kappa_oob    0.287    0.605
#>  7 female   optimal_cutpoint    1        4    
#>  8 female   sensitivity_b       0.815    1    
#>  9 female   sensitivity_oob     0.636    1    
#> 10 female   specificity_b       0.786    0.937
#> # ... with 16 more rows

Created on 2020-12-22 by the reprex package (v0.3.0)
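Both approaches above compute percentile bootstrap intervals: the limits are the alpha/2 and 1 - alpha/2 quantiles of the bootstrap estimates, so alpha = 0.05 gives the 2.5% and 97.5% quantiles, while the 5% and 95% quantiles shown in the summary correspond to alpha = 0.10. The base R sketch below illustrates just that idea on simulated data; `percentile_ci()` is a hypothetical helper written for this example and is not part of cutpointr.

```r
# Hypothetical helper (not part of cutpointr): percentile bootstrap interval,
# i.e. the alpha/2 and 1 - alpha/2 quantiles of the bootstrap estimates
percentile_ci <- function(estimates, alpha = 0.05) {
  stats::quantile(estimates, probs = c(alpha / 2, 1 - alpha / 2),
                  na.rm = TRUE, names = FALSE)
}

# Simulated data, used only for illustration
set.seed(1)
x <- rnorm(200, mean = 10)

# 1000 bootstrap replicates of the sample mean
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))

ci95 <- percentile_ci(boot_means)               # 2.5% and 97.5% quantiles
ci90 <- percentile_ci(boot_means, alpha = 0.10) # 5% and 95% quantiles
ci95
ci90
```

The 90% interval always lies inside the 95% interval, which is why reading off the 5% and 95% columns gives narrower limits than a 95% confidence interval.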

jwang-lilly commented 2 years ago

@Thie1e, please consider supporting prediction intervals in addition to confidence intervals.

The formula and discussion are well described here: https://www.bryanshalloway.com/2021/03/18/intuition-on-uncertainty-of-predictions-introduction-to-prediction-intervals/
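As background to this request: a confidence interval describes the uncertainty of an estimated quantity (for example the mean response), whereas a prediction interval also includes the residual variability of a single new observation and is therefore wider. The sketch below shows the distinction with base R's `predict()` for a linear model on the built-in `mtcars` data. It only illustrates the general concept from the linked post and is not a cutpointr feature; for a linear model the prediction interval is fit ± t · sqrt(se_fit² + sigma²), which the code also computes by hand.

```r
# Linear model on built-in data (illustration only, unrelated to cutpointr)
fit <- lm(mpg ~ wt, data = mtcars)
new_car <- data.frame(wt = 3)

# Interval for the mean response at wt = 3
conf_int <- predict(fit, newdata = new_car, interval = "confidence", level = 0.95)
# Interval for a single new car with wt = 3 (adds the residual variance)
pred_int <- predict(fit, newdata = new_car, interval = "prediction", level = 0.95)

# Manual prediction interval: fit +/- t * sqrt(se_fit^2 + sigma^2)
p <- predict(fit, newdata = new_car, se.fit = TRUE)
t_crit <- qt(0.975, df = p$df)
half_width <- t_crit * sqrt(p$se.fit^2 + p$residual.scale^2)
manual_pred_int <- c(lwr = unname(p$fit - half_width),
                     upr = unname(p$fit + half_width))

conf_int
pred_int
manual_pred_int
```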