Thie1e / cutpointr

Optimal cutpoints in R: determining and validating optimal cutpoints in binary classification
https://cran.r-project.org/package=cutpointr

AUC confidence interval #10

Closed · andresimi closed this issue 5 years ago

andresimi commented 5 years ago

Hi, is there a way of calculating the confidence interval of AUC with cutpointr? Thank you

Thie1e commented 5 years ago

Hi, yes, you can. We tend to favor nonparametric methods, so you can calculate confidence intervals for the AUC (and all other metrics) using the bootstrap routine: the summary function reports basic intervals (the bootstrap quantiles in the 5% and 95% columns), and you can also work with the $boot column manually. Since the AUC is a sample statistic that does not depend on the cutpoint estimation, use the in-bag values in the AUC_b column of the bootstrap results.

library(cutpointr)
library(tidyverse)
# Optimal cutpoints per gender subgroup, with 1000 bootstrap repetitions
oc <- cutpointr(data = suicide, x = dsi, class = suicide, subgroup = gender,
                boot_runs = 1000, break_ties = mean)
#> Assuming the positive class is yes
#> Assuming the positive class has higher x values
#> Running bootstrap...

## Example 1: bootstrap quantiles for the AUC and all other metrics via summary()
summary(oc)
#> Method: maximize_metric 
#> Predictor: dsi 
#> Outcome: suicide 
#> Direction: >= 
#> Subgroups: female, male 
#> Nr. of bootstraps: 1000 
#> 
#> Subgroup: female 
#> --------------------------------------------------------------------------- 
#>  optimal_cutpoint sum_sens_spec    acc sensitivity specificity    AUC
#>                 2        1.8081 0.8852      0.9259      0.8822 0.9446
#>  n_pos n_neg
#>     27   365
#> 
#> Cutpoint 2:
#>           observation
#> prediction yes  no
#>        yes  25  43
#>        no    2 322
#> 
#> 
#> Predictor summary: 
#>  Min. 5% 1st Qu. Median   Mean 3rd Qu. 95% Max.     SD
#>     0  0       0      0 0.8393       1   5   10 1.7452
#> 
#> Predictor summary per class: 
#>     Min.  5% 1st Qu. Median   Mean 3rd Qu. 95% Max     SD
#> no     0 0.0       0      0 0.5479       0   4  10 1.3181
#> yes    0 1.3       4      5 4.7778       6   7   9 2.0444
#> 
#> Bootstrap summary: 
#>           Variable   Min.     5% 1st Qu. Median   Mean 3rd Qu.    95%   Max.     SD
#>   optimal_cutpoint 1.0000 1.0000  2.0000 2.0000 2.1540  2.0000 4.0000 4.0000 0.6803
#>              AUC_b 0.8446 0.9023  0.9315 0.9497 0.9458  0.9643 0.9777 0.9882 0.0235
#>            AUC_oob 0.7935 0.8848  0.9163 0.9514 0.9420  0.9695 0.9845 0.9991 0.0335
#>    sum_sens_spec_b 1.6244 1.7210  1.7838 1.8194 1.8157  1.8516 1.8904 1.9412 0.0510
#>  sum_sens_spec_oob 1.3064 1.6051  1.7231 1.7796 1.7739  1.8540 1.9041 1.9470 0.0948
#>              acc_b 0.7628 0.8162  0.8750 0.8878 0.8851  0.9005 0.9260 0.9541 0.0290
#>            acc_oob 0.7338 0.8200  0.8676 0.8844 0.8807  0.9014 0.9231 0.9496 0.0322
#>      sensitivity_b 0.7692 0.8499  0.9062 0.9375 0.9342  0.9655 1.0000 1.0000 0.0472
#>    sensitivity_oob 0.4286 0.6921  0.8462 0.9000 0.8941  1.0000 1.0000 1.0000 0.1015
#>      specificity_b 0.7513 0.8059  0.8707 0.8840 0.8815  0.8986 0.9244 0.9533 0.0315
#>    specificity_oob 0.7154 0.8045  0.8632 0.8828 0.8798  0.9025 0.9286 0.9580 0.0367
#>            kappa_b 0.2101 0.3435  0.4353 0.4847 0.4812  0.5352 0.5971 0.7261 0.0790
#>          kappa_oob 0.1035 0.2860  0.3945 0.4589 0.4533  0.5211 0.6037 0.7112 0.0948
#> 
#> Subgroup: male 
#> --------------------------------------------------------------------------- 
#>  optimal_cutpoint sum_sens_spec    acc sensitivity specificity    AUC
#>                 3        1.6251 0.8429      0.7778      0.8473 0.8617
#>  n_pos n_neg
#>      9   131
#> 
#> Cutpoint 3:
#>           observation
#> prediction yes  no
#>        yes   7  20
#>        no    2 111
#> 
#> 
#> Predictor summary: 
#>  Min. 5% 1st Qu. Median Mean 3rd Qu. 95% Max.     SD
#>     0  0       0      0 1.15       1   6   11 2.1151
#> 
#> Predictor summary per class: 
#>     Min.  5% 1st Qu. Median   Mean 3rd Qu.  95% Max     SD
#> no     0 0.0       0      0 0.8702       1  5.0   6 1.6286
#> yes    0 0.4       3      4 5.2222       8 10.6  11 3.8333
#> 
#> Bootstrap summary: 
#>           Variable    Min.      5% 1st Qu. Median   Mean 3rd Qu.    95%    Max.     SD
#>   optimal_cutpoint  1.0000  1.0000  2.0000 3.0000 2.9785  4.0000 6.0000 11.0000 1.5067
#>              AUC_b  0.5381  0.7287  0.8176 0.8673 0.8614  0.9163 0.9630  1.0000 0.0727
#>            AUC_oob  0.3333  0.6749  0.7999 0.8894 0.8663  0.9440 0.9965  1.0000 0.1019
#>    sum_sens_spec_b  1.1504  1.4392  1.5877 1.6693 1.6649  1.7462 1.8722  2.0000 0.1287
#>  sum_sens_spec_oob  0.7925  0.9771  1.3333 1.5064 1.4879  1.6667 1.8680  1.9412 0.2552
#>              acc_b  0.6000  0.6714  0.7786 0.8500 0.8266  0.8786 0.9571  1.0000 0.0797
#>            acc_oob  0.5556  0.6481  0.7678 0.8302 0.8101  0.8679 0.9153  0.9615 0.0820
#>      sensitivity_b  0.2857  0.5833  0.7500 0.8571 0.8389  1.0000 1.0000  1.0000 0.1355
#>    sensitivity_oob  0.0000  0.0000  0.5000 0.6667 0.6662  1.0000 1.0000  1.0000 0.2915
#>      specificity_b  0.5769  0.6616  0.7710 0.8496 0.8260  0.8798 0.9695  1.0000 0.0878
#>    specificity_oob  0.5556  0.6327  0.7660 0.8438 0.8217  0.8873 0.9608  1.0000 0.0986
#>            kappa_b  0.0594  0.1354  0.2337 0.3272 0.3365  0.4143 0.5895  1.0000 0.1399
#>          kappa_oob -0.0833 -0.0325  0.1521 0.2354 0.2409  0.3285 0.4632  0.7273 0.1369

## Example 2: quantiles of the in-bag AUC for the first subgroup (female), via the $boot column
quantile(oc$boot[[1]]$AUC_b, probs = c(0.01, 0.05, 0.95, 0.99))
#>        1%        5%       95%       99% 
#> 0.8829067 0.9022733 0.9776727 0.9836081

## Example 3: summary() of the in-bag AUC for each subgroup
map2(oc$subgroup, oc$boot, function(g, b) {
    # Summarize the bootstrapped in-bag AUC and label the result with the subgroup name
    l <- list(summary(b$AUC_b))
    names(l) <- g
    return(l)
})
#> [[1]]
#> [[1]]$female
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.8446  0.9315  0.9497  0.9458  0.9643  0.9882 
#> 
#> 
#> [[2]]
#> [[2]]$male
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.5381  0.8176  0.8673  0.8614  0.9163  1.0000

Created on 2018-11-28 by the reprex package (v0.2.1)
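
For a conventional two-sided 95% interval, the same approach works with the 2.5% and 97.5% quantiles of AUC_b. A minimal sketch along the lines of Examples 2 and 3, reusing the oc object from above (it assumes cutpointr and the tidyverse are already loaded, and the exact numbers will of course vary with the bootstrap draws):

## 95% percentile intervals of the in-bag AUC, one per subgroup
map2(oc$subgroup, oc$boot, function(g, b) {
    # quantile() on the bootstrapped in-bag AUC values gives the percentile bounds
    l <- list(quantile(b$AUC_b, probs = c(0.025, 0.975), na.rm = TRUE))
    names(l) <- g
    return(l)
})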

andresimi commented 5 years ago

Thank you again!

Thie1e commented 5 years ago

You're welcome, thanks for reporting issues.