Thie1e / cutpointr

Optimal cutpoints in R: determining and validating optimal cutpoints in binary classification
https://cran.r-project.org/package=cutpointr

Inconsistent calculations using manual method #23

Closed hayleynbarnes closed 4 years ago

hayleynbarnes commented 4 years ago

Hi,

I am using the manual method to set specific cutpoints. Whilst the sensitivity and specificity change with each manual cutpoint, the AUC does not. Why isn't the AUC changing with each cutpoint? (Unless I have done something wrong?!)

Thank you


> 
> > cp <- cutpointr(data, test, status, method = maximize_metric, metric = sum_sens_spec)
> Assuming the positive class is 1
> Assuming the positive class has higher x values
> > cp
>  A tibble: 1 x 16
>   direction optimal_cutpoint method          sum_sens_spec      acc sensitivity specificity      AUC pos_class neg_class prevalence
>   <chr>                <dbl> <chr>                   <dbl>    <dbl>       <dbl>       <dbl>    <dbl>     <dbl>     <dbl>      <dbl>
> 1 >=                    21.3 maximize_metric       1.33777 0.670951    0.664894    0.672881 0.701632         1         0   0.241645
>   outcome predictor           data roc_curve           boot 
>   <chr>   <chr>     <list<df[,2]>> <list>              <lgl>
> 1 status  test          [778 × 2] <tibble [157 × 10]> NA   
> > opt_cut_manual <- cutpointr(data, test, status, method = oc_manual, cutpoint = 21, boot_runs = 30)
> Assuming the positive class is 1
> Assuming the positive class has higher x values
> Running bootstrap...
> >                             
> > cp
>  A tibble: 1 x 16
>   direction optimal_cutpoint method          sum_sens_spec      acc sensitivity specificity      AUC pos_class neg_class prevalence
>   <chr>                <dbl> <chr>                   <dbl>    <dbl>       <dbl>       <dbl>    <dbl>     <dbl>     <dbl>      <dbl>
> 1 >=                    21.3 maximize_metric       1.33777 0.670951    0.664894    0.672881 0.701632         1         0   0.241645
>   outcome predictor           data roc_curve           boot 
>   <chr>   <chr>     <list<df[,2]>> <list>              <lgl>
> 1 status  test          [778 × 2] <tibble [157 × 10]> NA   
> > opt_cut_manual
>  A tibble: 1 x 16
>   direction optimal_cutpoint method    sum_sens_spec      acc sensitivity specificity      AUC pos_class neg_class prevalence outcome
>   <chr>                <dbl> <chr>             <dbl>    <dbl>       <dbl>       <dbl>    <dbl>     <dbl>     <dbl>      <dbl> <chr>  
> 1 >=                      21 oc_manual       1.33655 0.664524    0.675532    0.661017 0.701632         1         0   0.241645 status 
>   predictor           data      roc_curve boot              
>   <chr>     <list<df[,2]>> <list<df[,9]>> <list>            
> 1 test          [778 × 2]      [157 × 9] <tibble [30 × 23]>
> > opt_cut_manual <- cutpointr(data, test, status, method = oc_manual, cutpoint = 20, boot_runs = 30)
> Assuming the positive class is 1
> Assuming the positive class has higher x values
> Running bootstrap...
> > opt_cut_manual
>  A tibble: 1 x 16
>   direction optimal_cutpoint method    sum_sens_spec      acc sensitivity specificity      AUC pos_class neg_class prevalence outcome
>   <chr>                <dbl> <chr>             <dbl>    <dbl>       <dbl>       <dbl>    <dbl>     <dbl>     <dbl>      <dbl> <chr>  
> 1 >=                      20 oc_manual       1.32322 0.651671    0.680851    0.642373 0.701632         1         0   0.241645 status 
>   predictor           data      roc_curve boot              
>   <chr>     <list<df[,2]>> <list<df[,9]>> <list>            
> 1 test          [778 × 2]      [157 × 9] <tibble [30 × 23]>
> > opt_cut_manual <- cutpointr(data, test, status, method = oc_manual, cutpoint = 30, boot_runs = 30)
> Assuming the positive class is 1
> Assuming the positive class has higher x values
> Running bootstrap...
> > opt_cut_manual
>  A tibble: 1 x 16
>   direction optimal_cutpoint method    sum_sens_spec      acc sensitivity specificity      AUC pos_class neg_class prevalence outcome
>   <chr>                <dbl> <chr>             <dbl>    <dbl>       <dbl>       <dbl>    <dbl>     <dbl>     <dbl>      <dbl> <chr>  
> 1 >=                      30 oc_manual       1.32584 0.722365    0.547872    0.777966 0.701632         1         0   0.241645 status 
>   predictor           data      roc_curve boot              
>   <chr>     <list<df[,2]>> <list<df[,9]>> <list>            
> 1 test          [778 × 2]      [157 × 9] <tibble [30 × 23]>
> > opt_cut_manual <- cutpointr(data, test, status, method = oc_manual, cutpoint = 40, boot_runs = 30)
> Assuming the positive class is 1
> Assuming the positive class has higher x values
> Running bootstrap...
> > opt_cut_manual
>  A tibble: 1 x 16
>   direction optimal_cutpoint method    sum_sens_spec      acc sensitivity specificity      AUC pos_class neg_class prevalence outcome
>   <chr>                <dbl> <chr>             <dbl>    <dbl>       <dbl>       <dbl>    <dbl>     <dbl>     <dbl>      <dbl> <chr>  
> 1 >=                      40 oc_manual       1.29017 0.755784    0.430851    0.859322 0.701632         1         0   0.241645 status 
>   predictor           data      roc_curve boot              
>   <chr>     <list<df[,2]>> <list<df[,9]>> <list>            
> 1 test          [778 × 2]      [157 × 9] <tibble [30 × 23]>
> > opt_cut_manual <- cutpointr(data, test, status, method = oc_manual, cutpoint = 40)
> Assuming the positive class is 1
> Assuming the positive class has higher x values
> > opt_cut_manual
>  A tibble: 1 x 16
>   direction optimal_cutpoint method    sum_sens_spec      acc sensitivity specificity      AUC pos_class neg_class prevalence outcome
>   <chr>                <dbl> <chr>             <dbl>    <dbl>       <dbl>       <dbl>    <dbl>     <dbl>     <dbl>      <dbl> <chr>  
> 1 >=                      40 oc_manual       1.29017 0.755784    0.430851    0.859322 0.701632         1         0   0.241645 status 
>   predictor           data      roc_curve boot 
>   <chr>     <list<df[,2]>> <list<df[,9]>> <lgl>
> 1 test          [778 × 2]      [157 × 9] NA   
gerhi commented 4 years ago

Hi Hayley,

The AUC does not change with the cutpoint because this metric is independent of the specific cutpoint that is being chosen. See e.g. here for a short description.

https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
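As a minimal sketch of the idea (using the `suicide` example data that ships with cutpointr and arbitrary cutpoint values, not your data): the AUC summarizes the whole ROC curve, so it stays fixed, while sensitivity and specificity are evaluated at the chosen cutpoint and therefore move.

```r
library(cutpointr)

# Two manual cutpoints on the same data (built-in 'suicide' example data)
cp_a <- cutpointr(suicide, dsi, suicide, method = oc_manual, cutpoint = 2)
cp_b <- cutpointr(suicide, dsi, suicide, method = oc_manual, cutpoint = 4)

# The ROC curve -- and therefore the AUC -- is computed from all possible
# cutpoints, so it is identical for both objects:
cp_a$AUC == cp_b$AUC

# Sensitivity and specificity are evaluated at the chosen cutpoint and differ:
c(cp_a$sensitivity, cp_b$sensitivity)
c(cp_a$specificity, cp_b$specificity)
```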

Best Gerrit

hayleynbarnes commented 4 years ago

Dear Gerrit,

Thank you so much for your quick reply.

Now that you point it out, that does make sense, and I can see why the AUC wouldn't change.

Apologies for not getting it earlier!

Hayley

Thie1e commented 4 years ago

Hi, that's correct. You can additionally do plot_roc(cp). The ROC curve will stay the same (and thus also the area under that curve); only the cutpoint marked on the curve will change depending on the method.
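For example (a sketch using the built-in `suicide` example data rather than the data from this issue; the cutpoint value is arbitrary):

```r
library(cutpointr)

cp_opt    <- cutpointr(suicide, dsi, suicide, method = maximize_metric,
                       metric = sum_sens_spec)
cp_manual <- cutpointr(suicide, dsi, suicide, method = oc_manual, cutpoint = 4)

# Both plots show the same ROC curve (and hence the same AUC);
# only the highlighted cutpoint on the curve differs.
plot_roc(cp_opt)
plot_roc(cp_manual)
```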