Thie1e / cutpointr

Optimal cutpoints in R: determining and validating optimal cutpoints in binary classification
https://cran.r-project.org/package=cutpointr

Multiple cutoffs in a loop #28

jgarces02 closed this issue 4 years ago

jgarces02 commented 4 years ago

Hi @Thie1e,

I hope you're doing well. I'm trying to run cutpointr inside a loop and, because of rlang problems, I'm completely stuck...

Try 1: this seems to fail because i is treated as a string

for(i in c("var1", "var2", "var3")) {
  cutpointr(data = mydata, x = i, class = myclass, pos_class = 1, metric = sum_sens_spec)
}

Error in if (stats::median(x[class != pos_class]) < stats::median(x[class ==  : 
  missing value where TRUE/FALSE needed

Try 2: I force i to be evaluated as the original variable...

for(i in c("var1", "var2", "var3")) {
  cutpointr(data = mydata, x = eval(parse(text = i)), class = myclass, pos_class = 1, metric = sum_sens_spec)
}

Error: Can't convert a call to a string
Run `rlang::last_error()` to see where the error occurred.
<error/rlang_error>
Can't convert a call to a string
Backtrace:
 1. cutpointr::cutpointr(...)
 2. cutpointr:::cutpointr.default(...)
 3. rlang::as_name(rlang::enquo(x))
 4. rlang::as_string(x)
 5. rlang:::abort_coercion(x, "a string")
Run `rlang::last_trace()` to see the full context.

How can I solve, or at least bypass, this problem and run it inside a loop, please? Thanks in advance.

Thie1e commented 4 years ago

Hi,

to make the for loop work, use !! so that i is unquoted: cutpointr then receives the character string stored in i (e.g. "dsi") rather than the symbol i itself:

for(i in c("dsi", "age")) {
    print(cutpointr(data = suicide, x = !!i, class = suicide))
}
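
If the results should be kept rather than only printed, the same !! pattern can be used with lapply. The following is a minimal sketch, assuming the suicide example data shipped with cutpointr; the names predictors, results and cutpoints are only illustrative, and pos_class and metric are copied from the multi_cutpointr call in this thread:

```r
library(cutpointr)

# Predictor columns to loop over (column names of the suicide example data)
predictors <- c("dsi", "age")

# Run cutpointr once per predictor; !!p injects the string held in p,
# so cutpointr sees "dsi" / "age" instead of the symbol p
results <- lapply(predictors, function(p) {
  cutpointr(data = suicide, x = !!p, class = suicide,
            pos_class = "yes", metric = sum_sens_spec)
})
names(results) <- predictors

# Pull the optimal cutpoint out of each result as a named numeric vector
cutpoints <- vapply(results, function(r) r$optimal_cutpoint, numeric(1))
print(cutpoints)
```

Each element of results is a regular cutpointr object, so summary() or plot() can be called on it afterwards, e.g. summary(results[["dsi"]]).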

In this application, maybe multi_cutpointr could be useful. It returns a tibble where the rows are the cutpoints per predictor / subgroup:

library(cutpointr)
multi_cutpointr(suicide, x = c("dsi", "age"), class = suicide, metric = sum_sens_spec, pos_class = "yes")
#> dsi:
#> Assuming the positive class has higher x values
#> age:
#> Assuming the positive class has lower x values
#> # A tibble: 2 x 16
#>   direction optimal_cutpoint method          sum_sens_spec      acc sensitivity
#>   <chr>                <dbl> <chr>                   <dbl>    <dbl>       <dbl>
#> 1 >=                       2 maximize_metric       1.75179 0.864662    0.888889
#> 2 <=                      55 maximize_metric       1.11537 0.199248    0.972222
#>   specificity      AUC pos_class neg_class prevalence outcome predictor
#>         <dbl>    <dbl> <chr>     <fct>          <dbl> <chr>   <chr>    
#> 1    0.862903 0.923779 yes       no         0.0676692 suicide dsi      
#> 2    0.143145 0.525678 yes       no         0.0676692 suicide age      
#>   data               roc_curve          boot 
#>   <list>             <list>             <lgl>
#> 1 <tibble [532 x 2]> <tibble [13 x 10]> NA   
#> 2 <tibble [532 x 2]> <tibble [61 x 10]> NA

Created on 2020-07-10 by the reprex package (v0.3.0)

jgarces02 commented 4 years ago

Perfect! I didn't know there was such an easy way! Thanks a lot!