Thie1e / cutpointr

Optimal cutpoints in R: determining and validating optimal cutpoints in binary classification
https://cran.r-project.org/package=cutpointr

Missing metrics if maximize/minimize_boot_metric #33

Closed jgarces02 closed 3 years ago

jgarces02 commented 3 years ago

Hi @Thie1e,

I'm here again :sweat:... I'm facing an error I've never seen before:

> cutpointr(data = u, x = x, class = class, na.rm = T,
+           metric = sum_sens_spec, method = minimize_boot_metric, 
+           boot_cut = 100, boot_stratify = T, boot_runs = 100)
Assuming the positive class is 1
Assuming the positive class has higher x values
Error in optimize_metric(data = data[b_ind, ], x = x, class = class, metric_func = metric_func,  : 
  All metric values are missing.
In addition: Warning message:
In roc(data = data, x = !!x, class = !!class, pos_class = pos_class,  :
  ROC curve contains no positives

And this is my data structure:

structure(list(x = c(0.225, 1.936, 0.0315, 0.0078, 0.4698, 19.35, 
0.0531, 1.7466176, 0, 0.02350828, 0.0714725, 0.5275296, 7.68378, 
0.05376, 0.020688, 0.08143, 1.127828, 0, 0.0313956, NA, 0.04976592, 
30.072, 6.492, 2.99, 2.52, 0.17, 0.03321, 0.3306, 0, 0.884, NA, 
29.7, 1.4, 0, 0, 0.12320784, 1.108, 0.023104, 66.448512, 4.180792, 
0.5792, 0.0444, 0, 0.0392, 0, 0, 0.2105334, 0.225, 3.355, 23.4
), class = c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 50L
), class = "data.frame")

I see that you updated the package version, maybe I'm missing something? Thanks for your help (again)!

jgarces02 commented 3 years ago

Nothing, sorry for disturbing you. The problem was the number of events...

> table(u$class)
 0  1 
47  3 

Thie1e commented 3 years ago

Hi @jgarces02,

no worries, it's not a bad question. The error is, as you've concluded correctly, caused by the low number of positives in combination with bootstrapping, because some bootstrap sets didn't contain any positives.
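As a rough check on that explanation, the chance that a plain bootstrap sample contains no positives can be worked out from the class counts posted above (47 negatives, 3 positives). The sketch below assumes that each "inner" resample draws 50 rows with replacement and that the 100 resamples requested via boot_cut are independent. It is an illustration of the arithmetic, not cutpointr's internal code.

```r
# Class counts taken from table(u$class) in the thread
n        <- 50   # total observations
n_pos    <- 3    # positives
boot_cut <- 100  # number of inner bootstrap resamples requested

# Probability that one resample of size n (with replacement) has no positives:
# every one of the n draws must land on a negative.
p_none <- ((n - n_pos) / n)^n

# Probability that at least one of the boot_cut resamples has no positives,
# assuming the resamples are independent.
p_any_fail <- 1 - (1 - p_none)^boot_cut

round(c(p_none = p_none, p_any_fail = p_any_fail), 3)
```

Under these assumptions a single resample misses all positives about 4.5% of the time, so with 100 resamples at least one such failure is close to certain.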

You've also correctly set boot_stratify = TRUE to avoid that, but that parameter only modifies the "outer" bootstrap, not the bootstrap data sets that are used for cutpoint calculation by maximize_boot_metric. Honestly, the problem here is that we did not also implement a stratification method for the "inner" bootstrap of maximize_boot_metric or minimize_boot_metric.

If the "inner" bootstrap were also stratified, this task should run without any problems. It's of course an edge case, with that high class imbalance in combination with the low sample size, but I think we should be able to handle that. I'm going to add this to my to-do list.
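To illustrate what stratifying the resampling would change, here is a minimal base R sketch. The helper stratified_boot_indices is hypothetical and not part of cutpointr; it only shows the general idea of resampling within each class so that the class counts are preserved. How the package would actually implement this is not stated in the thread.

```r
# Hypothetical helper (not a cutpointr function): draw bootstrap indices
# separately within each class, keeping every class at its original size.
stratified_boot_indices <- function(class) {
  idx_by_class <- split(seq_along(class), class)
  unlist(lapply(idx_by_class, function(idx) {
    # Index via sample.int() so a class with a single row is handled correctly
    idx[sample.int(length(idx), size = length(idx), replace = TRUE)]
  }), use.names = FALSE)
}

set.seed(1)
class <- c(rep(0, 47), rep(1, 3))  # same class counts as in the thread

# Number of positives per resample: plain vs. stratified bootstrap
plain_pos <- replicate(1000, {
  sum(class[sample.int(length(class), replace = TRUE)] == 1)
})
strat_pos <- replicate(1000, {
  sum(class[stratified_boot_indices(class)] == 1)
})

range(plain_pos)  # plain resamples can drop to zero positives
range(strat_pos)  # stratified resamples always keep all 3 positives
```

With the plain bootstrap some resamples end up without any positives, which is the situation that triggers "All metric values are missing". With stratified resampling every resample keeps exactly three positives, so the metric can always be computed.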

Best, Christian