Thie1e / cutpointr

Optimal cutpoints in R: determining and validating optimal cutpoints in binary classification
https://cran.r-project.org/package=cutpointr
85 stars 13 forks

How to include more than one predictors? #56

Closed jwang-lilly closed 2 years ago

jwang-lilly commented 2 years ago

Hi Christian,

Does cutpointr support more than one predictor? In this dataset, for example, could I use dsi + age + gender together as predictors (not individually)? The cutpoint would then not be an optimal dsi value but a threshold on the predicted probability of the positive class.

```r
opt_cut <- cutpointr(data = suicide, x = dsi, class = suicide,
                     pos_class = 'yes', neg_class = 'no',
                     direction = '>=', boot_runs = 100)
```

@Thie1e

Thie1e commented 2 years ago

Hi Jian,

no, you can't do that directly with cutpointr. You would have to estimate a separate model first, for example a logistic regression.

jwang-lilly commented 2 years ago

Thanks much!

michael-mazzucco commented 1 year ago

I also have this question: if I were to create a glm, how would I feed it into cutpointr? This package makes it very clear what the ideal cutoff point should be. By comparison, when I plot the ROC curve of the glm myself, I can see the optimal sens/spec but can't figure out how that translates to thresholds on the variables, i.e. dsi and age. Any direction is appreciated, thank you!

Thie1e commented 1 year ago

Hi,

what I had in mind was something like the following. You would get an optimal cutpoint for the predicted probabilities from the GLM then, not for dsi and age separately.

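```r
library(cutpointr)
library(ggplot2)

# Example GLM
mod <- glm(formula = suicide ~ dsi + age, data = suicide, family = "binomial")
summary(mod)

pred_glm <- predict(mod, type = "response") # in-sample predictions

# Store the predicted probabilities next to the original data
mydata <- suicide
mydata$pred_glm <- pred_glm

# Distribution of the predicted probabilities per class
ggplot(mydata, aes(x = suicide, y = pred_glm)) + geom_boxplot()

# Optimal cutpoint on the predicted probabilities;
# boot_cut is the number of bootstrap samples used by maximize_boot_metric
oc <- cutpointr(data = mydata,
                x = pred_glm,
                class = suicide,
                method = maximize_boot_metric,
                boot_cut = 1000,
                metric = sum_sens_spec)

oc

plot_roc(oc)
```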
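
Since the cutpoint lives on the probability scale, it may help to see how it could be applied and how it relates to dsi and age. The following is only a minimal sketch, not part of the original answer. It assumes the same model as above, uses the default `maximize_metric` method instead of bootstrapping to keep it fast, and introduces the names `cp`, `pred_class` and `dsi_threshold` purely for illustration. Because the model is a logistic regression, a predicted probability at or above `cp` is equivalent to the linear predictor being at or above `qlogis(cp)`. Assuming the dsi coefficient is positive, that gives a dsi threshold that depends on age, rather than two independent cutoffs.

```r
library(cutpointr)

# Same example model as above (the suicide data ships with cutpointr)
mod <- glm(suicide ~ dsi + age, data = suicide, family = "binomial")
mydata <- suicide
mydata$pred_glm <- predict(mod, type = "response")

# Default method (maximize_metric), chosen here only to avoid bootstrapping
oc <- cutpointr(data = mydata, x = pred_glm, class = suicide,
                pos_class = "yes", neg_class = "no", direction = ">=",
                metric = sum_sens_spec)

# Optimal cutpoint on the probability scale
cp <- oc$optimal_cutpoint

# Classify: predicted probability at or above the cutpoint -> "yes"
mydata$pred_class <- ifelse(mydata$pred_glm >= cp, "yes", "no")
table(predicted = mydata$pred_class, observed = mydata$suicide)

# Hypothetical helper: the dsi value at which the predicted probability
# reaches cp for a given age (assumes a positive dsi coefficient)
b <- coef(mod)
dsi_threshold <- function(age) {
  (qlogis(cp) - b[["(Intercept)"]] - b[["age"]] * age) / b[["dsi"]]
}
dsi_threshold(age = c(20, 40, 60))
```

The thresholds returned by `dsi_threshold()` describe the same decision rule as the single probability cutpoint, just expressed in terms of dsi for a fixed age.
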
michael-mazzucco commented 1 year ago

Ahh, I see. I was thinking more along the lines of a double cutoff, i.e. the dsi cutoff is some number and the age cutoff is some other number. Thank you for the explanation and the code, this package is amazing!