Thie1e / cutpointr

Optimal cutpoints in R: determining and validating optimal cutpoints in binary classification
https://cran.r-project.org/package=cutpointr

Set manual color for only one line #60

Closed: jwang-lilly closed this issue 1 year ago

jwang-lilly commented 1 year ago

Dear Christian @Thie1e

I ran cutpointr with subgroups and bootstrapping, then drew one line per subgroup. Because one of the lines is my reference, I'd like to set a color for only this line. To illustrate what I am trying to achieve, I use the sample dataset as follows. As shown, I am able to manually set two colors for gender. However, what I really want is to keep the default colors for all lines except my reference line. The reason is that I will have many lines (subgroups), and I'd like to see my reference line clearly among the rest.

Any advice on how I can achieve this?

Thanks much.

```r
library(cutpointr)
library(dplyr)
library(ggplot2)

data(suicide)

opt_cut <- cutpointr(data = suicide, x = dsi, class = suicide, direction = ">=", pos_class = "yes",
                     neg_class = "no", subgroup = gender, method = maximize_metric, metric = youden,
                     boot_runs = 100) %>% add_metric(list(cohens_kappa))

plot_cutpointr(opt_cut, xvar = cutpoint, yvar = cohens_kappa, conf_lvl = 0.95, aspect_ratio = NULL) +
    scale_x_continuous(n.breaks = 20, minor_breaks = waiver()) +
    scale_y_continuous(n.breaks = 5, minor_breaks = waiver()) +
    scale_color_manual(values = c("#353436", "#02e302")) +
    scale_fill_manual(values = c("#353436", "#02e302"))
```

jwang-lilly commented 1 year ago

To follow up: say I'd like to manually set one subgroup (female) to black.
I did the following, but the resulting plot obviously doesn't make much sense. I also don't know what exactly "cohens_kappa_oob...25" is. In addition, I have to set conf_lvl = 0; otherwise, I get this error: `Error in geom_line(): ! Problem while computing aesthetics. ℹ Error occurred in the 3rd layer. Caused by error in FUN(): ! object 'ymax' not found`

```r
temp <- opt_cut %>% dplyr::filter(subgroup == 'female') %>% data.frame()
temp2 <- temp$boot %>% data.frame() %>% dplyr::rename(cohens_kappa = "cohens_kappa_oob...25")

plot_cutpointr(opt_cut, xvar = cutpoint, yvar = cohens_kappa, conf_lvl = 0, aspect_ratio = NULL) +
    scale_x_continuous(n.breaks = 20, minor_breaks = waiver()) + scale_y_continuous(n.breaks = 5, minor_breaks = waiver()) +
    geom_line(data = temp2, mapping = aes(x = optimal_cutpoint, y = cohens_kappa), color = "black")
```

Any advice is greatly appreciated.

Thie1e commented 1 year ago

Hi Jian,

I think your first approach isn't so bad. You just need a function that generates the color palette; there are many options for that in R. Here, I am using rainbow(). Then you modify the color of the reference class manually.

Can you try this? It's a bit hacky, but I think it should pick the correct subgroup for the manually set color. Can you confirm that?

```r
library(cutpointr)
library(dplyr)
library(ggplot2)

data(suicide)

# Generate some additional subgroups (fix the seed so they are reproducible)
set.seed(123)
testdat <- suicide
testdat$gender <- sample(1:5, size = nrow(testdat), replace = TRUE)

opt_cut <- cutpointr(data=testdat, x=dsi, class=suicide, direction = ">=", pos_class = "yes",
                     neg_class = "no", subgroup = gender,
                     method = maximize_metric, metric = youden,
                     boot_runs = 100) %>%
    add_metric(list(cohens_kappa))

# Find the position of the reference class and set its color
ref_class <- "2"
(which_ref <- which(sort(opt_cut$subgroup) == ref_class))
color_vec <- rainbow(nrow(opt_cut))
ref_color <- "black"
color_vec[which_ref] <- ref_color
print(color_vec)

# If you do not want to plot the confidence intervals when having many
# subgroups, set conf_lvl = 0
# Otherwise, also define the fill color to use your color palette
p <- plot_cutpointr(opt_cut, xvar = cutpoint,
                    yvar = youden,
                    conf_lvl = 0,
                    aspect_ratio = NULL) +
    geom_line(linewidth = 1) +
    scale_x_continuous(n.breaks=20, minor_breaks = waiver()) +
    scale_y_continuous(n.breaks=5, minor_breaks = waiver()) +
    scale_color_manual(values = color_vec)
print(p)
```
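Since the goal was to keep the *default* colors for every subgroup except the reference, a possible variant of the snippet above swaps rainbow() for scales::hue_pal(), which generates ggplot2's default discrete hue palette, and names the vector by subgroup. With a named `values` vector, scale_color_manual() matches colors by level name instead of by position, so the sort()/which() lookup is no longer needed. This is a minimal sketch; the `subgroups` vector is a hypothetical stand-in for `opt_cut$subgroup`.

```r
# Hypothetical stand-in for opt_cut$subgroup
subgroups <- c("3", "1", "5", "2", "4")
ref_class <- "2"

# ggplot2's default discrete colors, one per subgroup level
lvls <- sort(unique(subgroups))
color_vec <- setNames(scales::hue_pal()(length(lvls)), lvls)

# Override only the reference subgroup; the names let scale_color_manual()
# match each color to its subgroup rather than relying on position
color_vec[ref_class] <- "black"
print(color_vec)

# Then use it exactly as before:
# p + scale_color_manual(values = color_vec)
```

The other subgroups keep the colors ggplot2 would have assigned anyway, so adding or removing the manual scale only changes the reference line.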

If you need more control or want to plot the data with a manual call to ggplot(), you can extract the plotted data with `p$data`.
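A sketch of such a manual plot follows, using a made-up data frame in place of `p$data` (the column names `cutpoint`, `metric` and `subgroup` are assumptions, not the actual columns of `p$data`). The non-reference subgroups are drawn with ggplot2's default colors, and the reference subgroup is drawn last, on top, as a separate black layer. That layer sets `inherit.aes = FALSE`, so it uses only its own mapping. Presumably (not verified against plot_cutpointr's source), the `object 'ymax' not found` error in the earlier attempt came from an added layer inheriting plot-level aesthetics that its new data did not contain; `inherit.aes = FALSE` avoids that class of problem.

```r
library(ggplot2)

# Hypothetical stand-in for p$data: one row per cutpoint and subgroup
set.seed(42)
toy <- data.frame(
  cutpoint = rep(0:10, times = 4),
  metric   = runif(44),
  subgroup = rep(c("1", "2", "3", "4"), each = 11)
)

ref_class <- "2"
others <- toy[toy$subgroup != ref_class, ]
ref    <- toy[toy$subgroup == ref_class, ]

p2 <- ggplot(others, aes(x = cutpoint, y = metric, color = subgroup)) +
  # Non-reference subgroups keep ggplot2's default hue palette
  geom_line() +
  # Reference line in a fixed color; inherit.aes = FALSE means this layer
  # ignores the plot-level mapping and needs only the columns in its own aes()
  geom_line(data = ref, mapping = aes(x = cutpoint, y = metric),
            color = "black", linewidth = 1.2, inherit.aes = FALSE)
print(p2)
```

One trade-off of this layout is that the reference line has no legend entry, because it is not mapped to the color scale.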

Also, thanks for drawing my attention to the problems with add_metric (the wrong column names). I think it doesn't lead to errors in the results, but I will have to look into that. The additional metric column is added multiple times and gets unintended names. I guess a dependency changed and I will have to update that function. Thanks again.
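The suffix in a name like "cohens_kappa_oob...25" looks like the `name...position` pattern produced by the "unique" name-repair strategy in vctrs (used, for example, by dplyr::bind_cols() by default), which also prints the "New names:" messages. The sketch below reproduces that pattern with hypothetical column names; it is an assumption about where the suffix comes from, not a confirmed diagnosis of add_metric.

```r
# Hypothetical column names in which one metric column appears twice
cols <- c("optimal_cutpoint", "cohens_kappa_oob", "cohens_kappa_oob")

# "unique" repair appends "...<position>" to each duplicated name
# and emits a "New names:" message describing the renaming
repaired <- vctrs::vec_as_names(cols, repair = "unique")
print(repaired)
```

Names that are not duplicated are left unchanged, so only the repeated metric column picks up a positional suffix.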

jwang-lilly commented 1 year ago

Thanks very much, Christian. This is a perfect solution. I will go ahead and close this issue. One very minor observation regarding the "New names" messages printed during the run:

```r
opt_cut <- cutpointr(data=testdat, x=dsi, class=suicide, direction = ">=", pos_class = "yes", neg_class = "no", subgroup = gender,
                     method = maximize_metric, metric = youden,
                     boot_runs = 100) %>%
    add_metric(list(cohens_kappa))
```

`Running bootstrap... New names:New names:New names:New names:New names:`