Thie1e / cutpointr

Optimal cutpoints in R: determining and validating optimal cutpoints in binary classification
https://cran.r-project.org/package=cutpointr
86 stars 13 forks source link

plot_sensitivity_specificity not ploting #8

Closed andresimi closed 6 years ago

andresimi commented 6 years ago

Something is wrong with the plot_sensitivity_specificity command. When I do this I get an empty ggplot estructure with a vertical line on the number 2.

Is this correct?

library(cutpointr)
data(suicide)
opt_cut <- cutpointr(suicide, dsi, suicide)
plot_sensitivity_specificity(opt_cut)

Thanx

andresimi commented 6 years ago

I discovered what is missing: is the argument boot_runs

opt_cut <- cutpointr(suicide, dsi, suicide,boot_runs = 50)
plot_sensitivity_specificity(opt_cut)

As I extracted this example from the documentation, maybe it needs to be corrected. Thanx!

Thie1e commented 6 years ago

Hi, thanks for the report. I'll look into it tomorrow. It should actually run with and without the boot_runs argument.

Thie1e commented 6 years ago

This was a bug in plot_sensitivity_specificity. I pushed a new version of cutpointr (0.7.4) to Github that should fix the problem. You can install it using devtools::install_github("thie1e/cutpointr"). (By the way, it also includes the add_metric function.) Can you give the new version a try?

andresimi commented 6 years ago

Sure, I tested and it worked beautifully. I have another suggestion for plot_sensitivity_specificity. It would be nice if we could build an output with diverses outcomes in one graph only. Maybe using facet_wrap from ggplot2.

Thanks for the wonderfull package!

Thie1e commented 6 years ago

Good to hear that it works now and thanks again for the report. The fix will make it to CRAN soon.

Regarding the graphing suggestion: You mean plotting different subgroups on different facets instead of with different shapes / colors? So basically like this:

library(tidyverse)
library(cutpointr)
cp <- cutpointr(suicide, dsi, suicide, gender)
cp %>% 
    select(subgroup, roc_curve) %>% 
    unnest %>% 
    ggplot(aes(x = fpr, y = tpr)) +
    geom_step() +
    facet_wrap(~ subgroup) + 
    ggtitle("ROC curve")
andresimi commented 6 years ago

Hummm, actually, if it is possible, I was trying to find a way of doing the graphic below in facet_wrap style. I think this way it is too polluted.

# librarys
library(tidyverse); library(cutpointr)

# random data
data <- data_frame(test = runif(1000, min=0, max=10),
                   out1 = c(rep(1,400),rep(0,600)),
                   out2 = c(rep(1,200),rep(0,800)),
                   out3 = c(rep(1,100),rep(0,900)),
                   out4 = c(rep(1,500),rep(0,500))) %>% 
  sample_n(300) %>% 
  mutate_at(vars(starts_with("out")),factor)
data
#> # A tibble: 300 x 5
#>     test out1  out2  out3  out4 
#>    <dbl> <fct> <fct> <fct> <fct>
#>  1  1.02 0     0     0     0    
#>  2  3.64 0     0     0     0    
#>  3  5.27 0     0     0     0    
#>  4  4.62 0     0     0     0    
#>  5  4.16 0     0     0     0    
#>  6  2.88 1     1     0     1    
#>  7  4.40 1     1     1     1    
#>  8  6.46 0     0     0     0    
#>  9  3.82 1     1     0     1    
#> 10  4.04 0     0     0     0    
#> # ... with 290 more rows

# loop and plots
out <- c("out1","out2","out3","out4")
plot <- out %>%
  map(~cutpointr(data = data, x = test, class = pull(data, .), pos_class = 1, neg_class = 0, method = oc_youden_normal,  metric = ppv) %>%
        plot_sensitivity_specificity +
        geom_vline(xintercept = c(1,5, 7.5), col="dodgerblue2") +
        xlab(paste("cutpoint for",.))) %>% 
  set_names(out)
#> Assuming the positive class has lower x values
#> Assuming the positive class has lower x values
#> Assuming the positive class has lower x values
#> Assuming the positive class has higher x values
#> Warning in method(data = dat$data[[1]], x = predictor, class = outcome, :
#> Cutpoint -3.40934500415604 was restricted to range of independent variable

cowplot::plot_grid(plot$out1, plot$out2, plot$out3, plot$out4)

Created on 2018-08-25 by the reprex package (v0.2.0).

Thie1e commented 6 years ago

You can do it manually as below. The problem is IMO that data is not tidy which makes it harder (gather it first and treat the out columns as subgroups).

Actually, this might also be a good addition to the package. We could just offer the option to make facets instead of different shapes for the subgroups - but this would take some work. I'll open an issue.

library(cutpointr); library(tidyverse)
data <- data_frame(test = runif(1000, min=0, max=10),
                   out1 = c(rep(1,400),rep(0,600)),
                   out2 = c(rep(1,200),rep(0,800)),
                   out3 = c(rep(1,100),rep(0,900)),
                   out4 = c(rep(1,500),rep(0,500))) %>% 
    sample_n(300) %>% 
    mutate_at(vars(starts_with("out")),factor)

# 
# This would be the "cutpointr way" of doing it. Gather the data first.
# 
cp <- data %>% 
    gather(key, value, -test) %>% 
    cutpointr(test, value, key) 
#> Assuming the positive class is 1
#> Assuming the positive class has higher x values
plot_sensitivity_specificity(cp)


# Facet wrap:
res_unnested <- cp %>% 
    select(roc_curve, subgroup, optimal_cutpoint) %>% 
    unnest(roc_curve) %>% 
    mutate(Sensitivity = sensitivity(tp = tp, fn = fn),
           Specificity = specificity(fp = fp, tn = tn)) %>% 
    gather(metric, value, Sensitivity, Specificity) %>% 
    filter(is.finite(x.sorted))  
#> Warning: attributes are not identical across measure variables;
#> they will be dropped
ggplot(res_unnested, aes(x = x.sorted, y = value, color = metric)) + 
    geom_line() +
    geom_vline(aes(xintercept = optimal_cutpoint)) +
    facet_wrap(~subgroup)

Created on 2018-09-03 by the reprex package (v0.2.0).

andresimi commented 6 years ago

Perfect!!