Thie1e / cutpointr

Optimal cutpoints in R: determining and validating optimal cutpoints in binary classification
https://cran.r-project.org/package=cutpointr

Summary and plot don't work well with multi_cutpointr #17

Status: Closed (xrobin closed this issue 5 years ago)

xrobin commented 5 years ago

Running summary on the output of multi_cutpointr throws an error:

multi_cut <- multi_cutpointr(suicide, c("age", "dsi"), "suicide", subgroup="gender")
summary(multi_cut)
[...]
---------------------------------------------------------------------------------------------- 
Error in Math.data.frame(list(optimal_cutpoint = 56, method = "maximize_metric",  : 
  non-numeric variable(s) in data frame: method

The plot function also behaves oddly: it seems to mix both x variables in the same plot.

plot(multi_cut)

[Screenshot: multi_cutpointr plot output]

Similar issues happen with bootstrap enabled (boot_runs > 0).

Thie1e commented 5 years ago

Hi, thanks for the report. Actually, multi_cutpointr was only meant as a shortcut for mapping cutpointr over several predictor columns. It didn't have a plot or summary method.
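The "map over predictor columns" idea can be sketched in plain R. This is a minimal illustration, not cutpointr's implementation: `best_cutpoint()` and `multi_best_cutpoint()` are hypothetical helpers, only the `>=` direction is handled, and the cutpoint is chosen by maximizing sensitivity + specificity (the analogue of cutpointr's default `maximize_metric` with `sum_sens_spec`). The toy data frame is made up.

```r
# Hypothetical helper: find the cutpoint c that maximizes
# sensitivity + specificity for the rule "positive if x >= c".
best_cutpoint <- function(x, class, pos_class) {
  is_pos <- class == pos_class
  candidates <- sort(unique(x))
  scores <- vapply(candidates, function(cp) {
    pred_pos <- x >= cp
    sens <- sum(pred_pos & is_pos) / sum(is_pos)
    spec <- sum(!pred_pos & !is_pos) / sum(!is_pos)
    sens + spec
  }, numeric(1))
  best <- which.max(scores)  # first maximum if there are ties
  data.frame(optimal_cutpoint = candidates[best],
             sum_sens_spec = scores[best])
}

# Hypothetical "multi" wrapper: run the single-predictor search on each
# predictor column and stack the results, one row per predictor.
multi_best_cutpoint <- function(data, predictors, class, pos_class) {
  rows <- lapply(predictors, function(p) {
    res <- best_cutpoint(data[[p]], data[[class]], pos_class)
    data.frame(predictor = p, res, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Made-up toy data, for illustration only
toy <- data.frame(
  score_a = c(1, 2, 3, 4, 5, 6),
  score_b = c(3, 1, 4, 1, 5, 9),
  outcome = c("no", "no", "no", "yes", "yes", "yes")
)
res <- multi_best_cutpoint(toy, c("score_a", "score_b"), "outcome", "yes")
print(res)
```

cutpointr itself stores much more per row (ROC curve, bootstrap results, subgroups), which is why plotting and summarizing the stacked result is harder than it looks; the sketch only shows the mapping pattern.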

Regarding plotting, I'm not sure there's a reasonable way to plot a large number of cutpointr results, so for now we'll throw an error if the user tries to plot a multi_cutpointr object.

A summary method would be a good idea, though. I've added one to the current version on GitHub. Could you install it via devtools::install_github("thie1e/cutpointr") and try summary on multi_cutpointr again?

The only problem is that the summary output becomes very long with large multi_cutpointr objects.

xrobin commented 5 years ago

Hi, thanks for the quick reply!

The summary method works now. I agree the output is very long and not necessarily very useful.

For the plots, you might map the predictor columns to a different aesthetic, such as linetype, in the ROC curve and metric plots. I'm not sure about the histograms; maybe more vertical faceting? That would only work with a few columns, though.

But as long as no incorrect plot is created, that's perfectly fine for me.
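The linetype suggestion can be sketched without touching cutpointr's plotting code. In base graphics, the analogue of mapping the predictor to ggplot2's linetype aesthetic is drawing each predictor's ROC curve in one panel with its own `lty`. `roc_points()` and the toy data are hypothetical, and only the `>=` direction is handled.

```r
# Hypothetical helper: ROC points (fpr, tpr) for the rule "positive if x >= c",
# starting at c = Inf (nothing positive, point (0, 0)) and stepping down
# through every observed value (the lowest one gives point (1, 1)).
roc_points <- function(x, class, pos_class) {
  is_pos <- class == pos_class
  cuts <- c(Inf, sort(unique(x), decreasing = TRUE))
  tpr <- vapply(cuts, function(cp) mean(x[is_pos] >= cp), numeric(1))
  fpr <- vapply(cuts, function(cp) mean(x[!is_pos] >= cp), numeric(1))
  data.frame(fpr = fpr, tpr = tpr)
}

# Made-up toy data, for illustration only
toy <- data.frame(
  score_a = c(1, 2, 3, 4, 5, 6),
  score_b = c(3, 1, 4, 1, 5, 9),
  outcome = c("no", "no", "no", "yes", "yes", "yes")
)
predictors <- c("score_a", "score_b")
curves <- lapply(predictors, function(p) roc_points(toy[[p]], toy$outcome, "yes"))
names(curves) <- predictors

# One panel, one line type per predictor, drawn as steps like geom_step()
plot(NA, xlim = c(0, 1), ylim = c(0, 1), xlab = "fpr", ylab = "tpr")
for (i in seq_along(curves)) {
  lines(curves[[i]]$fpr, curves[[i]]$tpr, type = "s", lty = i)
}
legend("bottomright", legend = predictors, lty = seq_along(curves))
```

As noted above, this only stays readable for a handful of predictors; with many columns the line types become hard to tell apart.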

Thie1e commented 5 years ago

OK, I'm glad that it works now.

Maybe I'll experiment with a plotting method for multi_cutpointr, but for now it will probably keep throwing an error. Individual rows can be plotted manually by piping them to ggplot:

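library(cutpointr)  # multi_cutpointr() and the suicide data
library(dplyr)      # slice(), select(), %>%
library(tidyr)      # unnest()
library(ggplot2)
multi_cutpointr(suicide, class = "suicide", silent = TRUE) %>% 
    slice(2) %>%                  # keep the result row of the second predictor
    select(roc_curve) %>%
    unnest(cols = roc_curve) %>%  # naming cols avoids the tidyr >= 1.0.0 warning
    ggplot(aes(x = fpr, y = tpr)) + geom_step()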

Personally, when working with the package, I often found the output of summary quite useful. Would you have expected something different, or some additional information?

xrobin commented 5 years ago

I'm not sure, but one idea might be to make it more tabular. For instance, the summary could start like this:

Method: maximize_metric 
Predictor:           dsi       age
Outcome:         suicide   suicide
Direction:            >=        <=
optimal_cutpoint       2        55
sum_sens_spec     1.7518   1.11537
acc  [etc.]

The pattern will break once you reach the contingency matrix or summary per class, but that could be a start?
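The tabular idea can be illustrated with a small base-R sketch. This is not how cutpointr's print or summary methods work: the data frame is a hypothetical stand-in for per-predictor results, with values copied from the layout above, and it is simply transposed so that each predictor becomes a column.

```r
# Hypothetical stand-in for per-predictor results, one row per predictor.
# Values copied from the tabular layout sketched above.
res <- data.frame(
  predictor = c("dsi", "age"),
  outcome = c("suicide", "suicide"),
  direction = c(">=", "<="),
  optimal_cutpoint = c(2, 55),
  sum_sens_spec = c(1.7518, 1.11537),
  stringsAsFactors = FALSE
)

# Transpose: one row per field, one column per predictor.
# t() goes through as.matrix(), so with mixed column types every
# cell becomes a (formatted) character string.
wide <- t(res[, -1])
colnames(wide) <- res$predictor

cat("Method: maximize_metric\n")
print(wide, quote = FALSE)
```

Fields shared by all predictors (here the method) can go in a header line, while nested pieces such as the confusion matrix don't fit this shape, which is exactly where the pattern breaks.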

Thie1e commented 5 years ago

Right, I had also thought about "rectangling" and compressing the output further, for example by replacing the confusion matrix. Thanks for the suggestion. I'm going to experiment with different output styles, but for now the previous solution is on CRAN (I had to push a new version because of changes in the latest tidyr release).

Thie1e commented 5 years ago

As of version 1.0.0, plotting a multi_cutpointr object still throws an error, but the summary output has been compressed somewhat by replacing the confusion matrix. I think the rectangular layout suggested above is largely covered by print. Thanks for the suggestions.