Thie1e / cutpointr

Optimal cutpoints in R: determining and validating optimal cutpoints in binary classification
https://cran.r-project.org/package=cutpointr
84 stars 13 forks source link

more than one metric function on the table #7

Closed andresimi closed 6 years ago

andresimi commented 6 years ago

is it possible to report more than one metric function on the columns of the result of cutpointr? Like PPV, NPV, Accuracy? thank you

Thie1e commented 6 years ago

This functionality is not built in, but may be a good addition for a future version.

Of course, you could quite easily write your own implementation, but making that function generally applicable is not quite straightforward. The function for adding metrics should:

Here's a first attempt. The function is suitable for the metric functions that are included in cutpointr, such as ppv, npv, and so on (accuracy is always returned):

add_metric <- function(oc, metric) {
    if (!is.list(metric)) stop("The metric function(s) must be given as a list.")
    require(rlang)
    met <- purrr::map(metric, function(metric_func) {
        # Get numbers of TP, FP, TN, FN at optimal cutpoint(s) from ROC curve(s)
        # and calculate the metric(s). Allow for type instability of the metric func:
        purrr::pmap_df(list(oc$optimal_cutpoint, oc$roc_curve, oc$direction),
                       function(optimal_cutpoint, roc_curve, direction) {
                           opt_ind <- cutpointr:::get_opt_ind(roc_curve = roc_curve,
                                                              oc = optimal_cutpoint,
                                                              direction = direction)
                           met <- metric_func(tp = roc_curve$tp[opt_ind],
                                              fp = roc_curve$fp[opt_ind],
                                              tn = roc_curve$tn[opt_ind],
                                              fn = roc_curve$fn[opt_ind])
                           met_name <- colnames(met)
                           if (is.null(met_name)) met_name <- "added_metric"
                           if (length(met) > 1) met <- list(as.numeric(unlist(met)))
                           tibble::tibble(!!met_name := met)
                       })
    })
    oc <- dplyr::bind_cols(oc, met)
    return(oc)
}

oc <- cutpointr(suicide, dsi, suicide, gender)
oc <- add_metric(oc, list(ppv, npv))
oc %>% select(subgroup, optimal_cutpoint, AUC, acc, ppv, npv)

# A tibble: 2 x 6
  subgroup optimal_cutpoint      AUC      acc      ppv      npv
  <chr>               <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
1 female                  2 0.944647 0.885204 0.367647 0.993827
2 male                    3 0.861747 0.842857 0.259259 0.982301

Please let me know if this works for you. Thanks.

andresimi commented 6 years ago

Thanks for your answer. Unfortunatelly I was not able to reproduce it.

library(tidyverse); library(rlang); library(cutpointr)
add_metric <- function(oc, metric) {
+     if (!is.list(metric)) stop("The metric function(s) must be given as a list.")
+     require(rlang)
+     met <- purrr::map(metric, function(metric_func) {
+         # Get numbers of TP, FP, TN, FN at optimal cutpoint(s) from ROC curve(s)
+         # and calculate the metric(s). Allow for type instability of the metric func:
+         purrr::pmap_df(list(oc$optimal_cutpoint, oc$roc_curve, oc$direction),
+                        function(optimal_cutpoint, roc_curve, direction) {
+                            opt_ind <- cutpointr:::get_opt_ind(roc_curve = roc_curve,
+                                                               oc = optimal_cutpoint,
+                                                               direction = direction)
+                            met <- metric_func(tp = roc_curve$tp[opt_ind],
+                                               fp = roc_curve$fp[opt_ind],
+                                               tn = roc_curve$tn[opt_ind],
+                                               fn = roc_curve$fn[opt_ind])
+                            met_name <- colnames(met)
+                            if (is.null(met_name)) met_name <- "added_metric"
+                            if (length(met) > 1) met <- list(as.numeric(unlist(met)))
+                            tibble::tibble(!!met_name := met)
+                        })
+     })
+     oc <- dplyr::bind_cols(oc, met)
+     return(oc)
+ }
oc <- cutpointr(suicide, dsi, suicide, gender)
Assuming the positive class is yes
Assuming the positive class has higher x values

oc <- add_metric(oc, lst(ppv, npv))
Erro: Column `ppv` must be a 1d atomic vector or a list 

When I try

oc <- add_metric(oc, lst("ppv", "npv"))
Error in metric_func(tp = roc_curve$tp[opt_ind], fp = roc_curve$fp[opt_ind],  : 
  not possible to find the function "metric_func" 

Maybe I am missing something, but cutpointr do not have the function get_opt_ind (from cutpointr:::get_opt_ind)

Thank your for your help.

Thie1e commented 6 years ago

I see. This seems to be caused by tibble. I was using the dev version from Github. Can you try this modified function? That should hopefully work now (at least with tibble v1.4.2 and cutpointr v0.7.3):

add_metric <- function(oc, metric) {
    if (!is.list(metric)) stop("The metric function(s) must be given as a list.")
    require(rlang)
    met <- purrr::map(metric, function(metric_func) {
        if (!is.function(metric_func)) {
            stop("The list elements of metric have to be functions.")
        }
        # Get numbers of TP, FP, TN, FN at optimal cutpoint(s) from ROC curve(s)
        # and calculate the metric(s). Allow for type instability of the metric func:
        purrr::pmap_df(list(oc$optimal_cutpoint, oc$roc_curve, oc$direction),
                       function(optimal_cutpoint, roc_curve, direction) {
                           opt_ind <- cutpointr:::get_opt_ind(roc_curve = roc_curve,
                                                              oc = optimal_cutpoint,
                                                              direction = direction)
                           met <- metric_func(tp = roc_curve$tp[opt_ind],
                                              fp = roc_curve$fp[opt_ind],
                                              tn = roc_curve$tn[opt_ind],
                                              fn = roc_curve$fn[opt_ind])
                           met_name <- colnames(met)
                           if (length(met_name) > 1) {
                               stop("The metric function should return one column or a vector.")
                           }
                           if (is.null(met_name)) met_name <- "added_metric"
                           if (length(met) > 1) {
                               met <- list(as.numeric(unlist(met)))
                           } else {
                               met <- as.numeric(met)
                           }
                           tibble::tibble(!!met_name := met)
                       })
    })
    oc <- dplyr::bind_cols(oc, met)
    return(oc)
}

cutpointr:::get_opt_ind is a function for returning the row of the ROC curve that contains the optimal cutpoint. It is not exported, so you need the :::.

The functions in metric of add_metric have to be actual functions, not character, as the function tries to apply them to the numbers of TP, FP, TN, and FN. That's consistent with the behavior of cutpointr(). I added a small check.

andresimi commented 6 years ago

That worked! Thank you very much! I think it is a good addition to future versions!