Closed andresimi closed 6 years ago
This functionality is not built in, but may be a good addition for a future version.
Of course, you could quite easily write your own implementation, but making that function generally applicable is not quite straightforward. The function for adding metrics should:
use_midpoints = TRUE
, so if the returned optimal_cutpoint
is not found on the ROC curve roc_curve[[1]]$x.sorted
Here's a first attempt. The function is suitable for the metric functions that are included in cutpointr
, such as ppv
, npv
, and so on (accuracy is always returned):
add_metric <- function(oc, metric) {
if (!is.list(metric)) stop("The metric function(s) must be given as a list.")
require(rlang)
met <- purrr::map(metric, function(metric_func) {
# Get numbers of TP, FP, TN, FN at optimal cutpoint(s) from ROC curve(s)
# and calculate the metric(s). Allow for type instability of the metric func:
purrr::pmap_df(list(oc$optimal_cutpoint, oc$roc_curve, oc$direction),
function(optimal_cutpoint, roc_curve, direction) {
opt_ind <- cutpointr:::get_opt_ind(roc_curve = roc_curve,
oc = optimal_cutpoint,
direction = direction)
met <- metric_func(tp = roc_curve$tp[opt_ind],
fp = roc_curve$fp[opt_ind],
tn = roc_curve$tn[opt_ind],
fn = roc_curve$fn[opt_ind])
met_name <- colnames(met)
if (is.null(met_name)) met_name <- "added_metric"
if (length(met) > 1) met <- list(as.numeric(unlist(met)))
tibble::tibble(!!met_name := met)
})
})
oc <- dplyr::bind_cols(oc, met)
return(oc)
}
oc <- cutpointr(suicide, dsi, suicide, gender)
oc <- add_metric(oc, list(ppv, npv))
oc %>% select(subgroup, optimal_cutpoint, AUC, acc, ppv, npv)
# A tibble: 2 x 6
subgroup optimal_cutpoint AUC acc ppv npv
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 female 2 0.944647 0.885204 0.367647 0.993827
2 male 3 0.861747 0.842857 0.259259 0.982301
Please let me know if this works for you. Thanks.
Thanks for your answer. Unfortunatelly I was not able to reproduce it.
library(tidyverse); library(rlang); library(cutpointr)
add_metric <- function(oc, metric) {
+ if (!is.list(metric)) stop("The metric function(s) must be given as a list.")
+ require(rlang)
+ met <- purrr::map(metric, function(metric_func) {
+ # Get numbers of TP, FP, TN, FN at optimal cutpoint(s) from ROC curve(s)
+ # and calculate the metric(s). Allow for type instability of the metric func:
+ purrr::pmap_df(list(oc$optimal_cutpoint, oc$roc_curve, oc$direction),
+ function(optimal_cutpoint, roc_curve, direction) {
+ opt_ind <- cutpointr:::get_opt_ind(roc_curve = roc_curve,
+ oc = optimal_cutpoint,
+ direction = direction)
+ met <- metric_func(tp = roc_curve$tp[opt_ind],
+ fp = roc_curve$fp[opt_ind],
+ tn = roc_curve$tn[opt_ind],
+ fn = roc_curve$fn[opt_ind])
+ met_name <- colnames(met)
+ if (is.null(met_name)) met_name <- "added_metric"
+ if (length(met) > 1) met <- list(as.numeric(unlist(met)))
+ tibble::tibble(!!met_name := met)
+ })
+ })
+ oc <- dplyr::bind_cols(oc, met)
+ return(oc)
+ }
oc <- cutpointr(suicide, dsi, suicide, gender)
Assuming the positive class is yes
Assuming the positive class has higher x values
oc <- add_metric(oc, lst(ppv, npv))
Erro: Column `ppv` must be a 1d atomic vector or a list
When I try
oc <- add_metric(oc, lst("ppv", "npv"))
Error in metric_func(tp = roc_curve$tp[opt_ind], fp = roc_curve$fp[opt_ind], :
not possible to find the function "metric_func"
Maybe I am missing something, but cutpointr do not have the function get_opt_ind (from cutpointr:::get_opt_ind)
Thank your for your help.
I see. This seems to be caused by tibble
. I was using the dev version from Github. Can you try this modified function? That should hopefully work now (at least with tibble v1.4.2 and cutpointr v0.7.3):
add_metric <- function(oc, metric) {
if (!is.list(metric)) stop("The metric function(s) must be given as a list.")
require(rlang)
met <- purrr::map(metric, function(metric_func) {
if (!is.function(metric_func)) {
stop("The list elements of metric have to be functions.")
}
# Get numbers of TP, FP, TN, FN at optimal cutpoint(s) from ROC curve(s)
# and calculate the metric(s). Allow for type instability of the metric func:
purrr::pmap_df(list(oc$optimal_cutpoint, oc$roc_curve, oc$direction),
function(optimal_cutpoint, roc_curve, direction) {
opt_ind <- cutpointr:::get_opt_ind(roc_curve = roc_curve,
oc = optimal_cutpoint,
direction = direction)
met <- metric_func(tp = roc_curve$tp[opt_ind],
fp = roc_curve$fp[opt_ind],
tn = roc_curve$tn[opt_ind],
fn = roc_curve$fn[opt_ind])
met_name <- colnames(met)
if (length(met_name) > 1) {
stop("The metric function should return one column or a vector.")
}
if (is.null(met_name)) met_name <- "added_metric"
if (length(met) > 1) {
met <- list(as.numeric(unlist(met)))
} else {
met <- as.numeric(met)
}
tibble::tibble(!!met_name := met)
})
})
oc <- dplyr::bind_cols(oc, met)
return(oc)
}
cutpointr:::get_opt_ind
is a function for returning the row of the ROC curve that contains the optimal cutpoint. It is not exported, so you need the :::
.
The functions in metric
of add_metric
have to be actual functions, not character, as the function tries to apply them to the numbers of TP, FP, TN, and FN. That's consistent with the behavior of cutpointr()
. I added a small check.
That worked! Thank you very much! I think it is a good addition to future versions!
is it possible to report more than one metric function on the columns of the result of cutpointr? Like PPV, NPV, Accuracy? thank you