Thie1e / cutpointr

Optimal cutpoints in R: determining and validating optimal cutpoints in binary classification
https://cran.r-project.org/package=cutpointr

Getting different f1-score result #6

Closed: Tazinho closed this issue 6 years ago

Tazinho commented 6 years ago

Hi,

I am using your package to check my quick implementation of the dice/f1-score.

When I use the cutpointr() function and put the displayed optimal threshold (0.007) into my dice function I am getting a different result (0.6586022) than the cutpointr() function displays (0.672).

Do you work with approximations or is this a bug in my or your code?

# libraries
library(magrittr)
library(tibble)
#> Warning: package 'tibble' was built under R version 3.5.1
library(cutpointr)
#> Warning: package 'cutpointr' was built under R version 3.5.1
# testdata
set.seed(123)
df_test <- tibble(pred = runif(1000,0,1),
                  resp = sample(c(0,1), size = 1000, replace = TRUE))
# functions
binary_outcome <- function(x, thr){
  # dichotomize predictions: 1 if x >= thr, else 0
  x[x >= thr] <- 1L
  x[x  < thr] <- 0L
  x
}
true_positive <- function(bo, label){
  bo <- as.integer(bo); label <- as.integer(label)
  sum(bo == 1L & label == 1L)
}
false_positive <- function(bo, label){
  bo <- as.integer(bo); label <- as.integer(label)
  sum(bo == 1L & label == 0L)
}
true_negative <- function(bo, label){
  bo <- as.integer(bo); label <- as.integer(label)
  sum(bo == 0L & label == 0L)
}
false_negative <- function(bo, label){
  bo <- as.integer(bo); label <- as.integer(label)
  sum(bo == 0L & label == 1L)
}
dice <- function(pred, label, thr){
  # Dice / F1 score of pred dichotomized at threshold thr against label
  label <- as.integer(label)
  bo <- binary_outcome(pred, thr)

  tp <- true_positive(bo, label)
  fp <- false_positive(bo, label)
  tn <- true_negative(bo, label)
  fn <- false_negative(bo, label)

  2 * tp /(2 * tp + fn + fp)
}

# Test
cp <- cutpointr(data = df_test, x = pred, class = resp, 
                method = maximize_metric, metric = F1_score)
#> Assuming the positive class is 0
#> Assuming the positive class has higher x values
cp
#> # A tibble: 1 x 16
#>   direction optimal_cutpoint method          F1_score   acc sensitivity
#>   <chr>                <dbl> <chr>              <dbl> <dbl>       <dbl>
#> 1 >=                 0.00700 maximize_metric    0.672 0.509       0.998
#>   specificity   AUC pos_class neg_class prevalence outcome predictor
#>         <dbl> <dbl>     <dbl>     <dbl>      <dbl> <chr>   <chr>    
#> 1      0.0141 0.521         0         1      0.503 resp    pred     
#>   data                 roc_curve                 boot 
#>   <list>               <list>                    <lgl>
#> 1 <tibble [1,000 x 2]> <data.frame [1,001 x 10]> NA
dice(pred = df_test$pred, label = df_test$resp, thr = 0.00700)
#> [1] 0.6586022

# using the precise cutpoint instead of the rounded display value
dice(pred = df_test$pred, label = df_test$resp, thr = cp$optimal_cutpoint)
#> [1] 0.6581598

Created on 2018-07-20 by the reprex package (v0.2.0).

Thie1e commented 6 years ago

Hi, thanks for the report. We have a unit test that checks that the F1-scores from cutpointr are identical to the ones returned by ROCR, so I hope the results from cutpointr are correct.

I'll look into it and try to figure out why the difference occurs.

If you have a look at cutpointr::F1_score you'll see that the code is basically identical to yours:

function(tp, fp, tn, fn, ...) {
    f <- (2 * tp) / (2 * tp + fp + fn)
    f <- matrix(f, ncol = 1)
    colnames(f) <- "F1_score"
    return(f)
}
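
For reference, a cross-check against ROCR could look roughly like this (a minimal sketch, assuming ROCR is installed; its "f" measure with alpha = 0.5 is the F1 score; this is only illustrative, not the actual unit test):

# Cross-check sketch (illustrative only): ROCR's "f" measure with alpha = 0.5
# is the F1 score. Note that ROCR treats the second sorted label level (here 1)
# as the positive class and higher predictions as more positive.
library(ROCR)
rocr_pred <- prediction(predictions = df_test$pred, labels = df_test$resp)
rocr_f    <- performance(rocr_pred, measure = "f", alpha = 0.5)
# Maximum F1 over all cutoffs; this should agree with cutpointr when
# pos_class = 1 and direction = ">=" are used.
max(rocr_f@y.values[[1]], na.rm = TRUE)
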
Thie1e commented 6 years ago

Upon reviewing the issue, I noticed that you're hardcoding the positive class to be 1, the negative class to be 0, and direction = ">=", so that values larger than the threshold indicate the positive class. As the output from cutpointr() states (also via the messages #> Assuming the positive class is 0 and #> Assuming the positive class has higher x values), it assumed the positive class to be 0, because the observations with resp == 0 happen to have the higher median value of the predictor pred.
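One way to see this in the data (a quick sketch, summarizing the predictor per class by its median):

# Sketch: per-class medians of the predictor. With this seed, the 0 class
# happens to have the slightly higher median pred, which is consistent with
# cutpointr's guess that 0 is the positive class.
tapply(df_test$pred, df_test$resp, median)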

If you simply specify classes and direction as follows, you'll get an F1-score that is identical to the one from your function:

> cp <- cutpointr(data = df_test, x = pred, class = resp, 
+                 method = maximize_metric, metric = F1_score,
+                 pos_class = 1, direction = ">=")
> cp %>% select(optimal_cutpoint, F1_score)
# A tibble: 1 x 2
  optimal_cutpoint F1_score
             <dbl>    <dbl>
1      0.000465349 0.663995
> dice(pred = df_test$pred, label = df_test$resp, thr = cp$optimal_cutpoint)
[1] 0.6639947
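
Conversely, the 0.672 from the first call can be reproduced with your dice() function by treating 0 as the positive class, i.e. by flipping the labels (a sketch; cp_first is assumed to be the object returned by the original cutpointr() call, before cp was overwritten above):

# Sketch: flip the labels so that resp == 0 counts as the positive class.
# cp_first is assumed to hold the result of the first cutpointr() call
# (pos_class = 0, direction = ">="); this should give roughly the 0.672
# reported above.
dice(pred = df_test$pred, label = 1 - df_test$resp, thr = cp_first$optimal_cutpoint)
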
Tazinho commented 6 years ago

I guess I was somehow confused ;-) Thanks a lot for taking the time to review this and for pointing out those helpful messages. They are indeed super clear.