JackStat / ModelMetrics

Rapid Calculation of Model Metrics

Suggestion for improvement of the auc function #14

Closed: studerus closed this issue 7 years ago

studerus commented 7 years ago

I noticed that your version of auc is still noticeably slower than the one I recently implemented in mlr (in the benchmark below, mlr's takes roughly one third less time). One bottleneck seems to be this block:

if(class(actual) %in% c('factor', 'character')){
  actual = as.numeric(as.factor(as.character(actual))) - 1
}
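To illustrate why this may be slow, here is a minimal base-R sketch of what that line does to a factor, one step at a time. It converts to character, re-factors (which re-sorts the levels alphabetically), and converts the codes to doubles, so each step makes another full copy of the data. The example vector is made up for illustration.

```r
# Sketch: the original conversion applied to a small, made-up factor.
actual <- factor(c("Pos", "Neg", "Pos"), levels = c("Pos", "Neg"))

step1 <- as.character(actual)   # new character vector
step2 <- as.factor(step1)       # new factor; levels re-sorted to c("Neg", "Pos")
step3 <- as.numeric(step2) - 1  # new double vector: Neg = 0, Pos = 1

print(levels(step2))  # "Neg" "Pos"
print(step3)          # 1 0 1
```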

I suggest replacing it with

if (inherits(actual, 'factor')) {
  actual <- as.integer(actual) - 1L
} else if (inherits(actual, 'character')) {
  actual <- as.integer(as.factor(actual)) - 1L
}
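A minimal sketch of the proposed conversion, wrapped in a hypothetical helper to_binary() so it can be tried on its own. Two behaviours seem worth checking. First, inherits() also accepts ordered factors, whose class is c('ordered', 'factor'); with class(actual) %in% ... that yields a length-2 condition, which if() treats as an error from R 4.2.0 on. Second, for factors the 0/1 coding now follows the factor's level order instead of alphabetical order, so it matches the original line only when the levels are already sorted.

```r
# Hypothetical helper wrapping the proposed conversion (name is illustrative).
to_binary <- function(actual) {
  if (inherits(actual, "factor")) {
    # Reuse the integer codes the factor already stores; no string handling.
    actual <- as.integer(actual) - 1L
  } else if (inherits(actual, "character")) {
    # Characters still need one factor pass; levels are sorted alphabetically.
    actual <- as.integer(as.factor(actual)) - 1L
  }
  actual
}

f <- factor(c("Pos", "Neg", "Pos"), levels = c("Pos", "Neg"))
print(to_binary(f))                # 0 1 0  (level order: Pos = 0)
print(to_binary(as.character(f)))  # 1 0 1  (alphabetical order: Neg = 0)
print(to_binary(as.ordered(f)))    # 0 1 0  (ordered factors are accepted)
```

With factor(x, x) as in the benchmark, the levels are c('Pos', 'Neg'), so the original line codes Pos as 1 while the replacement codes it as 0.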

See the following benchmark:

library(ModelMetrics)
library(mlr)
library(microbenchmark)
library(data.table)  # for frankv()

x <- c('Pos', 'Neg')
actual <- sample(factor(x, x), 50000, replace = TRUE)
predicted <- runif(length(actual))

# Internal (unexported) ModelMetrics helpers
auc_ <- ModelMetrics:::auc_
auc3_ <- ModelMetrics:::auc3_
binaryChecks <- ModelMetrics:::binaryChecks

auc2 <- function(actual, predicted, ...) {
  binaryChecks(actual, 'auc')
  # Proposed conversion: use the factor's integer codes directly
  if (inherits(actual, 'factor')) {
    actual <- as.integer(actual) - 1L
  } else if (inherits(actual, 'character')) {
    actual <- as.integer(as.factor(actual)) - 1L
  }

  # was length(actual > 10000), which is always non-zero and so always TRUE
  if (length(actual) > 10000) {
    ranks <- frankv(predicted)
    AUC <- auc3_(actual, predicted, ranks)
  } else {
    # ranks was undefined here; assumes auc_ takes ranks, as the original call implies
    ranks <- rank(predicted)
    AUC <- auc_(actual, predicted, ranks)
  }
  return(AUC)
}

microbenchmark(mlr = measureAUC(predicted, actual, positive = 'Pos'),
               modelmetrics = auc(actual, predicted),
               modelmetrics.improved = auc2(actual, predicted))
Unit: milliseconds
                  expr      min       lq     mean   median       uq      max neval cld
                   mlr 4.148332 4.236479 4.454990 4.305307 4.445979 6.128918   100 a  
          modelmetrics 6.395471 6.496297 6.840453 6.631536 6.790019 9.250582   100   c
 modelmetrics.improved 4.480996 4.534126 4.836400 4.623480 4.722645 6.802999   100  b 
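To put the medians in relative terms, here is a quick calculation in R using the numbers from the output above (nothing is re-run). The current auc takes about 1.54 times as long as mlr's, so mlr needs roughly one third less time, while the improved version is within about 7% of mlr and saves roughly 30% of the current version's time.

```r
# Median timings (ms) copied from the benchmark output above.
med <- c(mlr = 4.305307, modelmetrics = 6.631536, improved = 4.623480)

# Time relative to mlr: mlr 1.00, modelmetrics 1.54, improved 1.07
print(round(med / med[["mlr"]], 2))

# Share of the current version's time saved by the improved one: 0.3
print(round(1 - med[["improved"]] / med[["modelmetrics"]], 2))
```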
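Both branches of auc2 pass ranks of predicted to an internal routine, which suggests a rank-sum (Mann-Whitney) formulation of AUC: AUC = (R_pos - n_pos(n_pos + 1)/2) / (n_pos * n_neg), where R_pos is the sum of the ranks of the positive cases. Assuming that formulation, the idea fits in a few lines of base R. auc_rank() is a hypothetical name, this is not the package's compiled auc_/auc3_ code, and base rank() stands in for data.table's frankv() (both average tied ranks by default).

```r
# Sketch of a rank-sum (Mann-Whitney) AUC in base R; illustrative only.
auc_rank <- function(actual, predicted) {
  # actual: 0/1 vector; predicted: scores where higher means "more likely 1"
  r <- rank(predicted)                 # tied scores get their average rank
  n_pos <- sum(actual == 1)
  n_neg <- length(actual) - n_pos
  # Rank sum of the positives minus its smallest possible value,
  # divided by the number of (positive, negative) pairs.
  (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc_rank(c(0, 0, 1, 1), c(0.1, 0.4, 0.35, 0.8))  # 0.75
```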
JackStat commented 7 years ago

Thank you!