AlineTalhouk / splendid

Supervised Learning Ensemble for Diagnostic Identification
https://alinetalhouk.github.io/splendid/

Modify evaluation object output #2

Closed dchiu911 closed 7 years ago

dchiu911 commented 7 years ago
AlineTalhouk commented 7 years ago

@dchiu911 Can you please add MCC (Matthews correlation coefficient) as a measure of accuracy? It can be computed from the confusion matrix (CM).

dchiu911 commented 7 years ago

I followed Equation 8 from http://www.sciencedirect.com/science/article/pii/S1476927104000799 for the multiclass case. Please verify.

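library(purrr)  # provides cross2() and map_dbl()

# Multiclass MCC from a square confusion matrix C
mcc <- function(C) {
  N <- sum(C)   # total number of samples
  Ct <- t(C)
  # All (k, l) index pairs; note that cross2() is deprecated as of purrr 1.0.0
  rc <- cross2(seq_len(nrow(C)), seq_len(nrow(C)))
  # Numerator: N * trace(C) minus the sum over (k, l) of row k of C times column l of C
  num <- N * sum(diag(C)) - sum(map_dbl(rc, ~ C[.x[[1]], ] %*% C[, .x[[2]]]))
  # Denominator: the same double sums taken over row/row and column/column pairs
  den <- sqrt(N ^ 2 - sum(map_dbl(rc, ~ C[.x[[1]], ] %*% Ct[, .x[[2]]]))) *
    sqrt(N ^ 2 - sum(map_dbl(rc, ~ Ct[.x[[1]], ] %*% C[, .x[[2]]])))
  return(num / den)
}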
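
Written out, the function above computes the expression below. This is transcribed from the code rather than from the paper, so treat it as a reading aid for the verification and not as a quotation of Equation 8. The notation is an assumption: C is the K x K confusion matrix, N is its total count, C_{k·} is row k, C_{·l} is column l, and r_m and c_m are the row and column totals.

```latex
R_K = \frac{N \,\mathrm{tr}(C) - \sum_{k,l} C_{k\cdot} \cdot C_{\cdot l}}
           {\sqrt{N^2 - \sum_{k,l} C_{k\cdot} \cdot (C^{\top})_{\cdot l}}\;
            \sqrt{N^2 - \sum_{k,l} (C^{\top})_{k\cdot} \cdot C_{\cdot l}}}

% Each double sum collapses to row and column totals:
\sum_{k,l} C_{k\cdot} \cdot C_{\cdot l} = \sum_{m} r_m c_m, \qquad
\sum_{k,l} C_{k\cdot} \cdot (C^{\top})_{\cdot l} = \sum_{m} c_m^2, \qquad
\sum_{k,l} (C^{\top})_{k\cdot} \cdot C_{\cdot l} = \sum_{m} r_m^2

% so that
R_K = \frac{N \,\mathrm{tr}(C) - \sum_m r_m c_m}
           {\sqrt{N^2 - \sum_m c_m^2}\;\sqrt{N^2 - \sum_m r_m^2}}
```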

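library(splendid)  # provides hgsc, classification(), and prediction()

data(hgsc)
# Class labels are the part of each row name after the first underscore
class <- stringr::str_split_fixed(rownames(hgsc), "_", n = 2)[, 2]
set.seed(1)
# Bootstrap sample for training; the out-of-bag samples form the test set
training.id <- sample(seq_along(class), replace = TRUE)
test.id <- which(!seq_along(class) %in% training.id)
mod <- classification(hgsc[training.id, ], class[training.id], "lda")
pred <- prediction(mod, hgsc, test.id)
# Confusion matrix: rows = true class, columns = predicted class
mcc(table(true = class[test.id], pred))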
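
One possible way to verify the function: each double sum over index pairs reduces to a sum over row and column totals, so the same quantity can be computed without purrr, and on a 2 x 2 matrix it should agree with the familiar binary MCC formula. The sketch below is only a cross-check under those assumptions. `mcc_vec` and the example matrix `C2` are hypothetical names and made-up counts, not part of splendid.

```r
# Hypothetical helper (not part of splendid): multiclass MCC using
# row and column totals instead of looping over all index pairs.
mcc_vec <- function(C) {
  C <- as.matrix(C)
  N <- sum(C)
  rs <- rowSums(C)  # totals per row (true classes in the table above)
  cs <- colSums(C)  # totals per column (predicted classes)
  num <- N * sum(diag(C)) - sum(rs * cs)
  den <- sqrt(N^2 - sum(cs^2)) * sqrt(N^2 - sum(rs^2))
  num / den
}

# Binary sanity check on made-up counts (rows = true, columns = predicted)
C2 <- matrix(c(50, 10,
                5, 35), nrow = 2, byrow = TRUE)
TP <- C2[1, 1]; FN <- C2[1, 2]; FP <- C2[2, 1]; TN <- C2[2, 2]
mcc_binary <- (TP * TN - FP * FN) /
  sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))

all.equal(mcc_vec(C2), mcc_binary)  # TRUE
```

Two caveats apply to both versions: the confusion matrix must be square with rows and columns in the same class order, which `table()` does not guarantee if a class is never predicted, and the denominator is zero when every sample falls in a single row or a single column.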