AlineTalhouk / splendid

Supervised Learning Ensemble for Diagnostic Identification
https://alinetalhouk.github.io/splendid/

Inconsistent prediction output #43

Closed Dustin21 closed 6 years ago

Dustin21 commented 6 years ago

@dchiu911 There seem to be some inconsistencies in the output of splendid::prediction() across algorithms. For instance, the following returns a factor vector for the adaboost algorithm, but a character vector for mlr_lasso. This leads to needless post-processing that I think could be avoided. Is there a reason for the difference, and could it be corrected?

library(splendid)
data(hgsc)
class <- factor(attr(hgsc, "class.true"))
set.seed(1)
training.id <- sample(seq_along(class), replace = TRUE)
test.id <- which(!seq_along(class) %in% training.id)

# adaboost
mod <- classification(hgsc[training.id, ], class[training.id], "adaboost")
pred <- prediction(mod, hgsc, class, test.id)
is.factor(pred) # returns TRUE

# mlr_lasso
mod <- classification(hgsc[training.id, ], class[training.id], "mlr_lasso")
pred <- prediction(mod, hgsc, class, test.id)
is.factor(pred) # returns FALSE

dchiu911 commented 6 years ago

I did not realize that glmnet::predict.glmnet(..., type = "class") produces a one-column matrix of the response labels instead of a plain vector. Thanks for the catch.
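
For readers hitting the same mismatch, here is a minimal sketch of one way to normalize such output, using base R only. It is not the fix applied in splendid. The matrix `pred_mat` is a hand-built stand-in for the one-column label matrix described above (it is not real glmnet output), and the labels in `lvls` and the helper `as_class_factor()` are hypothetical names introduced for illustration.

```r
# Hypothetical class labels; in practice these would be levels(class).
lvls <- c("C1", "C2", "C3")

# Hand-built stand-in for a one-column character matrix of predicted labels.
pred_mat <- matrix(c("C2", "C1", "C2", "C3"), ncol = 1)

is.matrix(pred_mat)  # a matrix, not a plain vector
is.factor(pred_mat)  # not a factor either

# Hypothetical helper: as.character() drops the dim attribute (and turns an
# existing factor back into its labels), then factor() restores a fixed set
# of levels so every algorithm yields the same type.
as_class_factor <- function(pred, levels) {
  factor(as.character(pred), levels = levels)
}

pred_fac <- as_class_factor(pred_mat, lvls)
is.factor(pred_fac)
levels(pred_fac)
```

Passing the levels explicitly keeps classes that happen to be absent from the predictions, which matters when comparing predicted and true labels in a confusion matrix.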