Closed: Canderson156 closed this issue 1 year ago
After discovering this thread (https://github.com/topepo/caret/issues/224), it seems that a custom summary function is the only answer. Here is an example of a custom function that addresses the issue:
library(PresenceAbsence)
library(dplyr)  # needed for %>%, arrange(), mutate()

kappa_custom <- function(data, lev = NULL, model = NULL) {
  # caret passes the held-out predictions for a resample; keep the
  # observation index, the observed class, and the probability of "yes"
  dt <- data[, c("rowIndex", "obs", "yes")] %>%
    arrange(rowIndex) %>%
    mutate(obs = ifelse(obs == "yes", TRUE, FALSE),
           rowIndex = as.character(rowIndex))
  # find the threshold that maximizes Kappa, then evaluate Kappa there
  ths <- optimal.thresholds(dt, opt.methods = "MaxKappa")
  cmx_test <- cmx(dt, threshold = ths$yes[1])
  k <- Kappa(cmx_test)
  c(Kappa_Custom = k[, 1])  # point estimate (drop the sd column)
}
# ffs() from the CAST package
FF2 <- ffs(predictors = test_data[, 2:5],
           response   = test_data$presence,
           trControl  = trainControl(method = "cv", number = 3,
                                     classProbs = TRUE,
                                     savePredictions = TRUE,
                                     summaryFunction = kappa_custom),
           minVar = 2,
           method = "glm",
           family = "binomial",
           metric = "Kappa_Custom")
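For intuition, here is a minimal base-R sketch (not the PresenceAbsence implementation) of what the "MaxKappa" option above does: scan a grid of candidate thresholds and keep the one that maximizes Cohen's Kappa. The function names and the threshold grid are assumptions for the example.

```r
# Illustrative sketch of a "MaxKappa" threshold search in base R.
# (PresenceAbsence::optimal.thresholds does this properly.)
cohens_kappa <- function(obs, pred) {
  po <- mean(obs == pred)                         # observed agreement
  pe <- mean(obs) * mean(pred) +                  # expected chance agreement
        (1 - mean(obs)) * (1 - mean(pred))
  if (pe == 1) return(0)                          # degenerate: all one class
  (po - pe) / (1 - pe)
}

max_kappa_threshold <- function(obs, prob,
                                thresholds = seq(0.05, 0.95, by = 0.05)) {
  kappas <- vapply(thresholds,
                   function(t) cohens_kappa(obs, as.integer(prob >= t)),
                   numeric(1))
  list(threshold = thresholds[which.max(kappas)],
       kappa     = max(kappas))
}
```

When every predicted probability falls below 0.5, this search settles on a threshold well under 0.5, which is exactly what the custom summary function feeds back to ffs.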
Hello,
I've been reading through the documentation for the ffs function, and I haven't been able to figure out a way to change the threshold used to classify predicted values. Am I missing something, or is this a missing feature? Perhaps the issue stems more from the caret package, but it is with the ffs function that I run into problems.
Here is an example of the problem I'm having:
If I create a model using all 4 predictor variables, the Kappa statistic calculated by the train function is 0.08.
However, this is based on the default threshold of 0.5. According to the calculations below, however, the threshold that maximizes Kappa is 0.2.
If I calculate Kappa using this threshold instead, I get a much higher estimate of 0.3.
This becomes an issue when I try to use the ffs function: in many cases, with Kappa as the metric for variable selection, all of the 2- and 3-variable combinations have a Kappa statistic of 0 or less, so the algorithm stops. When this happens, every predicted value is returned as "no", because the probability of "yes" is below 0.5 for all of the observations. If the threshold had been 0.2 instead of 0.5, I suspect the Kappa value would have varied more between predictor combinations, and more variables would likely have been selected.
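This mechanism can be reproduced on a small made-up example (toy probabilities, not the original data): when every predicted probability is below 0.5, the default threshold predicts "no" everywhere and Kappa collapses to 0, while a lower threshold recovers the agreement.

```r
# Toy example (made-up numbers): presences tend to get higher predicted
# probabilities, but every probability is below 0.5.
obs  <- c(1, 1, 1, 0, 0, 0, 0, 0)                  # 1 = "yes" (presence)
prob <- c(0.45, 0.40, 0.30, 0.35, 0.10, 0.15, 0.20, 0.05)

# Cohen's Kappa for a given classification threshold
kappa_at <- function(obs, prob, threshold) {
  pred <- as.integer(prob >= threshold)
  po <- mean(obs == pred)                          # observed agreement
  pe <- mean(obs) * mean(pred) +                   # chance agreement
        (1 - mean(obs)) * (1 - mean(pred))
  (po - pe) / (1 - pe)
}

kappa_at(obs, prob, 0.5)   # 0: every case predicted "no"
kappa_at(obs, prob, 0.2)   # about 0.53: agreement recovered
```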
Do you have any suggestions as to how I could customize the threshold used to calculate the classification metrics for ffs to avoid this issue?