Closed: mikoontz closed this issue 2 years ago.
Hi,
Thank you very much for pointing out this critical problem!
However, I had already noticed it while working on the package during my holidays, and I submitted a fix to the development version a few weeks ago. I re-tested it today just to be sure, and it seems to be working as expected.
If you are going to try it, please keep in mind that the development version is still a work in progress, and some things are surely broken.
Thanks again for keeping those issues coming. I truly appreciate it!
Cheers,
Blas
[edit]: I just submitted a new version of `rf_evaluate()` fixing an issue that was also related to the order of the metrics!
Great news! I think we can safely close this issue then.
## tl;dr

Incorrect values of evaluation metrics can be reported depending on the order of the character vector passed as the `metrics=` argument to `rf_evaluate()`. This happens because of a forced renaming of columns based on a hard-coded order. It can be fixed with a small patch (let me know if you want a PR!).

## Some details
I came across this issue while working to spatially cross-validate a non-spatial model with a binomial response. I was using AUC and R^2 as my evaluation metrics and wanted to add RMSE. When I did so, my AUC metric appeared in the row labeled RMSE, and vice versa. At first I was shocked at how low my AUC was! Then I realized I was working with the same data (and the same random seed), so something else must have been going on.
In looking closely at the model object updated with the `$evaluation` data, I see that the `model$evaluation$per.fold` data.frame appears to be correct, but `model$evaluation$per.fold.long` and `model$evaluation$per.model` appear to be wrong. The rows in `model$evaluation$per.fold.long` and `model$evaluation$per.model` that correspond to the "full" model also look correct; only the rows representing "training" and "testing" have their metrics and values mislabeled.

All of this suggests to me that, in the `rf_evaluate()` code, `evaluation.df` is correct, `performance.full` is correct, and the issue lies with `performance.training` and `performance.testing`.

I think the problem is in the renaming of the columns here: https://github.com/BlasBenito/spatialRF/blob/9bc7edd444a1dac0531da07791d0332db08f6450/R/rf_evaluate.R#L403-L406
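Paraphrasing (not quoting) the linked lines, the renaming amounts to something like this:

```r
# Assumption: a paraphrase of the linked colnames() assignment, based on the
# behavior described in this issue, not copied verbatim from the source.
colnames(performance.training) <- c(metrics, "model")
colnames(performance.testing) <- c(metrics, "model")
```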
If the order of the `metrics` character vector happens to match the order of the metrics in the columns of `performance.testing` and `performance.training`, then there is no problem. But if the order is different in `metrics`, then the assignment coerces the column names into the order provided by the `metrics=` argument (`c(metrics, "model")`).
## Possible fix

This approach appears to work: remove the current approach to defining the `colnames()`, and instead substitute the `.training` part of the column names in `performance.training` (and the `.testing` part of the column names in `performance.testing`) with an empty string, then reassign the `performance.training` and `performance.testing` objects as you do with `performance.full`. That is:
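A minimal sketch of that substitution (assuming the suffixed intermediate data frames as in the toy example above):

```r
# Sketch: strip the ".training"/".testing" suffixes instead of force-assigning
# colnames() in the order of the user-supplied `metrics` vector.
colnames(performance.training) <- gsub(
  pattern = ".training",
  replacement = "",
  x = colnames(performance.training),
  fixed = TRUE
)

colnames(performance.testing) <- gsub(
  pattern = ".testing",
  replacement = "",
  x = colnames(performance.testing),
  fixed = TRUE
)
```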
## Minimum working example
This initial block is copied exactly from your excellent tutorial:
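(The block below is a stand-in sketch rather than the verbatim tutorial code: the `plant_richness_df` and `distance_matrix` objects ship with `spatialRF`, but the binarized response and all parameter values here are assumptions, chosen so that both AUC and RMSE apply.)

```r
# Sketch of a tutorial-style setup (parameter values are assumptions).
library(spatialRF)

data(plant_richness_df)
data(distance_matrix)

# Hypothetical binary response so that AUC is a meaningful metric.
plant_richness_df$richness_binary <- as.numeric(
  plant_richness_df$richness_species_vascular >
    median(plant_richness_df$richness_species_vascular)
)

xy <- plant_richness_df[, c("x", "y")]

model <- rf(
  data = plant_richness_df,
  dependent.variable.name = "richness_binary",
  predictor.variable.names = colnames(plant_richness_df)[5:21],
  distance.matrix = distance_matrix,
  distance.thresholds = 0,
  seed = 1
)
```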
The next block demonstrates the issue by running `rf_evaluate()` in two different ways, with only the order of the `metrics=` argument switched. The output from `print_evaluation()` shows the problem: the values for AUC are switched with the values for RMSE!
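Something like the following (a sketch: `model` and `xy` come from the setup block above, and the `repetitions` and `seed` values are arbitrary):

```r
# Identical calls except for the order of metrics=.
eval_a <- rf_evaluate(
  model = model,
  xy = xy,
  metrics = c("auc", "rmse"),
  repetitions = 30,
  seed = 100,
  verbose = FALSE
)

eval_b <- rf_evaluate(
  model = model,
  xy = xy,
  metrics = c("rmse", "auc"),
  repetitions = 30,
  seed = 100,
  verbose = FALSE
)

# Same data and same seed, so each metric should be identical across the two
# runs, but the AUC and RMSE values come out swapped.
print_evaluation(eval_a)
print_evaluation(eval_b)
```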
## Troubleshooting code from `rf_evaluate()` that diagnosed the problem to the `colnames()` assignment
assignmentContinue this block using the code from the minimum working example above.