Open chrisajohnson opened 23 hours ago
Hello Chris :wave:
Thank you for your detailed question.
You are absolutely right: the cutoff, sensitivity and specificity values are returned for the calibration dataset. For the validation dataset, we return only the value of the metric, computed on the predictions binarised with the cutoff found on the calibration part.
But there is actually a way, not direct but quite understandable I think, for you to obtain sensitivity and specificity for your validation dataset! Let's see that together :eyes:
I'm using the biomod2 example.
library(biomod2)
library(terra)
# Load species occurrences (6 species available)
data("DataSpecies")
head(DataSpecies)
# Select the name of the studied species
myRespName <- 'GuloGulo'
# Get corresponding presence/absence data
myResp <- as.numeric(DataSpecies[, myRespName])
# Get corresponding XY coordinates
myRespXY <- DataSpecies[, c('X_WGS84', 'Y_WGS84')]
# Load environmental variables extracted from BIOCLIM (bio_3, bio_4, bio_7, bio_11 & bio_12)
data("bioclim_current")
myExpl <- rast(bioclim_current)
# Format Data with true absences
myBiomodData <- BIOMOD_FormatingData(resp.var = myResp,
                                     expl.var = myExpl,
                                     resp.xy = myRespXY,
                                     resp.name = myRespName)
# Model single models
myBiomodModelOut <- BIOMOD_Modeling(bm.format = myBiomodData,
                                    modeling.id = 'AllModels',
                                    CV.strategy = 'random',
                                    CV.nb.rep = 2,
                                    CV.perc = 0.8,
                                    OPT.strategy = 'bigboss',
                                    var.import = 3,
                                    metric.eval = c('TSS','ROC'))
Then I'm going to use several getters (get_species_data, get_calib_lines, get_predictions) and one secondary function (bm_FindOptimStat). The latter is the function we use in the biomod2 code to compute evaluation metrics.
# Get observed data, and corresponding calibration lines, and predictions
# These 3 tables should have the same number of rows, as they are matching points
tab = get_species_data(myBiomodData)
calib = get_calib_lines(myBiomodModelOut, as.data.frame = TRUE)
predi = get_predictions(myBiomodModelOut, model.as.col = TRUE)
# Get evaluation data
myEval = get_evaluations(myBiomodModelOut)
# Here I selected one single model for the example
# And I extract the corresponding evaluation line
mod = "GuloGulo_allData_RUN1_GLM"
ref = myEval[which(myEval$full.name == mod & myEval$metric.eval == "TSS"), ]
ref
# Then I'm using the bm_FindOptimStat function directly with my observed and predicted data
# selecting either the points used for calibration or validation
# The 2 TSS values obtained should match the ones contained in the ref object
ind = which(calib[, 1] == TRUE)
bm_FindOptimStat(metric.eval = "TSS", obs = tab[ind, 1], fit = predi[ind, mod])
ind = which(calib[, 1] == FALSE)
bm_FindOptimStat(metric.eval = "TSS", obs = tab[ind, 1], fit = predi[ind, mod], threshold = ref$cutoff)
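For intuition, here is a minimal base-R sketch of what the two calls above amount to: pick the cutoff on the calibration rows, then reuse it unchanged on the validation rows. It is not biomod2's internal code. The data are synthetic, the helper `sens_spec` is hypothetical, the cutoff search is a plain exhaustive one, and the rule "prediction >= cutoff means presence" is an assumption.

```r
# Synthetic stand-ins for the objects used above (all values are made up):
# obs   : observed presence (1) / absence (0)
# fit   : predictions on the 0-1000 scale
# calib : TRUE for calibration rows, FALSE for validation rows
set.seed(42)
n     <- 200
obs   <- rbinom(n, size = 1, prob = 0.4)
fit   <- round(pmin(pmax(rnorm(n, mean = 350 + 300 * obs, sd = 150), 0), 1000))
calib <- runif(n) < 0.8

# Sensitivity, specificity (as proportions) and TSS at a fixed cutoff.
# Assumption: a prediction >= cutoff counts as a predicted presence.
sens_spec <- function(obs, fit, cutoff) {
  pred <- as.integer(fit >= cutoff)
  tp <- sum(pred == 1 & obs == 1)  # true positives
  fn <- sum(pred == 0 & obs == 1)  # false negatives
  tn <- sum(pred == 0 & obs == 0)  # true negatives
  fp <- sum(pred == 1 & obs == 0)  # false positives
  sens <- tp / (tp + fn)
  spec <- tn / (tn + fp)
  c(sensitivity = sens, specificity = spec, TSS = sens + spec - 1)
}

# Step 1: choose the cutoff that maximises TSS on the calibration rows
cutoffs     <- sort(unique(fit[calib]))
tss_calib   <- sapply(cutoffs, function(k) sens_spec(obs[calib], fit[calib], k)[["TSS"]])
best_cutoff <- cutoffs[which.max(tss_calib)]

# Step 2: reuse that cutoff, unchanged, on the validation rows
valid_stats <- sens_spec(obs[!calib], fit[!calib], best_cutoff)
valid_stats
```

The sketch returns proportions between 0 and 1. The formula in the question suggests that get_evaluations reports sensitivity and specificity as percentages, so multiply by 100 before comparing.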
Do not hesitate to ask if something is not clear!
Maya
Excellent, that worked like a charm! Thanks for the prompt reply and the help.
Best, Chris
Context and question
How do I calculate sensitivity and specificity based on the validation data for single models? After running BIOMOD_Modeling, I get the evaluations using get_evaluations with metric.eval = "TSS", which shows columns for sensitivity, specificity, calibration, and evaluation. The sensitivity and specificity columns are presumably for the calibration data because (sensitivity + specificity)/100 - 1 yields the value in the calibration column. The validation column presumably shows the TSS for the validation data, but not the underlying sensitivity and specificity metrics. Is it possible to get biomod2 to report these metrics?
Alternatively, I could calculate sensitivity and specificity based on the confusion matrix using the predicted values from my validation set, but I am not sure how to do this. For reference, I used simple split sampling by setting CV.nb.rep = 100 and CV.perc = 0.8 in BIOMOD_Modeling. When I use get_predictions, I see the predictions for all rows of my input data frame, but it's not clear to me which rows were used for validation for a given model run. Also, the predictions are numbers in the hundreds; presumably this is to save memory, and I can simply divide by 1000 to get the occurrence probability? I certainly hope that there is an easier way to extract sensitivity and specificity for the validation data, because this method seems difficult to implement given my lack of knowledge about how split sampling works "under the hood" in biomod2.
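On the scale and row-selection points, a small sketch with made-up numbers may help. It assumes, as supposed above, that predictions are stored on a 0-1000 scale, and that a logical calibration mask marks validation rows as FALSE. The vectors `fit_1000` and `calib_run1` and the cutoff are hypothetical, not biomod2 output.

```r
# Hypothetical predictions on the 0-1000 scale and a hypothetical cutoff
fit_1000    <- c(12, 250, 480, 515, 730, 990)
cutoff_1000 <- 500

# Divide by 1000 to read the predictions as probabilities
fit_prob    <- fit_1000 / 1000
cutoff_prob <- cutoff_1000 / 1000

# Binarisation is the same on both scales, provided the cutoff is rescaled too
pred_1000 <- fit_1000 >= cutoff_1000
pred_prob <- fit_prob >= cutoff_prob
identical(pred_1000, pred_prob)

# Hypothetical calibration mask for one run: FALSE marks a validation row
calib_run1 <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
valid_rows <- which(!calib_run1)

# Probabilities for the validation rows only
fit_prob[valid_rows]
```

Since sensitivity and specificity depend only on the binarised predictions, they can be computed on either scale as long as predictions and cutoff share it.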
Thanks!