yuliaUU opened 4 days ago
Hello Yulia,
Thank you for your detailed question. It is indeed a case that is not often asked, so I'll try to suggest a way to make it work 🙂
I would say that the simplest way to do that is to split your dataset in 3 parts (that I will hereafter call tuning, modeling and evaluation) before giving it to any biomod2 functions, and then to plan your modeling scheme in two steps:

1. create a first `BIOMOD.formated.data` object only with your tuning data, use this object to tune your models and retrieve the modeling options obtained
2. create a second `BIOMOD.formated.data` object, giving it on one side the data for modeling and on the other side the data for evaluation, and use the `user.defined` strategy to define the modeling options

⚠️ Please consider that it means: pseudo-absences will be different between tuning and modeling datasets, and cross-validation selection as well.
⚠️ So I would advise you to give this another thorough thought, to be sure you understand all your simulation choices!
If it is still something you consider doing, here is in practice how it could be done:
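The three-way split itself is left to you. A minimal sketch of a random split (assuming `myResp` is your full response vector and `myRespXY` the matching coordinates; all object names here are illustrative, not biomod2 requirements) could be:

```r
# Randomly assign each observation to one of the three parts
# (hypothetical object names; adapt to your own data)
set.seed(42)
part <- sample(c("tun", "mod", "eval"), size = length(myResp),
               replace = TRUE)

myResp.tun    <- myResp[part == "tun"]
myRespXY.tun  <- myRespXY[part == "tun", ]
myResp.mod    <- myResp[part == "mod"]
myRespXY.mod  <- myRespXY[part == "mod", ]
myResp.eval   <- myResp[part == "eval"]
myRespXY.eval <- myRespXY[part == "eval", ]
```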
```r
## ------------- TUNING ------------- ##
myDataTun <- BIOMOD_FormatingData(
  resp.name = "name",
  resp.var = myResp.tun,
  resp.xy = myRespXY.tun,
  expl.var = myExpl,
  PA.nb.rep = 1,
  PA.nb.absences = 1000,
  PA.strategy = 'random',
  filter.raster = TRUE,
  dir.name = 'SDM'
)

# Tune models
optTun <- bm_ModelingOptions(
  data.type = "binary",
  models = c("RF", "MAXENT", "XGBOOST"),
  strategy = "tuned",
  bm.format = myDataTun
)

# Get other models options
optBB <- bm_ModelingOptions(
  data.type = "binary",
  models = c("GLM", "GAM.mgcv.gam", "MAXNET"),
  strategy = "bigboss",
  bm.format = myDataTun
)

# Gather options
user.val <- c(optTun@options, optBB@options)
```
```r
## ------------- MODELING ------------- ##
myDataMod <- BIOMOD_FormatingData(
  resp.name = "name",
  resp.var = myResp.mod,
  resp.xy = myRespXY.mod,
  expl.var = myExpl,
  eval.resp.var = myResp.eval,
  eval.resp.xy = myRespXY.eval,
  eval.expl.var = myExpl,
  PA.nb.rep = 1,
  PA.nb.absences = 1000,
  PA.strategy = 'random',
  filter.raster = TRUE,
  dir.name = 'SDM'
)

# Define the native models list
NATIVERG <- c("GLM", "GAM", "RF", "MAXENT", "MAXNET", "XGBOOST")

# Individual Model Creation
myBiomodModelOut <- BIOMOD_Modeling(
  bm.format = myDataMod,
  modeling.id = "TST",
  models = NATIVERG,
  CV.strategy = 'random',
  CV.nb.rep = 3,
  CV.perc = 0.7,
  CV.do.full.models = FALSE,
  OPT.data.type = 'binary',
  OPT.strategy = 'user.defined',
  OPT.user.val = user.val,
  metric.eval = c('TSS', 'ROC'),
  var.import = 0,
  do.progress = FALSE,
  nb.cpu = 2
)

EVALMODEL <- get_evaluations(myBiomodModelOut)
```
But once again, I advise you to think carefully about your modeling plan beforehand 👀
As for the fact that you still get the `allData_allRun` dataset in your evaluation table: it is because the `do.full.models` parameter within the `BIOMOD_Modeling` function is now called `CV.do.full.models`.
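If some full-model rows still show up, they can also be dropped from the table afterwards. A small sketch, assuming the data.frame returned by `get_evaluations()` has `PA` and `run` columns (check your own column names with `str(EVALMODEL)` first):

```r
# Keep only the PA x CV rows, dropping any "allData" / "allRun" entries
# (the PA and run column names are an assumption about the evaluation table)
EVALMODEL.cv <- EVALMODEL[EVALMODEL$PA != "allData" & EVALMODEL$run != "allRun", ]
```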
Hope it helps, Maya
Hi Maya, thanks a lot for the detailed explanations! And thank you for showing the `optBB` part: it solved the issue I have been having.
Regarding your warning "pseudo-absences will be different between tuning and modeling datasets, and cross-validation selection as well": why would it be an issue? The data used for tuning are not associated with the modeling data, and I want those datasets to be "independent" of each other, so my hyperparameter tuning is not biased. Or am I missing something?
thank you a lot for all your help!
Glad that it helped !
I was mentioning it because you tune 3 models (`RF`, `MAXENT`, `XGBOOST`) over your data, meaning that you will have a specific set of parameters for each PA x CV combination.
What do you intend to do with it afterwards?
Let's say you have 2 PA datasets and 3 CV datasets, leading to 2 * 3 = 6 datasets in total.
They have been selected over the tuning part of your data.
You will have specific parameter values for `PA1_RUN1`, which might differ from those for `PA1_RUN2`.
Then you want to give these values to the modeling part of your data, for which the selection of PA and CV will be different. Meaning that you are matching parameter values calibrated over a selection of background points and data that are different from the ones you will use to run your models...
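The 2 x 3 combinations mentioned above can be enumerated in plain R like this:

```r
# All PA x CV dataset names for 2 pseudo-absence sets and 3 CV runs,
# each of which would carry its own tuned parameter set
datasets <- as.vector(outer(paste0("PA", 1:2), paste0("RUN", 1:3),
                            FUN = paste, sep = "_"))
datasets
# "PA1_RUN1" "PA2_RUN1" "PA1_RUN2" "PA2_RUN2" "PA1_RUN3" "PA2_RUN3"
```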
ℹ️ Note that in my code example, I removed the CV part. It is possible to make the selection of pseudo-absences and keep it through your tuning and modeling datasets (⚠️ which I haven't done here!), but this gets complicated for CV as you are changing your observation data...
Do you see what I mean ?
Maya
Yes, yes, I see the issue. I am doing 1 PA set and no CV. I guess what I was trying to achieve is that I do hyperparameter tuning on a separate dataset, so I get unbiased estimates, and then apply those tuned hyperparameters to my actual models to do calibration/validation.
Hi,
I have a more theoretical question. Ideally, what I would like to do is to split the data in 3 parts: one used for tuning, another for testing and the last one for evaluation. To reproduce:
Based on all the documentation and issues posted by others, I have so far figured out the tuning part using the `bm_Tuning` and `bm_ModelingOptions` functions (but I am still not sure how I can ensure that the data I used in tuning is not part of the modeling process). So for now I am just trying to make it work on the same data.
Regarding `allData` and `allRun`: I don't want those in my models. Even though I set `do.full.models = FALSE`, my EVALMODEL table still has the rows for them.
Thank you a lot for your help!