Closed LorenzoBernicchi closed 3 months ago
Hello Lorenzo,
How can I both change some model parameters using the list as in `BigBoss` function and the tuning function?
You can check the Modeling options vignette made by Hélène: the User defined section presents exactly what you want, I think :slightly_smiling_face:
Use `bigboss` options as base, through the bm_ModelingOptions function (or directly within BIOMOD_Modeling), with strategy = 'user.defined'.
In your case, as you also want to modify some parameters of MAXENT, you can do that between steps 1 and 2, by adding your `user.MAXENT` element to the list of options given to the `user.val` parameter.
How can I create different sets with different numbers of pseudo-absences, and give a specific set to a specific algorithm?
You did a great job so far within the BIOMOD_FormatingData function !
Now, the last step is to use the `models.pa` parameter within the BIOMOD_Modeling function.
So in your case, you created 3 sets of pseudo-absences, and you want to attribute them to each algorithm, and here is a random example :
models.pa <- list(RF = "PA1",
                  GLM = c("PA2", "PA3"),
                  GAM = c("PA2", "PA3"),
                  GBM = "PA1",
                  MAXENT = "PA3",
                  MAXNET = "PA3")
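Before passing such a list to BIOMOD_Modeling, it can help to check it against the algorithms and pseudo-absence sets that actually exist. The sketch below is base R only; `check_models_pa` is a hypothetical helper (not a biomod2 function), and the `PA1`..`PA3` names are assumed to match the column names of `myBiomodData@PA.table`.

```r
# Hypothetical helper (not part of biomod2): report algorithms or PA sets in a
# models.pa list that do not match what is available.
check_models_pa <- function(models.pa, algos, pa.names) {
  list(
    unknown_algos = setdiff(names(models.pa), algos),   # listed but not run
    missing_algos = setdiff(algos, names(models.pa)),   # run but not listed
    unknown_pa    = setdiff(unlist(models.pa, use.names = FALSE), pa.names)
  )
}

algos    <- c("RF", "GLM", "GAM", "GBM", "MAXENT", "MAXNET")
pa.names <- c("PA1", "PA2", "PA3")  # assumed: colnames(myBiomodData@PA.table)

models.pa <- list(RF = "PA1",
                  GLM = c("PA2", "PA3"),
                  GAM = c("PA2", "PA3"),
                  GBM = "PA1",
                  MAXENT = "PA3",
                  MAXNET = "PA3")

res <- check_models_pa(models.pa, algos, pa.names)
str(res)  # three empty character vectors mean the list is consistent
```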
Please, do not hesitate if you need more details :slightly_smiling_face:
Maya
PS : this issue post was really perfect and well organized :star_struck: :heart:
Dear @MayaGueguen , Thanks for your answer, that's really helpful!
I updated my script to change some parameters of other models and to assign different sets of PAs to different algorithms.
At the end of this issue, you can find the updated version of my script.
However, when I run the `BIOMOD_EnsembleModeling` function I get the following error:
Evaluating Model stuff...obs and fit are not the same length => model evaluation skipped !obs and fit are not the same length => model evaluation skipped !
Error in { :
task 1 failed - "task 1 failed - "task 1 failed - "argument of length 0"""
In addition: Warning messages:
1: In BIOMOD_EnsembleModeling(bm.mod = Capreolus_single_models, models.chosen = "all", :
Parallelisation with `foreach` is not available for Windows. Sorry.
2: In cross.validation$validation <- NA :
Coercing LHS to a list
Selected_algos <- c("GAM", "GBM", "GLM",
                    "MAXENT", "MAXNET", "RF", "XGBOOST")
myBiomodData <- BIOMOD_FormatingData(
  resp.name = "Capriolo_COMBO_",
  resp.var = Capriolo_points,
  expl.var = Variables,
  PA.nb.rep = 3,
  PA.nb.absences = c(terra::nrow(Capriolo_points), 3*terra::nrow(Capriolo_points), 10000),
  PA.strategy = "random",
  filter.raster = T
)
myBiomodData
myBiomodData@PA.table
show(Variables)
# Selected_algos <- "MAXENT"
user.RF <- list('_allData_allRun' = list(
  mtry = "default",
  ntree = 1000,
  nodesize = 10,
  maxnodes = 5
))
user.MAXENT <- list('_allData_allRun' = list(
  path_to_maxent.jar = getwd()
))
user.GAM <- list('_allData_allRun' = list(
  algo = "GAM_mgcv"
))
OptionsBigboss
ModelsTable
user.val <- list(MAXENT.binary.MAXENT.MAXENT = user.MAXENT,
                 RF.binary.randomForest.randomForest = user.RF,
                 GAM.binary.mgcv.gam = user.GAM)
myOptions <- bm_ModelingOptions(data.type = 'binary',
                                models = Selected_algos,
                                strategy = 'user.defined',
                                user.val = user.val,
                                user.base = 'bigboss')
myOptions
# biomod2::plot(BIOMOD_data)
Capreolus_single_models <- BIOMOD_Modeling(
  bm.format = myBiomodData,
  modeling.id = "Single.models",
  models = Selected_algos,
  models.pa = list(
    GAM = c("PA2","PA3"),
    GLM = c("PA2","PA3"),
    GBM = c("PA1","PA2"),
    RF = c("PA1","PA2"),
    MAXENT = c("PA2","PA3"),
    MAXNET = c("PA2","PA3"),
    XGBOOST = c("PA2","PA3")),
  CV.strategy = "block",
  # CV.nb.rep = 10,
  # CV.perc = 0.65,
  CV.do.full.models = F,
  bm.options = myOptions,
  metric.eval = c("ROC", "TSS"),
  var.import = 1,
  nb.cpu = 4,
  do.progress = T
)
Capreolus_ensemble_models <- BIOMOD_EnsembleModeling(
  bm.mod = Capreolus_single_models,
  models.chosen = "all",
  em.by = "all",
  em.algo = "EMwmean",
  metric.select = c("ROC"),
  metric.select.thresh = c(0.7),
  metric.eval = c("ROC", "TSS"),
  var.import = 3,
  nb.cpu = 432,
  do.progress = T
)
Glad that you are coming to the next step !
I'm going to need a bit more insight for this one. Could you send me also all the prints you get when running the BIOMOD_EnsembleModeling function please ? (all text that is printed into your R command) :eyes:
Here you can find the prints I get when running BIOMOD_EnsembleModeling
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Build Ensemble Models -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
! all models available will be included in ensemble.modeling
! Ensemble Models will be filtered and/or weighted using validation dataset (if possible). Please use `metric.select.dataset` for alternative options.
> Evaluation & Weighting methods summary :
ROC over 0.7
> mergedData_mergedRun_mergedAlgo ensemble modeling
! Additional projection required for ensemble models merging several pseudo-absence dataset...
original models scores = 0.714 0.71
final models weights = 0.501 0.499
> Probabilities weighting mean by ROC ...
Evaluating Model stuff...obs and fit are not the same length => model evaluation skipped !obs and fit are not the same length => model evaluation skipped !Error in { :
task 1 failed - "task 1 failed - "task 1 failed - "argument of length 0"""
In addition: Warning messages:
1: In BIOMOD_EnsembleModeling(bm.mod = Capreolus_single_models, models.chosen = "all", :
Parallelisation with `foreach` is not available for Windows. Sorry.
2: In cross.validation$validation <- NA :
Coercing LHS to a list
Thank you for the prints !
Would you mind sending me the following objects (`Capriolo_points`, `Variables`) so I can replicate the issue ? :pray:
Either here or by email to maya.gueguen [at] univ-grenoble-alpes.fr
Hi @MayaGueguen ,
I just sent you the two files you asked for. Unfortunately they are in separate e-mails (I forgot to attach a file in the first e-mail, sorry). Thank you in advance for all your help, have a nice day!
Lorenzo Bernicchi
Thank you Lorenzo for the data ! :pray: It helped :slightly_smiling_face:
I just pushed a commit that should correct the problem. Please have a try, and do not hesitate if you encounter new issues !
Maya
Hello @MayaGueguen , thanks for correcting the problem, now it works fine!
However, I have another question related to algorithm tuning.
I want to tune the algorithms I will use, and I followed the format you suggested in issue #415.
I produced the following script. May I ask you if there are any problems or potential issues? I noticed that it takes a lot of time to run each `bm_Tuning` function (even hours for each algorithm, is that normal?), so I would like to know if I will face any problem before waiting for a long time.
Here is the script:
Selected_algos <- c("GAM", "GBM", "GLM",
                    "MAXENT", "MAXNET", "RF", "XGBOOST")
myBiomodData <- BIOMOD_FormatingData(
  resp.name = "Capriolo",
  resp.var = Capriolo_points,
  expl.var = Variables,
  PA.nb.rep = 12,
  PA.nb.absences = c(
    rep(terra::nrow(Capriolo_points),4),
    rep(3*terra::nrow(Capriolo_points),4),
    rep(10000, 4)),
  PA.strategy = "random",
  filter.raster = T
)
block_CV <- bm_CrossValidation(
  bm.format = myBiomodData,
  strategy = "block")
default_options <- bm_ModelingOptions(data.type = 'binary',
                                      models = Selected_algos,
                                      strategy = 'bigboss',
                                      bm.format = Capriolo_points,
                                      calib.lines = block_CV)
tuned.RF <- bm_Tuning(model = "RF",
                      tuning.fun = "rf",
                      do.formula = T,
                      bm.options = default_options@options$RF.binary.randomForest.randomForest,
                      bm.format = myBiomodData,
                      calib.lines = block_CV[, grep(paste0("PA", 1:8, collapse = "|"), colnames(block_CV))])
for(n in names(tuned.RF)){
  tuned.RF[[n]] <- c(tuned.RF[[n]], nodesize = 15)
}
for(n in names(tuned.RF)){
  tuned.RF[[n]] <- c(tuned.RF[[n]], ntree = 10)
}
for(n in names(tuned.RF)){
  tuned.RF[[n]] <- c(tuned.RF[[n]], maxnodes = 5)
}
tuned.GAM <- bm_Tuning(model = "GAM",
                       tuning.fun = "gam",
                       do.formula = T,
                       bm.options = default_options@options$GAM.binary.mgcv.gam,
                       bm.format = myBiomodData,
                       calib.lines = block_CV[, grep(paste0("PA", 5:12, collapse = "|"), colnames(block_CV))]
)
for(n in names(tuned.GAM)){
  tuned.GAM[[n]] <- c(tuned.GAM[[n]], algo = "GAM_mgcv")
}
tuned.MAXENT <- bm_Tuning(model = "MAXENT",
                          tuning.fun = "ENMevaluate",
                          bm.options = default_options@options$MAXENT.binary.MAXENT.MAXENT,
                          bm.format = myBiomodData,
                          calib.lines = block_CV[, grep(paste0("PA", 5:12, collapse = "|"), colnames(block_CV))],
                          params.train = list(
                            tune.args=list(fc = c("L","LQ","LQH","H","LQHP","LQHPT"),
                                           rm = seq(1,5,0.5)),
                            partitions = "randomkfold",
                            algorithm = "maxent.jar",
                            partition.settings = list(kfolds=10),
                            parallel = F
                          ))
tuned.GBM <- bm_Tuning(model = "GBM",
                       tuning.fun = "gbm",
                       do.formula = T,
                       bm.options = default_options@options$GBM.binary.gbm.gbm,
                       bm.format = myBiomodData,
                       calib.lines = block_CV[, grep(paste0("PA", 1:8, collapse = "|"), colnames(block_CV))])
tuned.GLM <- bm_Tuning(model = "GLM",
                       tuning.fun = "glm",
                       do.formula = T,
                       bm.options = default_options@options$GLM.stats.glm.glm,
                       bm.format = myBiomodData,
                       calib.lines = block_CV[, grep(paste0("PA", 5:12, collapse = "|"), colnames(block_CV))])
tuned.MAXNET <- bm_Tuning(model = "MAXNET",
                          tuning.fun = "maxnet",
                          do.formula = T,
                          bm.options = default_options@options$MAXNET.binary.maxnet.maxnet,
                          bm.format = myBiomodData,
                          calib.lines = block_CV[, grep(paste0("PA", 5:12, collapse = "|"), colnames(block_CV))])
tuned.XGBOOST <- bm_Tuning(model = "XGBOOST",
                           tuning.fun = "xgbTree",
                           do.formula = T,
                           bm.options = default_options@options$XGBOOST.binary.xgboost.xgboost,
                           bm.format = myBiomodData,
                           calib.lines = block_CV[, grep(paste0("PA", 1:12, collapse = "|"), colnames(block_CV))])
user.val <- list(
  RF.binary.randomForest.randomForest = tuned.RF,
  MAXENT.binary.MAXENT.MAXENT = tuned.MAXENT,
  GAM.binary.mgcv.gam = tuned.GAM,
  GBM.binary.gbm.gbm = tuned.GBM,
  GLM.stats.glm.glm = tuned.GLM,
  MAXNET.binary.maxnet.maxnet = tuned.MAXNET,
  XGBOOST.binary.xgboost.xgboost = tuned.XGBOOST
)
myOptions <- bm_ModelingOptions(
  data.type = 'binary',
  models = Selected_algos,
  strategy = 'user.defined',
  user.val = user.val,
  user.base = 'bigboss',
  bm.format = myBiomodData,
  calib.lines = block_CV
)
myOptions
Capreolus_single_models <- BIOMOD_Modeling(
  bm.format = myBiomodData,
  modeling.id = "Single.models",
  models = Selected_algos,
  CV.strategy = "block",
  # CV.nb.rep = 10,
  # CV.perc = 0.65,
  CV.do.full.models = F,
  OPT.user = myOptions,
  metric.eval = c("ROC", "TSS"),
  var.import = 1,
  nb.cpu = 4,
  do.progress = T
)
Capreolus_ensemble_models <- BIOMOD_EnsembleModeling(
  bm.mod = Capreolus_single_models,
  models.chosen = "all",
  em.by = "all",
  em.algo = "EMwmean",
  metric.select = c("ROC"),
  metric.select.thresh = c(0.7),
  metric.eval = c("ROC", "TSS"),
  var.import = 3,
  nb.cpu = 432,
  do.progress = T
)
EDIT
I was too curious, so I tried the code anyway.
When `MAXENT` tuning starts, I get this warning message:
> Dataset _PA6_RUN2
> Tuning parameters...*** Running initial checks... ***
* Variable values were input along with coordinates and not as raster data, so no raster predictions can be generated and AICc is calculated with background data for Maxent models.
* Model evaluations with random 10-fold cross validation...
*** Running ENMeval v2.0.4 with maxnet from maxnet package v0.1.4 ***
Hello Lorenzo,
Good to hear that the correction is working 👍
As for the tuning, unfortunately, it is long... And the more data and the more cross-validation sets, the longer it gets. Even if some models take less time than others (GBM is very long, for example), overall it is quite time-consuming.
As for the warning for MAXENT, we are using the function `ENMevaluate` from the `ENMeval` package, so I'm not very familiar with how it works.
But your code seems okay 🙂
Maya
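One detail in the `calib.lines` subsetting of the tuning script above may be worth double-checking: with 12 pseudo-absence sets, an unanchored pattern such as `PA1|PA2|...|PA8` also matches `PA10`, `PA11` and `PA12`, because `PA1` is a substring of those names. The sketch below uses base R only; the column names are illustrative and assume the `_PA<k>_RUN<j>` naming seen in the tuning log (two runs per set is an arbitrary choice).

```r
# Illustrative column names for 12 PA sets and 2 runs each (assumed naming).
cols <- paste0("_PA", rep(1:12, each = 2), "_RUN", 1:2)

# Unanchored pattern, as in the script: "PA1" also matches PA10, PA11, PA12.
loose <- grep(paste0("PA", 1:8, collapse = "|"), cols, value = TRUE)

# Anchored pattern: the PA number must be directly followed by an underscore.
strict_pattern <- paste0("_PA(", paste(1:8, collapse = "|"), ")_")
strict <- grep(strict_pattern, cols, value = TRUE)

length(loose)   # 22: PA1..PA8 plus PA10..PA12, two runs each
length(strict)  # 16: PA1..PA8 only, two runs each
```

If only sets 1 to 8 are wanted, the anchored form is the safer choice; for ranges such as 5:12 the two patterns select the same columns.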
Hi @MayaGueguen ,
Thanks for the info, I will try to have a look and see what will happen!
Just another quick question..
I would like to implement down-sampled RF (as in issue #393). Are both my scripts (with and without model tuning) doing the same thing by setting a `PAs` dataset with as many points as the presences? Or should I use something like
RF_param_list <-
  list("_allData_allRun" =
         list(ntree = 1000,
              sampsize = c("NA" = 333,
                           "1" = 333),
              replace = TRUE))
Using `NA` since I will use `PAs`
Thank you very much, have a nice day!!
Hello Lorenzo :wave:
For the down-sampled RF question, you must put `0` even if you don't have real absences: it will take pseudo-absences into account.
Note that we will soon be ready to switch to version 4.2-6 on GitHub, which will contain a new single model named `RFd` computing down-sampled RF without having to specify options for basic `RF` like we do now :slightly_smiling_face:
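As a rough illustration of the `sampsize` naming described above, here is a base R sketch that builds a balanced `sampsize` vector from a toy response. The counts (333 presences, 10000 pseudo-absences) and the 0/1 coding of the toy vector are assumptions for the example, not values taken from the actual dataset.

```r
# Toy response: 1 = presence, 0 = pseudo-absence (counts are made up).
resp <- c(rep(1, 333), rep(0, 10000))

# Size of the smallest class: each class is down-sampled to this number.
n_min <- min(table(resp))

# Named vector using "0" (not "NA") for the pseudo-absence class.
sampsize <- c("0" = n_min, "1" = n_min)
sampsize
```

The resulting vector can then be placed in the RF options list in place of the `"NA"`/`"1"` version.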
Also, if you want to reduce your computing time for tuning a bit:
You currently give the `calib.lines` argument when calling `bm_Tuning`, which leads to tuning all combinations of pseudo-absence datasets and calibration sets.
Without `calib.lines`, tuning will be made only for each of your pseudo-absence datasets (`_PA1_allRun`, `_PA2_allRun`, etc), and when giving it to `bm_ModelingOptions`, parameters tuned for a specific PA dataset (for example `PA1`) will be attributed to all related calibration sets (`_PA1_RUN1`, `_PA1_RUN2`, etc).
Maya
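To make the saving concrete, here is a small base R illustration of the naming logic described above. It only mimics the described behaviour with character vectors and is not biomod2's internal code; three PA sets and four runs per set are assumed for the example.

```r
# Names of option sets tuned per PA dataset (no calib.lines given).
tuned_names <- paste0("_PA", 1:3, "_allRun")

# Names of calibration sets (assumed: 4 runs per PA dataset).
calib_names <- paste0("_PA", rep(1:3, each = 4), "_RUN", 1:4)

# Extract the PA label ("PA1", "PA2", ...) from either kind of name.
pa_of <- function(x) sub("^_(PA[0-9]+)_.*$", "\\1", x)

# Each calibration set inherits the options tuned for its PA dataset.
lookup  <- setNames(tuned_names, pa_of(tuned_names))
mapping <- setNames(unname(lookup[pa_of(calib_names)]), calib_names)
mapping

n_jobs_with_calib_lines <- length(calib_names)  # one tuning per PA x RUN
n_jobs_without          <- length(tuned_names)  # one tuning per PA
```

Under these assumptions the number of tuning runs per algorithm drops from 12 to 3.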
Hello @MayaGueguen ,
good to know for the `4.2-6` version, I look forward to it!
About the suggestion you gave me:
should I drop this code line `calib.lines = block_CV[, grep(paste0("PA", 1:8, collapse = "|"), colnames(block_CV))]` within each `bm_Tuning` function? I don't know if I correctly understood your suggestion!
I inserted this code line to specify which `PAs` sets to use in each algorithm, but I could also specify it with the `models.pa` parameter, am I wrong?
So, in the end, each of my bm_Tuning function will look like this:
tuned.RF <- bm_Tuning(model = "RF",
                      tuning.fun = "rf",
                      do.formula = T,
                      bm.options = default_options@options$RF.binary.randomForest.randomForest,
                      bm.format = myBiomodData,
                      params.train = list(
                        nodesize = 25,
                        ntree = 2500,
                        maxnodes = 10,
                        sampsize = c("1" = min(calib.summary$Presences),
                                     "0" = min(calib.summary$Pseudo_Absences)) #not NA given what you specified above
                      ))
while the `BIOMOD_Modeling` call will look like this, to assign specific PA datasets to a specific algorithm:
Capreolus_single_models <- BIOMOD_Modeling(
  bm.format = myBiomodData,
  modeling.id = "Single.models",
  models = Selected_algos,
  models.pa = list(
    RF = c("PA1","PA2")),
  CV.strategy = "block",
  # CV.nb.rep = 10,
  # CV.perc = 0.65,
  CV.do.full.models = F,
  bm.options = myOptions,
  metric.eval = c("ROC", "TSS"),
  var.import = 1,
  nb.cpu = 4,
  do.progress = T
)
Please correct me if I am wrong. Thanks for now! Have a nice day
Sorry, I might have not been very clear.
Here is an example for `RF` in your study case below :eyes:
Selected_algos <- c("GAM", "GBM", "GLM",
                    "MAXENT", "MAXNET", "RF", "XGBOOST")
myBiomodData <- BIOMOD_FormatingData(
  resp.name = "Capriolo",
  resp.var = Capriolo_points,
  expl.var = Variables,
  PA.nb.rep = 12,
  PA.nb.absences = c(
    rep(terra::nrow(Capriolo_points),4),
    rep(3*terra::nrow(Capriolo_points),4),
    rep(10000, 4)),
  PA.strategy = "random",
  filter.raster = T
)
calib.summary <- summary(myBiomodData) ## ADDED
block_CV <- bm_CrossValidation(
  bm.format = myBiomodData,
  strategy = "block")
default_options <- bm_ModelingOptions(data.type = 'binary',
                                      models = Selected_algos,
                                      strategy = 'bigboss',
                                      bm.format = myBiomodData) ## CHANGED
tuned.RF <- bm_Tuning(model = "RF",
                      tuning.fun = "rf",
                      do.formula = TRUE,
                      bm.options = default_options@options$RF.binary.randomForest.randomForest,
                      bm.format = myBiomodData) ## REMOVED calib.lines
## CHANGED
for(n in names(tuned.RF)){
  tuned.RF[[n]][["nodesize"]] = 25
  tuned.RF[[n]][["ntree"]] = 2500
  tuned.RF[[n]][["maxnodes"]] = 10
  tuned.RF[[n]][["sampsize"]] = c("1" = min(calib.summary$Presences),
                                  "0" = min(calib.summary$Pseudo_Absences)) #not NA given what you specified above
}
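The switch from `c(..., nodesize = 15)` to `[["nodesize"]] = 25` in the loop matters whenever the tuned list already holds an entry of that name. The base R sketch below uses a toy list as a stand-in for one tuned option set (the `mtry` and `nodesize` starting values are made up) to show the difference.

```r
# Toy stand-in for one tuned option set (values are assumptions).
opts <- list(mtry = 2, nodesize = 5)

# c() appends: the old entry stays and a second "nodesize" is added.
appended <- c(opts, nodesize = 15)

# [[<- overwrites the existing entry in place.
replaced <- opts
replaced[["nodesize"]] <- 15

names(appended)     # "mtry" "nodesize" "nodesize"
appended$nodesize   # 5: `$` returns the first matching element
replaced$nodesize   # 15
```

With the appended version, code that looks the option up by name may silently keep using the old value.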
Dear @MayaGueguen ,
I tuned all my algorithms and everything went perfectly. However, when I arrived at the `BIOMOD_Projection()` function I got the following error:
Error in .Call(list(name = "CppField__get", address = <pointer: (nil)>, :
NULL value passed as symbol address
All my code and data remained the same as before.
Thanks again!
Hello Lorenzo,
Glad that tuning went well 👍
Could you check if everything is okay with your variables you give to BIOMOD_FormatingData or BIOMOD_Projection ?
It seems like it could be an error linked to the `terra` package or objects 👀
Maya
Hello again @MayaGueguen ,
the problem was with the `variables` object, that I used to store all my raster variables.
I re-created it and everything went well! I think the problem was with the saving and uploading of the environment.
Thank you so much for everything, I really appreciate your help!
Have a nice day, all the best!
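For readers hitting the same pointer error after reloading a saved workspace: `terra` raster objects hold a reference to C++ data that does not survive a plain save and reload of the R environment. A possible workaround, sketched below under the assumption that a reasonably recent `terra` is installed, is to pack the object with `terra::wrap()` before saving and restore it with `terra::unwrap()`. The small raster is a made-up example, and the code is skipped when `terra` is not available.

```r
same <- NA  # stays NA if terra is not installed

if (requireNamespace("terra", quietly = TRUE)) {
  # Small made-up raster standing in for the explanatory variables.
  r <- terra::rast(nrows = 5, ncols = 5, vals = 1:25)

  # Pack the raster into a serialisable object before saving to disk.
  f <- tempfile(fileext = ".rds")
  saveRDS(terra::wrap(r), f)

  # Restore a usable SpatRaster in a later session.
  r2 <- terra::unwrap(readRDS(f))
  same <- isTRUE(all.equal(terra::values(r), terra::values(r2)))
}

same
```

Re-creating the raster object from the source files, as done in this thread, works as well.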
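Context and question
Hello `biomod2` team. I am setting the biomod data to use for modeling a species distribution, using the following algorithms: RF, GLM, GAM, GBM, MAXENT, MAXNET. I have set some model parameters using the lists, and I also want to tune some of them with the tuning function. I would like to ask you for a couple of pieces of advice:
How can I both change some model parameters using the list as in `BigBoss` function and the tuning function?
How can I create different sets with different numbers of pseudo-absences, and give a specific set to a specific algorithm?
Code used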
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= BIOMOD.formated.data -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= BIOMOD.models.options -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
sessionInfo()