Closed vanderleidebastiani closed 1 year ago
Hello @vanderleidebastiani
Thank you for using our new issue template and providing a well-formatted issue :pray:
We have updated how pseudo-absences and cross-validation are handled in our latest biomod2 release. When using cross-validation with multiple pseudo-absence datasets, you now need to provide the cross-validation information (as TRUE/FALSE) for each of your pseudo-absence (PA) datasets (more info here). The column names must also follow some rules, as indicated by the error:
colnames(calib.lines) must follow the following format: '_PAx_RUNy' with x and y integer
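Therefore:
- with 10 PA datasets and 10 cross-validation runs each, you need a data.frame with 10x10 columns properly named: _PA1_RUN1, _PA1_RUN2, ... , _PA1_RUN10, _PA2_RUN1, ...
- or a data.frame with only 10 columns (a single run per PA dataset) named: _PA1_RUN1, _PA2_RUN1, ... , _PA10_RUN1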
Here you have a data.frame whose column names are RUN1, RUN2, ..., which do not match the new format we ask for.
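As a minimal sketch of the naming rule (base R only; n_pa, n_run, cv_names and is_valid are illustrative names, and the regular expression is a reading of the error message, not necessarily the exact check biomod2 performs), the 10x10 column names could be generated and checked like this:

```r
n_pa  <- 10  # number of pseudo-absence datasets (assumed)
n_run <- 10  # number of cross-validation runs per PA dataset (assumed)

# expand.grid() varies its first factor fastest, so runs cycle within each PA dataset
grid <- expand.grid(run = seq_len(n_run), pa = seq_len(n_pa))
cv_names <- paste0("_PA", grid$pa, "_RUN", grid$run)

# Pattern inferred from the error message: '_PAx_RUNy' with x and y integers
is_valid <- grepl("^_PA[0-9]+_RUN[0-9]+$", cv_names)

head(cv_names, 3)                               # first three generated names
all(is_valid)                                   # do all generated names match?
grepl("^_PA[0-9]+_RUN[0-9]+$", "RUN1")          # an old-style name, for comparison
```

With this pattern, an old-style name such as RUN1 does not match, which is consistent with the error reported above.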
I hope this is clearer. If not, feel free to ask additional questions.
Best, Rémi
Hi @rpatin
Thanks for your help. I want to go with the first option (10 PA datasets and 10 cross-validations). I tried to use the same spatial blocks for all pseudo-absence sets. The function now gets past the previous error, but all models fail (the models worked fine in biomod2 4.2-2). Am I making a formatting mistake in CV.user.table, or is it another problem?
I also have an additional, unrelated question: how can I force the random forest function to perform classification or regression?
Error
Model=Breiman and Cutler's random forests for classification and regression
RF modeling...Error in na.fail.default(structure(list(GuloGulo = structure(c(2L, 2L, :
missing values in object
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'object' in selecting a method for function 'predict': object 'model.bm' not found
head(CVTable)
_PA1_RUN1 _PA1_RUN2 ... _PA10_RUN9 _PA10_RUN10
[1,] TRUE TRUE ... TRUE FALSE
[2,] TRUE TRUE ... TRUE TRUE
[3,] TRUE TRUE ... TRUE TRUE
[4,] TRUE TRUE ... TRUE TRUE
Code used to get the error (continuation)
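nK <- 10       # number of cross-validation folds
nPseudo <- 10  # number of pseudo-absence datasets
# One column per PA dataset x fold combination
CVTable <- matrix(NA, nrow = nrow(spatialBlocks$biomod_table), ncol = nK * nPseudo)
colnames(CVTable) <- paste0("x", seq_len(nK * nPseudo))
for (i in seq_len(nPseudo)) {
  cols <- ((i - 1) * nK + 1):(i * nK)  # columns belonging to PA dataset i
  # Reuse the same spatial blocks for every PA dataset
  CVTable[, cols] <- spatialBlocks$biomod_table
  colnames(CVTable)[cols] <- paste0("_PA", i, "_RUN", seq_len(nK))
}
head(CVTable)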
myBiomodModelOut <- BIOMOD_Modeling(bm.format = myBiomodData,
                                    modeling.id = 'AllModels',
                                    models = c('RF'),
                                    bm.options = myBiomodOptions,
                                    CV.strategy = "user.defined",
                                    CV.user.table = CVTable,
                                    CV.do.full.models = FALSE,
                                    metric.eval = c('TSS','ROC'),
                                    var.import = 2,
                                    seed.val = 42)
myBiomodModelOut
Hi @vanderleidebastiani,
Thank you for the additional information, and sorry for this oversight in our code :pray:
A properly formatted CV table should have NA in the cells where the point is not included in the given PA dataset. We intended to have a check that corrects such tables automatically (the information can be found internally in the PA table), but a small coding oversight made it fail, as you observed. You can install the development version:
devtools::install_github('biomodhub/biomod2')
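To illustrate what a table with NA in the right cells could look like, here is a minimal sketch in base R on toy data. The objects pa_table (TRUE when a point belongs to a PA dataset) and fold_table (TRUE for calibration, FALSE for validation) are hypothetical stand-ins, not biomod2 objects, and the loop is only one possible way to build the table, not the internal biomod2 correction:

```r
n_points <- 6
n_pa     <- 2  # number of pseudo-absence datasets (toy value)
n_run    <- 3  # number of cross-validation runs (toy value)

# Hypothetical PA table: TRUE when the point is part of the PA dataset
pa_table <- matrix(c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE,
                     TRUE, TRUE, FALSE, FALSE, TRUE, TRUE),
                   ncol = n_pa, dimnames = list(NULL, c("PA1", "PA2")))

# Hypothetical fold table shared by all PA datasets:
# TRUE = calibration, FALSE = validation; each point is held out once
fold_table <- matrix(TRUE, nrow = n_points, ncol = n_run)
fold_table[cbind(seq_len(n_points), rep(seq_len(n_run), length.out = n_points))] <- FALSE

cv_table <- matrix(NA, nrow = n_points, ncol = n_pa * n_run)
colnames(cv_table) <- paste0("_PA", rep(seq_len(n_pa), each = n_run),
                             "_RUN", rep(seq_len(n_run), times = n_pa))

for (i in seq_len(n_pa)) {
  cols  <- ((i - 1) * n_run + 1):(i * n_run)  # columns of PA dataset i
  block <- fold_table
  block[!pa_table[, i], ] <- NA               # points outside PA dataset i get NA
  cv_table[, cols] <- block
}

cv_table
```

In this toy example, points 5 and 6 are NA in every _PA1_ column and points 3 and 4 are NA in every _PA2_ column, while all other cells keep the TRUE/FALSE value of the shared folds.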
On a side note, did you know that you can do block cross-validation within biomod2 as well? Using the external blockCV package may offer additional flexibility, though.
Best, Rémi
Hi @rpatin
Thanks so much. I installed the development version and it works fine. This improvement makes it much easier to use.
About my other question: regression or classification can be forced in the BIOMOD_ModelingOptions function (argument do.classif = TRUE/FALSE). Just for the record.
Best, Vanderlei
Hi Vanderlei, Indeed I forgot the other question, but you found the answer :+1: Best, Rémi
Error and context
I am using the blockCV package to create spatial blocks that separate training and test folds. I sample pseudo-absences in the BIOMOD_FormatingData function and then use the cv_spatial function (package blockCV) to create the spatial blocks. Everything ran fine in the previous version of biomod2 (4.2-2), but the new version (4.2-3) reports an error. I think it is an inconsistency in the steps that check the column names.
Partial console outputs
Model single models
Code used to get the error
Example from BIOMOD_Modeling function (modified)
Environment Information