Closed Jinyu8579 closed 1 year ago
Dear @Jinyu8579,
Thank you for reporting and posting a nicely formatted issue :pray:
Indeed, the argument was ignored in the code of BIOMOD_ModelingOptions.
I just pushed a commit to correct that. If you update to the current GitHub version with devtools::install_github('biomodhub/biomod2'), this should hopefully work.
If not, please let us know.
Best,
Rémi
Dear Rémi,
Thanks for your feedback. We have updated to the current GitHub version with devtools::install_github('biomodhub/biomod2'), and it works well now. However, we ran into another roadblock with the remaining code:
RF modeling...Error in predict.randomForest(get_formal_model(object), as.data.frame(newdata[not_na_rows, :
'prob' or 'vote' not meaningful for regression
In addition: Warning message:
In randomForest.default(m, y, ...) :
The response has five or fewer unique values. Are you sure you want to do regression?
We found a similar error posted on Stack Exchange (https://stats.stackexchange.com/questions/519102/random-forest-error-type-of-predictors-in-new-data-do-not-match-training-set). The solution reported there is as follows: "You fitted a regression forest so you get out of it predictions of the response variable, not a probability. You need type = "response" (which is the default). If you meant to fit a classification forest then you need to look again at how you fitted the model, but reading your code, it looks like you are modelling the count of bike rentals, so the probability prediction doesn't make sense; you just want the predicted count (predicted response)."
Thus, our roadblock might also be resolved by setting type = "response". However, we have not found a corresponding argument in either BIOMOD_ModelingOptions or BIOMOD_Modeling.
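The distinction behind this error can be reproduced outside biomod2. The sketch below is only an illustration on simulated data (the predictors `bio1` and `bio12` and the response are made-up stand-ins, not the data from this issue). It assumes the randomForest package is installed. A numeric 0/1 response makes randomForest fit a regression forest, for which type = "prob" is rejected, while a factor response gives a classification forest whose class probabilities are available.

```r
library(randomForest)

set.seed(42)

# Simulated stand-in for presence/background points (hypothetical values)
n <- 200
env <- data.frame(bio1 = rnorm(n), bio12 = rnorm(n))
resp <- as.integer(env$bio1 + rnorm(n, sd = 0.5) > 0)  # 1 = presence, 0 = background

# Numeric response: randomForest fits a regression forest and warns that
# the response has five or fewer unique values (warning silenced here)
rf_reg <- suppressWarnings(randomForest(x = env, y = resp, ntree = 100))

# Asking a regression forest for probabilities fails; capture the message
prob_err <- tryCatch(
  predict(rf_reg, env, type = "prob"),
  error = function(e) conditionMessage(e)
)
print(prob_err)

# type = "response" (the default) returns continuous predictions instead
pred_reg <- predict(rf_reg, env, type = "response")

# Factor response: classification forest, so type = "prob" is meaningful
rf_cls <- randomForest(x = env, y = factor(resp), ntree = 100)
pred_cls <- predict(rf_cls, env, type = "prob")[, "1"]  # probability of class "1"

summary(pred_reg)
summary(pred_cls)
```

In this toy setting both `pred_reg` and `pred_cls` are continuous values between 0 and 1. How biomod2 itself calls predict internally is not shown here.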
P.S. Code used to get the error:
> # Load species occupancy data
> dataSpecies <-read.csv("E:\\Rconversion\\suitability_connectivity\\en_pre_bg_points\\en_pre_bg11_points.csv")
> # Look at structure of dataSpecies dataframe
> str(dataSpecies)
'data.frame': 401 obs. of 4 variables:
$ longitude : num 114 113 113 113 113 ...
$ latitude : num 24.1 24.3 24.2 24.1 23.2 ...
$ pre_bg_num: int 1 2 3 4 5 6 7 8 9 10 ...
$ pre_bg : chr "presence" "presence" "presence" "presence" ...
> # Tell biomod2 which parts of the database refer to which biomod2 object
> myRespName <- "Empoasca_onukii"
> myResp <-as.numeric(dataSpecies[,"pre_bg_num"])
> myRespXY <-dataSpecies[,c("longitude","latitude")]
> # Load species occupancy data
> dataSpecies <-read.csv("E:\\Rconversion\\suitability_connectivity\\en_pre_bg_points\\en_pre_bg12_points.csv")
> # Look at structure of dataSpecies dataframe
> str(dataSpecies)
'data.frame': 401 obs. of 4 variables:
$ longitude : num 114 113 113 113 113 ...
$ latitude : num 24.1 24.3 24.2 24.1 23.2 ...
$ pre_bg_num: int 1 1 1 1 1 1 1 1 1 1 ...
$ pre_bg : chr "presence" "presence" "presence" "presence" ...
> # Tell biomod2 which parts of the database refer to which biomod2 object
> myRespName <- "Empoasca_onukii"
> myResp <-as.numeric(dataSpecies[,"pre_bg_num"])
> myRespXY <-dataSpecies[,c("longitude","latitude")]
> # Create stack of all environmental covariate layers
> myExpl <- envPlus
> names(myExpl)
[1] "bio1" "bio2" "bio3" "bio4" "bio5" "bio6" "bio7"
[8] "bio8" "bio9" "bio10" "bio11" "bio12" "bio13" "bio14"
[15] "bio15" "bio16" "bio17" "bio18" "bio19" "elev" "soil_pH"
[22] "EvergDecid" "EvergBroad" "DecidBroad" "MixedTree" "Shrubs" "HerbVeg" "CultVeg"
[29] "FloodVeg" "Urbab" "Snow" "Barren" "OpenWater" "Mou_300dens" "riverdens"
[36] "slope"
> # Create object (myBiomodData) to contain all the previous objects within it,
> # formatted correctly
> myBiomodData <-BIOMOD_FormatingData(resp.var = myResp,
+ expl.var = myExpl,
+ resp.xy = myRespXY,
+ resp.name = myRespName,
+ PA.nb.rep =0)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Empoasca_onukii Data Formating -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
! Response variable name was converted into Empoasca.onukii
! Response variable have non-binary values that will be converted into 0 (resp <=0) or 1 (resp > 0).
! No data has been set aside for modeling evaluation
! Some NAs have been automatically removed from your data
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> # Set the options that you have chosen for the different model algorithms
> # - here using defaults (empty arguments)
> myBiomodOptions <- BIOMOD_ModelingOptions(RF = list(do.classif = FALSE,
+ ntree = 500,
+ mtry = 'default',
+ nodesize = 5,
+ maxnodes = NULL))
> myBiomodOptions
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= BIOMOD.models.options -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
GLM = list( type = 'quadratic',
interaction.level = 0,
myFormula = NULL,
test = 'AIC',
family = binomial(link = 'logit'),
mustart = 0.5,
control = glm.control(epsilon = 1e-08, maxit = 50, trace = FALSE) ),
GBM = list( distribution = 'bernoulli',
n.trees = 2500,
interaction.depth = 7,
n.minobsinnode = 5,
shrinkage = 0.001,
bag.fraction = 0.5,
train.fraction = 1,
cv.folds = 3,
keep.data = FALSE,
verbose = FALSE,
perf.method = 'cv',
n.cores = 1),
GAM = list( algo = 'GAM_mgcv',
type = 's_smoother',
k = -1,
interaction.level = 0,
myFormula = NULL,
family = binomial(link = 'logit'),
method = 'GCV.Cp',
optimizer = c('outer','newton'),
select = FALSE,
knots = NULL,
paraPen = NULL,
control = list(nthreads = 1, irls.reg = 0, epsilon = 1e-07, maxit = 200, trace = FALSE
, mgcv.tol = 1e-07, mgcv.half = 15, rank.tol = 1.49011611938477e-08
, nlm = list(ndigit=7, gradtol=1e-06, stepmax=2, steptol=1e-04, iterlim=200, check.analyticals=0)
, optim = list(factr=1e+07)
, newton = list(conv.tol=1e-06, maxNstep=5, maxSstep=2, maxHalf=30, use.svd=0), outerPIsteps = 0
, idLinksBases = TRUE, scalePenalty = TRUE, efs.lspmax = 15, efs.tol = 0.1, keepData = FALSE
, scale.est = fletcher, edge.correct = FALSE) ),
CTA = list( method = 'class',
parms = 'default',
cost = NULL,
control = list(xval = 5, minbucket = 5, minsplit = 5, cp = 0.001, maxdepth = 25) ),
ANN = list( NbCV = 5,
size = NULL,
decay = NULL,
rang = 0.1,
maxit = 200),
SRE = list( quant = 0.025),
FDA = list( method = 'mars',
add_args = NULL),
MARS = list( type = 'simple',
interaction.level = 0,
myFormula = NULL,
nk = NULL,
penalty = 2,
thresh = 0.001,
nprune = NULL,
pmethod = 'backward'),
RF = list( do.classif = FALSE,
ntree = 500,
mtry = 'default',
sampsize = NULL,
nodesize = 5,
maxnodes = NULL),
MAXENT = list( path_to_maxent.jar = 'E:/Rconversion/suitability_connectivity',
memory_allocated = 512,
initial heap size = NULL,
maximum heap size = NULL,
background_data_dir = 'default',
maximumbackground = 'default',
maximumiterations = 200,
visible = FALSE,
linear = TRUE,
quadratic = TRUE,
product = TRUE,
threshold = TRUE,
hinge = TRUE,
lq2lqptthreshold = 80,
l2lqthreshold = 10,
hingethreshold = 15,
beta_threshold = -1,
beta_categorical = -1,
beta_lqp = -1,
beta_hinge = -1,
betamultiplier = 1,
defaultprevalence = 0.5),
MAXNET = list( myFormula = NULL,
regmult = 1,
regfun = <function> ),
XGBOOST = list( max.depth = 5,
eta = 0.1,
nrounds = 512,
objective = binary:logistic,
nthread = 1 )
)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> # Run the biomod2 models that you have chosen on the data provided.
> myBiomodModelOut <-BIOMOD_Modeling(
+ bm.format = myBiomodData,
+ modeling.id =paste(myRespName, "Species1", sep=""),
+ models =c("RF"),
+ bm.options = myBiomodOptions,
+ CV.strategy = "kfold",
+ CV.nb.rep = 1,
+ CV.k = 10,
+ CV.perc = 70,
+ CV.do.full.models =FALSE,
+ prevalence = 0.5,
+ metric.eval = c('TSS', 'ROC'),
+ var.import = 10,
+ seed.val = 42,
+ nb.cpu = 1,
+ do.progress = TRUE)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Build Single Models -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Checking Models arguments...
> Automatic weights creation to rise a 0.5 prevalence
Creating suitable Workdir...
Checking Cross-Validation arguments...
> k-fold cross-validation selection
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Empoasca.onukii Modeling Summary -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
36 environmental variables ( bio1 bio2 bio3 bio4 bio5 bio6 bio7 bio8 bio9 bio10 bio11 bio12 bio13 bio14 bio15 bio16 bio17 bio18 bio19 elev soil_pH EvergDecid EvergBroad DecidBroad MixedTree Shrubs HerbVeg CultVeg FloodVeg Urbab Snow Barren OpenWater Mou_300dens riverdens slope )
Number of evaluation repetitions : 10
Models selected : RF
Total number of model runs: 10
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
-=-=-=--=-=-=- Empoasca.onukii_allData_RUN1_RF
Model=Breiman and Cutler's random forests for classification and regression
> RF modeling...Error in predict.randomForest(get_formal_model(object), as.data.frame(newdata[not_na_rows, :
'prob' or 'vote' not meaningful for regression
In addition: Warning message:
In randomForest.default(m, y, ...) :
The response has five or fewer unique values. Are you sure you want to do regression?
*** inherits(g.pred,'try-error')
! Note : Empoasca.onukii_allData_RUN1_RF failed!
-=-=-=--=-=-=- Empoasca.onukii_allData_RUN2_RF
Model=Breiman and Cutler's random forests for classification and regression
> RF modeling...Error in predict.randomForest(get_formal_model(object), as.data.frame(newdata[not_na_rows, :
'prob' or 'vote' not meaningful for regression
In addition: Warning message:
In randomForest.default(m, y, ...) :
The response has five or fewer unique values. Are you sure you want to do regression?
*** inherits(g.pred,'try-error')
! Note : Empoasca.onukii_allData_RUN2_RF failed!
-=-=-=--=-=-=- Empoasca.onukii_allData_RUN3_RF
Model=Breiman and Cutler's random forests for classification and regression
> RF modeling...Error in predict.randomForest(get_formal_model(object), as.data.frame(newdata[not_na_rows, :
'prob' or 'vote' not meaningful for regression
In addition: Warning message:
In randomForest.default(m, y, ...) :
The response has five or fewer unique values. Are you sure you want to do regression?
*** inherits(g.pred,'try-error')
! Note : Empoasca.onukii_allData_RUN3_RF failed!
-=-=-=--=-=-=- Empoasca.onukii_allData_RUN4_RF
Model=Breiman and Cutler's random forests for classification and regression
> RF modeling...Error in predict.randomForest(get_formal_model(object), as.data.frame(newdata[not_na_rows, :
'prob' or 'vote' not meaningful for regression
In addition: Warning message:
In randomForest.default(m, y, ...) :
The response has five or fewer unique values. Are you sure you want to do regression?
*** inherits(g.pred,'try-error')
! Note : Empoasca.onukii_allData_RUN4_RF failed!
-=-=-=--=-=-=- Empoasca.onukii_allData_RUN5_RF
Model=Breiman and Cutler's random forests for classification and regression
> RF modeling...Error in predict.randomForest(get_formal_model(object), as.data.frame(newdata[not_na_rows, :
'prob' or 'vote' not meaningful for regression
In addition: Warning message:
In randomForest.default(m, y, ...) :
The response has five or fewer unique values. Are you sure you want to do regression?
*** inherits(g.pred,'try-error')
! Note : Empoasca.onukii_allData_RUN5_RF failed!
-=-=-=--=-=-=- Empoasca.onukii_allData_RUN6_RF
Model=Breiman and Cutler's random forests for classification and regression
> RF modeling...Error in predict.randomForest(get_formal_model(object), as.data.frame(newdata[not_na_rows, :
'prob' or 'vote' not meaningful for regression
In addition: Warning message:
In randomForest.default(m, y, ...) :
The response has five or fewer unique values. Are you sure you want to do regression?
*** inherits(g.pred,'try-error')
! Note : Empoasca.onukii_allData_RUN6_RF failed!
-=-=-=--=-=-=- Empoasca.onukii_allData_RUN7_RF
Model=Breiman and Cutler's random forests for classification and regression
> RF modeling...Error in predict.randomForest(get_formal_model(object), as.data.frame(newdata[not_na_rows, :
'prob' or 'vote' not meaningful for regression
In addition: Warning message:
In randomForest.default(m, y, ...) :
The response has five or fewer unique values. Are you sure you want to do regression?
*** inherits(g.pred,'try-error')
! Note : Empoasca.onukii_allData_RUN7_RF failed!
-=-=-=--=-=-=- Empoasca.onukii_allData_RUN8_RF
Model=Breiman and Cutler's random forests for classification and regression
> RF modeling...Error in predict.randomForest(get_formal_model(object), as.data.frame(newdata[not_na_rows, :
'prob' or 'vote' not meaningful for regression
In addition: Warning message:
In randomForest.default(m, y, ...) :
The response has five or fewer unique values. Are you sure you want to do regression?
*** inherits(g.pred,'try-error')
! Note : Empoasca.onukii_allData_RUN8_RF failed!
-=-=-=--=-=-=- Empoasca.onukii_allData_RUN9_RF
Model=Breiman and Cutler's random forests for classification and regression
> RF modeling...Error in predict.randomForest(get_formal_model(object), as.data.frame(newdata[not_na_rows, :
'prob' or 'vote' not meaningful for regression
In addition: Warning message:
In randomForest.default(m, y, ...) :
The response has five or fewer unique values. Are you sure you want to do regression?
*** inherits(g.pred,'try-error')
! Note : Empoasca.onukii_allData_RUN9_RF failed!
-=-=-=--=-=-=- Empoasca.onukii_allData_RUN10_RF
Model=Breiman and Cutler's random forests for classification and regression
> RF modeling...Error in predict.randomForest(get_formal_model(object), as.data.frame(newdata[not_na_rows, :
'prob' or 'vote' not meaningful for regression
In addition: Warning message:
In randomForest.default(m, y, ...) :
The response has five or fewer unique values. Are you sure you want to do regression?
*** inherits(g.pred,'try-error')
! Note : Empoasca.onukii_allData_RUN10_RF failed!
! All models failed
Error in fetch(key) :
lazy-load database 'C:/Program Files/R/R-4.1.0/library/biomod2/help/biomod2.rdb' is corrupt
Error in fetch(key) :
lazy-load database 'C:/Program Files/R/R-4.1.0/library/biomod2/help/biomod2.rdb' is corrupt
Error in fetch(key) :
lazy-load database 'C:/Program Files/R/R-4.1.0/library/biomod2/help/biomod2.rdb' is corrupt
Error in fetch(key) :
lazy-load database 'C:/Program Files/R/R-4.1.0/library/biomod2/help/biomod2.rdb' is corrupt
>
Best,
Jinyu
Dear Jinyu,
Thank you for the update and the referenced information :pray:
I updated the code for the predict function to be able to handle RF with either do.classif = TRUE or do.classif = FALSE. However, please note that you likely have limited interest in using do.classif = FALSE, but that is up to you.
If you update to the current GitHub version with devtools::install_github('biomodhub/biomod2'), this should hopefully work. But please let us know if you encounter further problems.
Additionally, I see that you had the following error:
lazy-load database 'C:/Program Files/R/R-4.1.0/library/biomod2/help/biomod2.rdb' is corrupt
This should be resolved by restarting R; see, e.g., https://github.com/biomodhub/biomod2/issues/319#issuecomment-1683991433
Best, Rémi
Dear Rémi,
Thanks for your quick feedback. We have updated to the current GitHub version with devtools::install_github('biomodhub/biomod2'), and it has been working well since.
We initially wanted to treat the binary (1-presence/0-background) data as a continuous response variable in order to end up with a continuous measure of suitability. We therefore planned to run a regression model and set "do.classif = FALSE" in the RF options when calling the function "BIOMOD_ModelingOptions".
We are now facing overfitting and are trying to address it by adjusting the nodesize and maxnodes parameters of randomForest, as you suggested in #304, #261, and #247.
Thanks again for your help.
Best,
Jinyu
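As a rough illustration of how nodesize and maxnodes constrain tree growth, the sketch below fits two classification forests with randomForest directly on simulated data. The predictors, the response, and the specific values nodesize = 10 and maxnodes = 8 are arbitrary assumptions chosen for the example, not recommendations from this thread. It compares the number of terminal nodes per tree and the gap between training error and out-of-bag (OOB) error, which is one simple way to look at overfitting.

```r
library(randomForest)

set.seed(42)

# Simulated presence/background data (hypothetical predictors and response)
n <- 300
env <- data.frame(bio1 = rnorm(n), bio12 = rnorm(n), elev = rnorm(n))
resp <- factor(as.integer(env$bio1 + rnorm(n) > 0))

# Fully grown trees: nodesize = 1, no cap on terminal nodes
rf_deep <- randomForest(x = env, y = resp, ntree = 200, nodesize = 1)

# Constrained trees: larger terminal nodes and at most 8 of them per tree
rf_capped <- randomForest(x = env, y = resp, ntree = 200,
                          nodesize = 10, maxnodes = 8)

# Number of terminal nodes in each tree of the forest
size_deep <- treesize(rf_deep)
size_capped <- treesize(rf_capped)

# OOB error after the last tree, and error on the training data itself
oob_deep <- rf_deep$err.rate[rf_deep$ntree, "OOB"]
oob_capped <- rf_capped$err.rate[rf_capped$ntree, "OOB"]
train_err_deep <- mean(predict(rf_deep, env) != resp)
train_err_capped <- mean(predict(rf_capped, env) != resp)

data.frame(
  model = c("deep", "capped"),
  mean_terminal_nodes = c(mean(size_deep), mean(size_capped)),
  train_error = c(train_err_deep, train_err_capped),
  oob_error = c(oob_deep, oob_capped)
)
```

A large gap between training error and OOB error suggests that the trees are memorising the training points. In biomod2 the same two parameters would be passed through the RF list of BIOMOD_ModelingOptions, as in the options printed earlier in this thread.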
Error and context
We wanted to use the random forest model built into biomod2 to perform species distribution modeling for an insect species, based on 300 presence points and 100 background points. We treated the binary (1-presence/0-background) data as a continuous response variable in order to end up with a continuous measure of suitability. We therefore ran a regression model and set "do.classif = FALSE" in the RF options when calling the function "BIOMOD_ModelingOptions". However, the parameter did not change: it kept the default setting (i.e., do.classif = TRUE).