Closed joshikp01 closed 1 month ago
Hello there,
Thanks a lot for all this information: it's really easier to help you ! Indeed, you have an overfitting for RF.
Here, you only change the options for "_allData_allRun".
You could have a look at the options with the line :
print(myBiomodOption,dataset = "_PA1_allRun")
You can change for all the different PA datasets like this :
#Define MAXENT options
myMAXENToptions <- list(threshold = FALSE, path_to_maxent.jar = ".")
user.MAXENT <- list('_allData_allRun'= myMAXENToptions,
'_PA1_allRun' = myMAXENToptions,
'_PA2_allRun' = myMAXENToptions,
'_PA3_allRun' = myMAXENToptions)
#Define RF options
myRFoptions <- list(nodesize = 5, maxnodes = 10)
user.RF <- rep( list(myRFoptions), (ncol(myBiomodData@PA.table) + 1 ))
names(user.RF) <- c( paste0("_", names(myBiomodData@PA.table), "_allRun"), "_allData_allRun")
#Define user values
user.val <- list(RF.binary.randomForest.randomForest = user.RF ,MAXENT.binary.MAXENT.MAXENT= user.MAXENT)
#Set up modeling options
myBiomodOption <- bm_ModelingOptions(
data.type = "binary",
models = all.models,
strategy = 'user.defined',
user.val= user.val,
bm.format = myBiomodData,
calib.lines = NULL
)
myBiomodModelOut <- BIOMOD_Modeling(
bm.format = myBiomodData,
modeling.id = paste(myRespName, "Sal", sep = "_"),
models = all.models,
OPT.strategy= 'user.defined'
#OPT.user.val=user.val,
OPT.user = myBiomodOption, #!!
CV.strategy = "random",
CV.nb.rep = 3,
CV.perc = 0.8,
metric.eval = c("ROC", "TSS"),
var.import = 0,
scale.models = FALSE,
nb.cpu = 6,
do.progress = TRUE,
seed.val = 42,
)
If you give OPT.user = myBiomodOption
, BIOMOD_Modeling will automatically give the options of '_PAx_allRun' to all runs with PAx (_PAx_RUN1, _PAx_RUN2,...)
I hope it helps! Don't hesitate if you have any other questions about this new version of biomod2.
Hélène
Dear @HeleneBlt
Thank you very much for the clear explanation and code. I have another question:
Does having an ensemble model with less TSS score than the individual score mean any problem or it is okay? I am asking this because my final ensemble model is getting a lower TSS score than the RF/GBM. What do you suggest in such cases? Ensemble or just go ahead with the individual model?
Thank you in advance :)
Hi again,
Yes, actually it is quite normal. As you probably already read in the issue #447, the Ensemble Model is an average model.
As you suggest, it is important to look at the calibration and validation scores (as it can show an overfitting for example :wink: ).
I would also strongly advise looking at EMcv
or EMci
and EMci.alpha
to see the variability between the models.
With this exploration and depending on your goals, you can choose if a single model will be better or if the Ensemble Model will give you more realistic results on the whole study area.
It really depends on your case.
Hélène
Dear Helene,
Thank you and grateful to you for assisting me.
Hello !!
I am trying to use the BIOMOD2 modeling but I am getting very weird model evaluation scores.... when using calibration data the RF is getting 1 every time (ROC/TSS) but when using the validation dataset the RF is getting the lowest TSS scores. I had used calibrated the RF code before but right now after the update (4.2-5) it is back to getting inflated TSS/ROC scores. Please find the code below: (Note: total presence location =94)
Load required packages
library("raster") library("ggplot2") library("gridExtra") library("rasterVis") library("sf") library("usdm") library("biomod2") library("dismo") library("mgcv") library("gam")
variables for current projection:
DEM<- raster ("F:/Khagendra (Ensemble analysis)/Worldclim data/midhills/ASCII/dem.asc") slope<- raster ("F:/Khagendra (Ensemble analysis)/Worldclim data/midhills/ASCII/slope.asc") aspect<- raster ("F:/Khagendra (Ensemble analysis)/Worldclim data/midhills/ASCII/aspect.asc") bio1<- raster ("F:/Khagendra (Ensemble analysis)/Worldclim data/midhills/ASCII/bio1.asc") bio2<- raster ("F:/Khagendra (Ensemble analysis)/Worldclim data/midhills/ASCII/bio2.asc") bio3<- raster ("F:/Khagendra (Ensemble analysis)/Worldclim data/midhills/ASCII/bio3.asc") bio4<- raster ("F:/Khagendra (Ensemble analysis)/Worldclim data/midhills/ASCII/bio4.asc") bio5<- raster ("F:/Khagendra (Ensemble analysis)/Worldclim data/midhills/ASCII/bio5.asc") bio6<- raster ("F:/Khagendra (Ensemble analysis)/Worldclim data/midhills/ASCII/bio6.asc") bio7<- raster ("F:/Khagendra (Ensemble analysis)/Worldclim data/midhills/ASCII/bio7.asc") bio8<- raster ("F:/Khagendra (Ensemble analysis)/Worldclim data/midhills/ASCII/bio8.asc") bio9<- raster ("F:/Khagendra (Ensemble analysis)/Worldclim data/midhills/ASCII/bio9.asc") bio10<- raster ("F:/Khagendra (Ensemble analysis)/Worldclim data/midhills/ASCII/bio10.asc") bio11<- raster ("F:/Khagendra (Ensemble analysis)/Worldclim data/midhills/ASCII/bio11.asc") bio12<- raster ("F:/Khagendra (Ensemble analysis)/Worldclim data/midhills/ASCII/bio12.asc") bio13<- raster ("F:/Khagendra (Ensemble analysis)/Worldclim data/midhills/ASCII/bio13.asc") bio14<- raster ("F:/Khagendra (Ensemble analysis)/Worldclim data/midhills/ASCII/bio14.asc") bio15<- raster ("F:/Khagendra (Ensemble analysis)/Worldclim data/midhills/ASCII/bio15.asc") bio16<- raster ("F:/Khagendra (Ensemble analysis)/Worldclim data/midhills/ASCII/bio16.asc") bio17<- raster ("F:/Khagendra (Ensemble analysis)/Worldclim data/midhills/ASCII/bio17.asc") bio18<- raster ("F:/Khagendra (Ensemble analysis)/Worldclim data/midhills/ASCII/bio18.asc") bio19<- raster ("F:/Khagendra (Ensemble analysis)/Worldclim data/midhills/ASCII/bio19.asc")
stackthe variables
myExpl <- raster::stack (DEM,slope,aspect, bio1,bio2, bio3, bio4, bio5, bio6, bio7, bio8, bio9, bio10, bio11, bio12, bio13,bio14, bio15, bio16, bio17, bio18, bio19) plot(myExpl)
Convert the raster stack to a data frame
myExpl_df <- as.data.frame(myExpl, xy = TRUE) myExpl_df
Calculate VIF
vif_result <- vifcor(myExpl_df, 0.7)
Print VIF results
print(vif_result)
rasterstack of layers with below 5 VIF and 0.7 Corr
myExpl1<- raster::stack (DEM, slope, aspect, bio3, bio4, bio12, bio14, bio15)
plot(myExpl1)
thinned_sal<- read.csv("F:/Khagendra (Ensemble analysis)/Towa data/Tree location/Final location species wise/GBIF and other secondary sourcer/thinned/thinned_sal.csv")
Species data
myRespName <- 'Sal' myRespName
My Resp Data
myResp <- as.numeric(thinned_sal [,myRespName]) myResp
head(myResp)
The XY coordinates of species data
myRespXY <- thinned_sal [,c("Longitude", "Latitude")]
myRespXY
Formating biomod data
myBiomodData <- BIOMOD_FormatingData(resp.var = myResp, expl.var = myExpl1, resp.xy = myRespXY, resp.name = myRespName, PA.nb.rep = 3, PA.nb.absences = 2000, PA.strategy = 'disk', PA.dist.min =10000) plot(myBiomodData)
Define MAXENT options
user.MAXENT <- list('_allData_allRun'= list(maximumiterations = 10,
Define RF options
user.RF <- list('_allData_allRun'= list(nodesize = 5, maxnodes = 10))
Define user values
user.val <- list(RF.binary.randomForest.randomForest = user.RF) #,MAXENT.binary.MAXENT.MAXENT= user.MAXENT)
all.models<- c("ANN", "CTA", "FDA", "GAM", "GBM", "GLM", "MARS", "MAXENT", "RF", "SRE")
?bm_ModelingOptions
Set up modeling options
myBiomodOption <- bm_ModelingOptions( data.type = "binary", models = all.models, strategy = 'user.defined', user.val= user.val, bm.format = myBiomodData, calib.lines = NULL )
?BIOMOD_Modeling #OPT.user = myBiomodOption,
myBiomodModelOut <- BIOMODModeling( bm.format = myBiomodData, modeling.id = paste(myRespName, "Sal", sep = ""), models = all.models, OPT.user.val=user.val, CV.strategy = "random", #random for normal setting CV.nb.rep = 3, CV.perc = 0.8, # Specify data split percentage for random cross-validation CV.balance = 'presences', # Balance strategy for cross-validation OPT.strategy= 'user.defined', weights = NULL, # Observation weights (optional) prevalence = NULL, # Species prevalence (optional) metric.eval = c("ROC", "TSS"), # Additional evaluation metrics var.import = 0, # Number of permutations for variable importance,..3 scale.models = FALSE, # Disable model scaling nb.cpu = 6, # Number of CPU cores do.progress = TRUE, # Show progress bar seed.val = 42, ) myBiomodModelOut
Get all models evaluation
myBiomodModelEval <- get_evaluations(myBiomodModelOut) myBiomodModelEval export<- as.data.frame(myBiomodModelEval)
write.csv(export, "F:/Khagendra (Ensemble analysis)/fig/ensembelresults.csv")
Plot model evaluation scores
plot<- bm_PlotEvalMean(myBiomodModelOut, metric.eval = c("ROC", "TSS"), dataset = "calibration", group.by = "algo", do.plot = TRUE) #if you wanna assign to plot
plot
The two figures are uploaded:
![validation](https://github.com/biomodhub/biomod2/assets/167947828/c7d00dfa-edd7-47fa-a984-2acb459d1c57)
Another issue is when i ensemble (TSS above 0.7), the TSS score of the ensemble model is less than RF every time.
I have provided a link for my code and all the files !! here, https://drive.google.com/file/d/1GdrzfXUYNKXneW5JICrx2l4EhrcJqufy/view?usp=sharing