Open Saurav1227 opened 2 weeks ago
Hello there,
I see nothing wrong here 👀 You do indeed have (relatively) low TSS scores and some overfitting for RF (even with the down-sampling method!). However, if you look at the validation, the scores of the different algorithms will be similar (except SRE).
To correct the overfitting of RF, you can also try to increase nodesize and try different maxnodes arguments.
(Note that we will soon be ready to switch to version 4.2-6 on GitHub, which will contain a new single model named RFd that computes a down-sampled RF without having to specify options for the basic RF like we do now. It will be easier for you.)
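As a rough illustration of the nodesize / maxnodes suggestion, the RF option list from the original script could be expanded into a few candidate settings. This is only a sketch: the candidate values and the placeholder `n` are assumptions, not recommendations from this thread, and `make_rf_options()` is a hypothetical helper. `nodesize` and `maxnodes` themselves are standard `randomForest()` arguments.

```r
# Placeholder for the per-class sample size used in the original script
# (there it is derived from the thinned presence data); hypothetical value.
n <- 100

# Candidate settings to compare; the values are illustrative assumptions.
rf_grid <- expand.grid(nodesize = c(5, 10, 20),
                       maxnodes = c(10, 30, 50))

# Build one randomForest option list per candidate setting.
make_rf_options <- function(nodesize, maxnodes, n) {
  list(ntree    = 500,
       sampsize = c("0" = n, "1" = n),  # balanced down-sampling
       replace  = TRUE,
       nodesize = nodesize,  # larger value -> bigger terminal nodes, shallower trees
       maxnodes = maxnodes)  # caps the number of terminal nodes per tree
}

rf_candidates <- Map(make_rf_options,
                     rf_grid$nodesize, rf_grid$maxnodes,
                     MoreArgs = list(n = n))
length(rf_candidates)  # one option list per row of rf_grid
```

Each element could stand in for `myRFoptions` in turn before rebuilding `user.RF`; the runs are then best compared on their validation scores rather than their calibration scores.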
As you already suggested, your data seem homogeneously distributed, so trying different explanatory variables or working at another scale could be the way to improve the model. I hope you'll find a solution soon!
Hélène
Hi there,
Thank you for your response and for checking my code and data. I am working on it.
Regards! Saurav
On Mon, 10 Jun 2024 at 8:54 pm, HBlancheteau @.***> wrote:
Context and question
I am working on an ensemble modeling project for the Australian Bustard in Australia using the biomod2 package. While the code runs without errors, I consistently receive low True Skill Statistic (TSS) scores, except for the Random Forest algorithm. Here are the specifics:
High TSS score (>0.8): only for Random Forest
Low TSS score (0.3 or less): for the other algorithms
Attempts to troubleshoot:
Spatial autocorrelation: tried grids of 1×1 km and 5×5 km.
Pseudo-absence points: used 10,000, 15,000, and 30,000 points outside a 1 km buffer from the presence points.
Geographic scale: applied the model at the state level (e.g., Western Australia) but encountered similarly low TSS scores.
Data distribution: my data is homogeneously distributed, which I suspect might be causing the issue.
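For reference, TSS is sensitivity plus specificity minus one, so it ranges from -1 to 1 and a value near 0 means the model separates presences from (pseudo-)absences no better than chance. A minimal base-R sketch of that definition follows; `tss()` is a hypothetical helper name, and the two input vectors are made-up toy values, not data from this issue.

```r
# TSS = sensitivity + specificity - 1, from observed and predicted 0/1 vectors.
tss <- function(observed, predicted) {
  tp <- sum(observed == 1 & predicted == 1)  # presences predicted present
  fn <- sum(observed == 1 & predicted == 0)  # presences predicted absent
  tn <- sum(observed == 0 & predicted == 0)  # absences predicted absent
  fp <- sum(observed == 0 & predicted == 1)  # absences predicted present
  sensitivity <- tp / (tp + fn)
  specificity <- tn / (tn + fp)
  sensitivity + specificity - 1
}

# Toy example (made-up values): sensitivity 3/4, specificity 2/4.
obs  <- c(1, 1, 1, 1, 0, 0, 0, 0)
pred <- c(1, 1, 1, 0, 0, 0, 1, 1)
tss(obs, pred)  # 0.25
```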
Related code
# Load required packages
library("raster")
library("ggplot2")
library("gridExtra")
library("rasterVis")
library("sf")
library("usdm")
library("biomod2")
library("dismo")
library("mgcv")
library("gam")
library("spThin")  # provides thin(), which the script calls further down

# Load rasters
Dem <- raster("D:/Saurav/Ascii/dem.asc")
aspect <- raster("D:/Saurav/Ascii/aspect.asc")
slope <- raster("D:/Saurav/Ascii/slope.asc")
bio1 <- raster("D:/Saurav/Ascii/bio1.asc")
bio2 <- raster("D:/Saurav/Ascii/bio2.asc")
bio3 <- raster("D:/Saurav/Ascii/bio3.asc")
bio4 <- raster("D:/Saurav/Ascii/bio4.asc")
bio5 <- raster("D:/Saurav/Ascii/bio5.asc")
bio6 <- raster("D:/Saurav/Ascii/bio6.asc")
bio7 <- raster("D:/Saurav/Ascii/bio7.asc")
bio8 <- raster("D:/Saurav/Ascii/bio8.asc")
bio9 <- raster("D:/Saurav/Ascii/bio9.asc")
bio10 <- raster("D:/Saurav/Ascii/bio10.asc")
bio11 <- raster("D:/Saurav/Ascii/bio11.asc")
bio12 <- raster("D:/Saurav/Ascii/bio12.asc")
bio13 <- raster("D:/Saurav/Ascii/bio13.asc")
bio14 <- raster("D:/Saurav/Ascii/bio14.asc")
bio15 <- raster("D:/Saurav/Ascii/bio15.asc")
bio16 <- raster("D:/Saurav/Ascii/bio16.asc")
bio17 <- raster("D:/Saurav/Ascii/bio17.asc")
bio18 <- raster("D:/Saurav/Ascii/bio18.asc")
bio19 <- raster("D:/Saurav/Ascii/bio19.asc")

# Stack the variables
myExpl <- raster::stack(Dem, slope, aspect, bio1, bio2, bio3, bio4, bio5, bio6,
                        bio7, bio8, bio9, bio10, bio11, bio12, bio13, bio14,
                        bio15, bio16, bio17, bio18, bio19)
plot(myExpl)

# Convert the raster stack to a data frame
myExpl_df <- as.data.frame(myExpl, xy = TRUE)
myExpl_df

# Calculate VIF
vif_result <- vifcor(myExpl_df, 0.7)

# Print VIF results
print(vif_result)

# Raster stack of layers with VIF below 5 and correlation below 0.7
myExpl1 <- raster::stack(Dem, slope, aspect, bio2, bio3, bio9, bio15, bio18, bio19)
plot(myExpl1)

# Read the CSV file
Bustard <- read.csv("D:/Saurav/Ascii2/Bustard.csv", header = TRUE)

# Spatially thin the occurrences (thin() comes from the spThin package)
ThinnedBustard <- thin(loc.data = Bustard,
                       lat.col = "Latitude",
                       long.col = "Longitude",
                       spec.col = "Bustard",
                       thin.par = 1,
                       reps = 1,
                       locs.thinned.list.return = TRUE,
                       write.log.file = FALSE,
                       out.dir = "D:/Saurav2/Ascii/thinnedBustard")
ThinnedBustard <- read.csv("D:/Saurav2/Ascii/thinnedbustard/thinned_data_thin1.csv", header = TRUE)

# Define your response variable
myRespName <- 'Bustard'
myResp <- as.numeric(ThinnedBustard[, myRespName])

# The XY coordinates of species data
myRespXY <- ThinnedBustard[, c("Longitude", "Latitude")]
plot(ThinnedBustard)
myRespXY
?BIOMOD_FormatingData

# Formatting biomod data
myBiomodData <- BIOMOD_FormatingData(resp.var = myResp,
                                     expl.var = myExpl1,
                                     resp.xy = myRespXY,
                                     resp.name = myRespName,
                                     PA.nb.rep = 3,
                                     PA.nb.absences = 15000,
                                     PA.strategy = 'disk',
                                     PA.dist.min = 1000)  # 1 km buffer
myRespXY
?BIOMOD_FormatingData

# About 70% of the thinned records, used as the per-class sample size for the down-sampled RF
n <- round(nrow(ThinnedBustard) * 0.7 - 1)

# Define RF options
myRFoptions <- list(ntree = 500, sampsize = c("0" = n, "1" = n), replace = TRUE, nodesize = 5)
user.RF <- rep(list(myRFoptions), (ncol(myBiomodData@PA.table) + 1))
names(user.RF) <- c(paste0("_", names(myBiomodData@PA.table), "_allRun"), "_allData_allRun")

# Define user values
user.val <- list(RF.binary.randomForest.randomForest = user.RF)

all.models <- c("RF", "ANN", "CTA", "FDA", "GAM", "GBM", "GLM", "MARS", "MAXNET", "SRE")

# Set up modeling options
myBiomodOption <- bm_ModelingOptions(data.type = "binary",
                                     models = all.models,
                                     strategy = 'user.defined',
                                     user.val = user.val,
                                     bm.format = myBiomodData,
                                     calib.lines = NULL)

# Run the single models
myBiomodModelOut <- BIOMOD_Modeling(bm.format = myBiomodData,
                                    modeling.id = paste(myRespName, "Bustard", sep = ""),
                                    models = all.models,
                                    OPT.strategy = 'user.defined',
                                    OPT.user = myBiomodOption, #!!
                                    CV.strategy = "random",
                                    CV.nb.rep = 3,
                                    CV.perc = 0.7,
                                    metric.eval = c("ROC", "TSS"),
                                    var.import = 0,
                                    scale.models = FALSE,
                                    nb.cpu = 8,
                                    do.progress = TRUE,
                                    seed.val = 42)

myCalibLines <- get_calib_lines(myBiomodModelOut)
plot(myBiomodData, calib.lines = myCalibLines)

# Get all models evaluation
myBiomodModelEval <- get_evaluations(myBiomodModelOut)
myBiomodModelEval
export <- as.data.frame(myBiomodModelEval)
write.csv(export, "D:/Saurav2/Ascii/ensembelresults.csv")

# Plot model evaluation scores
plot <- bm_PlotEvalMean(myBiomodModelOut,
                        metric.eval = c("ROC", "TSS"),
                        dataset = "calibration",
                        group.by = "algo",
                        do.plot = TRUE)  # if you want to assign the result to `plot`
plot
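
Because the plot above uses dataset = "calibration", a high RF score there may mostly reflect overfitting rather than predictive skill. A hedged sketch for comparing calibration and validation scores per algorithm follows. It assumes the data frame returned by `get_evaluations()` has `algo`, `metric.eval`, `calibration` and `validation` columns (as it does in biomod2 4.2-x, to my knowledge); `overfit_gap()` is a hypothetical helper name, not a biomod2 function.

```r
# Mean calibration and validation score per algorithm for one metric,
# plus the gap between them (a large positive gap suggests overfitting).
overfit_gap <- function(evals, metric = "TSS") {
  sub <- evals[evals$metric.eval == metric, ]
  # aggregate() drops rows with NA by default, e.g. models fitted on all data
  # that have no validation score.
  out <- aggregate(cbind(calibration, validation) ~ algo, data = sub, FUN = mean)
  out$gap <- out$calibration - out$validation
  out[order(-out$gap), ]
}

# Usage with the objects from the script above (not run here):
# overfit_gap(myBiomodModelEval, metric = "TSS")
# bm_PlotEvalMean(myBiomodModelOut, metric.eval = c("ROC", "TSS"),
#                 dataset = "validation", group.by = "algo")
```

Sorting by the gap puts the most overfitted algorithm first, which makes it easy to check whether RF still stands out once validation scores are used.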
https://drive.google.com/file/d/16N770JVTS4uH0Fn2cRNFYEa-IPFEsI6J/view?usp=sharing
Data: https://drive.google.com/file/d/1_BNgL6VUd_kpvIVlfyC7RaOYbrj3HRnN/view?usp=sharing
Variables: https://drive.google.com/file/d/1CK0w3blkw-Oz9R-TWm--zjQ0l7vZrFGQ/view?usp=sharing