ShreePoudel0 commented 2 months ago

myBiomodData <- BIOMOD_FormatingData(   # Gathers all input data needed to run biomod2 models
  resp.var = data["Gaur"],              # Response variable (column containing presence points)
  resp.xy = data[, c('Y', 'X')],        # Coordinates (columns containing longitude and latitude)
  expl.var = myExpl,                    # Predictor variables
  resp.name = "Gaur",                   # Name of the response variable
  PA.nb.rep = 3,                        # Number of pseudo-absence datasets to generate
  PA.nb.absences = 500,                 # Number of pseudo-absence (background) points per dataset
  PA.strategy = 'sre',                  # Strategy for selecting pseudo-absence points
  PA.sre.quant = 0.025,
  filter.raster = TRUE
)

myBiomodOption <- BIOMOD_ModelingOptions(
  GAM = list(algo = 'GAM_mgcv', k = 4),
  RF = list(nodesize = 15, maxnodes = 5),
  GBM = list(n.trees = 500, interaction.depth = 5, n.minobsinnode = 3)
)

14 explanatory variables

     bio2             bio3            bio6            bio13           bio14           bio18
 Min.   : 9.917   Min.   :43.33   Min.   : 0.500   Min.   :337.0   Min.   : 3.000   Min.   : 675
 1st Qu.:10.942   1st Qu.:44.28   1st Qu.: 7.400   1st Qu.:458.0   1st Qu.: 5.000   1st Qu.: 856
 Median :11.242   Median :44.88   Median : 8.800   Median :497.0   Median : 7.000   Median : 945
 Mean   :11.323   Mean   :44.95   Mean   : 7.863   Mean   :499.2   Mean   : 6.958   Mean   : 972
 3rd Qu.:11.817   3rd Qu.:45.57   3rd Qu.: 9.100   3rd Qu.:536.0   3rd Qu.: 9.000   3rd Qu.:1040
 Max.   :12.175   Max.   :47.73   Max.   :10.000   Max.   :661.0   Max.   :11.000   Max.   :1451

     slope               aspect             water            settlement             road
 Min.   :0.0002031   Min.   :0.003986   Min.   :0.000000   Min.   :0.0007376   Min.   :0.000000
 1st Qu.:0.0041213   1st Qu.:2.735603   1st Qu.:0.000000   1st Qu.:0.0353538   1st Qu.:0.000000
 Median :0.0303112   Median :3.453528   Median :0.000000   Median :0.0684819   Median :0.000000
 Mean   :0.0673542   Mean   :3.319681   Mean   :0.003274   Mean   :0.0736617   Mean   :0.006716
 3rd Qu.:0.1008327   3rd Qu.:4.117206   3rd Qu.:0.008333   3rd Qu.:0.1045744   3rd Qu.:0.008333
 Max.   :0.4268092   Max.   :6.283185   Max.   :0.044876   Max.   :0.2150039   Max.   :0.095015

   population          livestock         landcover
 Min.   :    0.00   Min.   :  0.00   Min.   : 1.757
 1st Qu.:   59.74   1st Qu.: 70.73   1st Qu.: 4.000
 Median :  138.26   Median :101.26   Median : 4.807
 Mean   :  660.39   Mean   :134.62   Mean   : 5.334
 3rd Qu.:  723.22   3rd Qu.:193.35   3rd Qu.: 7.000
 Max.   :93462.73   Max.   :757.23   Max.   :10.637

3 Pseudo Absences dataset available ( PA1, PA2, PA3 ) with 500 (PA1, PA2, PA3) pseudo absences

myBiomodModelOut <- BIOMOD_Modeling(
  bm.format = myBiomodData,
  models = c('RF', 'GAM', 'GBM', 'MAXENT'),
  bm.options = myBiomodOption,
  var.import = 1,
  CV.strategy = 'random',
  CV.nb.rep = 3,
  CV.perc = 0.7,
  prevalence = NULL,
  metric.eval = c('TSS')
)

myBiomodModelEval <- get_evaluations(myBiomodModelOut)
myBiomodModelEval

This is my evaluation result:

    full.name             PA   run   algo    metric.eval  cutoff  sensitivity  specificity  calibration
 1  Gaur_PA1_RUN1_RF      PA1  RUN1  RF      TSS           326.0       88.608       89.714        0.783
 2  Gaur_PA1_RUN1_RF      PA1  RUN1  RF      ROC           333.0       88.608       90.857        0.963
 3  Gaur_PA1_RUN1_GAM     PA1  RUN1  GAM     TSS           475.0      100.000       92.000        0.920
 4  Gaur_PA1_RUN1_GAM     PA1  RUN1  GAM     ROC           478.0      100.000       92.286        0.981
 5  Gaur_PA1_RUN1_GBM     PA1  RUN1  GBM     TSS           496.0       92.405       92.571        0.850
 6  Gaur_PA1_RUN1_GBM     PA1  RUN1  GBM     ROC           496.0       92.405       92.571        0.975
 7  Gaur_PA1_RUN1_MAXENT  PA1  RUN1  MAXENT  TSS           816.0       75.949      100.000        0.759
 8  Gaur_PA1_RUN1_MAXENT  PA1  RUN1  MAXENT  ROC           817.5       75.949      100.000        0.876
 9  Gaur_PA1_RUN2_RF      PA1  RUN2  RF      TSS           290.0       92.405       87.143        0.795
10  Gaur_PA1_RUN2_RF      PA1  RUN2  RF      ROC           294.0       92.405       87.714        0.958
11  Gaur_PA1_RUN2_GAM     PA1  RUN2  GAM     TSS           413.0      100.000       87.429        0.874
12  Gaur_PA1_RUN2_GAM     PA1  RUN2  GAM     ROC           413.0      100.000       87.429        0.972
13  Gaur_PA1_RUN2_GBM     PA1  RUN2  GBM     TSS           456.0       96.203       89.714        0.859
14  Gaur_PA1_RUN2_GBM     PA1  RUN2  GBM     ROC           457.5       96.203       89.714        0.975
15  Gaur_PA1_RUN2_MAXENT  PA1  RUN2  MAXENT  TSS           655.0       75.949      100.000        0.759
16  Gaur_PA1_RUN2_MAXENT  PA1  RUN2  MAXENT  ROC           656.5       75.949      100.000        0.878
17  Gaur_PA1_RUN3_RF      PA1  RUN3  RF      TSS           273.0       89.873       87.429        0.773
18  Gaur_PA1_RUN3_RF      PA1  RUN3  RF      ROC           250.0       91.139       86.286        0.949
19  Gaur_PA1_RUN3_GAM     PA1  RUN3  GAM     TSS           535.0       94.937       91.714        0.867
20  Gaur_PA1_RUN3_GAM     PA1  RUN3  GAM     ROC           537.5       94.937       91.714        0.974
21  Gaur_PA1_RUN3_GBM     PA1  RUN3  GBM     TSS           473.0       93.671       90.000        0.837
22  Gaur_PA1_RUN3_GBM     PA1  RUN3  GBM     ROC           483.0       92.405       91.714        0.970
23  Gaur_PA1_RUN3_MAXENT  PA1  RUN3  MAXENT  TSS           247.0       74.684       99.714        0.744
24  Gaur_PA1_RUN3_MAXENT  PA1  RUN3  MAXENT  ROC           250.0       74.684       99.714        0.872
25  Gaur_PA2_RUN1_RF      PA2  RUN1  RF      TSS           205.0       94.937       84.000        0.789
26  Gaur_PA2_RUN1_RF      PA2  RUN1  RF      ROC           203.0       94.937       84.000        0.961
27  Gaur_PA2_RUN1_GAM     PA2  RUN1  GAM     TSS           515.0       97.468       90.571        0.880
28  Gaur_PA2_RUN1_GAM     PA2  RUN1  GAM     ROC           515.5       97.468       90.571        0.980
29  Gaur_PA2_RUN1_GBM     PA2  RUN1  GBM     TSS           459.0       94.937       91.143        0.861
30  Gaur_PA2_RUN1_GBM     PA2  RUN1  GBM     ROC           460.5       94.937       91.143        0.976
31  Gaur_PA2_RUN1_MAXENT  PA2  RUN1  MAXENT  TSS           490.0       79.747      100.000        0.797
32  Gaur_PA2_RUN1_MAXENT  PA2  RUN1  MAXENT  ROC           492.5       79.747      100.000        0.898
33  Gaur_PA2_RUN2_RF      PA2  RUN2  RF      TSS           189.5       93.671       85.143        0.788
34  Gaur_PA2_RUN2_RF      PA2  RUN2  RF      ROC           209.0       92.405       86.857        0.954
35  Gaur_PA2_RUN2_GAM     PA2  RUN2  GAM     TSS           560.0       97.468       94.000        0.915
36  Gaur_PA2_RUN2_GAM     PA2  RUN2  GAM     ROC           559.5       97.468       94.000        0.981
37  Gaur_PA2_RUN2_GBM     PA2  RUN2  GBM     TSS           492.0       92.405       94.286        0.867
38  Gaur_PA2_RUN2_GBM     PA2  RUN2  GBM     ROC           492.5       92.405       94.286        0.971
39  Gaur_PA2_RUN2_MAXENT  PA2  RUN2  MAXENT  TSS           653.5       77.215      100.000        0.772
40  Gaur_PA2_RUN2_MAXENT  PA2  RUN2  MAXENT  ROC           657.0       77.215      100.000        0.884
41  Gaur_PA2_RUN3_RF      PA2  RUN3  RF      TSS           238.0       91.139       86.857        0.780
42  Gaur_PA2_RUN3_RF      PA2  RUN3  RF      ROC           252.0       91.139       87.143        0.955
43  Gaur_PA2_RUN3_GAM     PA2  RUN3  GAM     TSS           297.0      100.000       85.714        0.857
44  Gaur_PA2_RUN3_GAM     PA2  RUN3  GAM     ROC           439.5       96.203       90.000        0.977
45  Gaur_PA2_RUN3_GBM     PA2  RUN3  GBM     TSS           501.0       89.873       93.143        0.830
46  Gaur_PA2_RUN3_GBM     PA2  RUN3  GBM     ROC           502.0       89.873       93.143        0.972
47  Gaur_PA2_RUN3_MAXENT  PA2  RUN3  MAXENT  TSS           656.5       65.823      100.000        0.658
48  Gaur_PA2_RUN3_MAXENT  PA2  RUN3  MAXENT  ROC           660.0       65.823      100.000        0.828
49  Gaur_PA3_RUN1_RF      PA3  RUN1  RF      TSS           237.5       96.203       88.857        0.851
50  Gaur_PA3_RUN1_RF      PA3  RUN1  RF      ROC           245.0       96.203       89.143        0.976
51  Gaur_PA3_RUN1_GAM     PA3  RUN1  GAM     TSS           586.0       98.734       96.000        0.947
52  Gaur_PA3_RUN1_GAM     PA3  RUN1  GAM     ROC           589.5       98.734       96.000        0.993
53  Gaur_PA3_RUN1_GBM     PA3  RUN1  GBM     TSS           488.5       96.203       94.286        0.905
54  Gaur_PA3_RUN1_GBM     PA3  RUN1  GBM     ROC           488.5       96.203       94.286        0.987
55  Gaur_PA3_RUN1_MAXENT  PA3  RUN1  MAXENT  TSS           396.0       79.747       99.714        0.795
56  Gaur_PA3_RUN1_MAXENT  PA3  RUN1  MAXENT  ROC           396.5       79.747       99.714        0.897
57  Gaur_PA3_RUN2_RF      PA3  RUN2  RF      TSS           184.0       97.468       86.857        0.846
58  Gaur_PA3_RUN2_RF      PA3  RUN2  RF      ROC           188.0       97.468       87.143        0.977
59  Gaur_PA3_RUN2_GAM     PA3  RUN2  GAM     TSS           556.0       98.734       94.857        0.936
60  Gaur_PA3_RUN2_GAM     PA3  RUN2  GAM     ROC           552.5       98.734       94.857        0.992
61  Gaur_PA3_RUN2_GBM     PA3  RUN2  GBM     TSS           484.0       97.468       94.000        0.915
62  Gaur_PA3_RUN2_GBM     PA3  RUN2  GBM     ROC           489.5       97.468       94.286        0.987
63  Gaur_PA3_RUN2_MAXENT  PA3  RUN2  MAXENT  TSS           258.0       83.544       99.714        0.833
64  Gaur_PA3_RUN2_MAXENT  PA3  RUN2  MAXENT  ROC           259.5       83.544       99.714        0.916
65  Gaur_PA3_RUN3_RF      PA3  RUN3  RF      TSS           181.0       93.671       86.000        0.797
66  Gaur_PA3_RUN3_RF      PA3  RUN3  RF      ROC           181.0       93.671       86.000        0.954
67  Gaur_PA3_RUN3_GAM     PA3  RUN3  GAM     TSS           485.0       96.203       91.429        0.876
68  Gaur_PA3_RUN3_GAM     PA3  RUN3  GAM     ROC           484.0       96.203       91.429        0.979
69  Gaur_PA3_RUN3_GBM     PA3  RUN3  GBM     TSS           467.0       93.671       92.000        0.860
70  Gaur_PA3_RUN3_GBM     PA3  RUN3  GBM     ROC           467.5       93.671       92.286        0.978
71  Gaur_PA3_RUN3_MAXENT  PA3  RUN3  MAXENT  TSS           695.5       78.481      100.000        0.785
72  Gaur_PA3_RUN3_MAXENT  PA3  RUN3  MAXENT  ROC           697.5       78.481      100.000        0.891

From the above table, I computed the mean TSS per algorithm:

  algo    meanTSS
1 GAM     0.897
2 GBM     0.865
3 RF      0.802
4 MAXENT  0.767

myBiomodEM <- BIOMOD_EnsembleModeling(
  bm.mod = myBiomodModelOut,
  models.chosen = 'all',
  em.by = "all",
  em.algo = c('EMmean'),
  metric.select = c('TSS'),
  metric.select.thresh = c(0.6),
  metric.select.dataset = 'calibration',
  metric.eval = c('TSS'),
  var.import = 1
)

This is the evaluation result of the ensemble model:

  full.name                                         merged.by.PA  merged.by.run  merged.by.algo
1 Gaur_EMmeanByTSS_mergedData_mergedRun_mergedAlgo  mergedData    mergedRun      mergedAlgo
  filtered.by  algo    metric.eval  cutoff  sensitivity  specificity  calibration  validation  evaluation
1 TSS          EMmean  TSS          267     96.46        87.15        0.838        NA          NA

Here the GAM and GBM scores are much higher than the ensemble score, and I don't know the reason. My study area is 9500 sq. km. There were 245 presence points, but I have thinned them to 116 points. The resolution of my variables is 0.008333333 x 0.008333333. I will be using 10 individual models, but here I've only shown three. Can you please check my entire process and explain where there might be a mistake?

MayaGueguen commented 2 months ago

Hello Shree,

Here are a few highlights of the functioning of the BIOMOD_EnsembleModeling function :

first, it will select only the single models matching the metric.select[...] parameters you provided, so here: models whose TSS value on the calibration dataset was >= 0.6 (in your case, I think all single models fulfil this criterion)
then it will combine these selected single models according to the em.algo you chose, so here: EMmean, meaning that for each point it takes the values predicted by all the single models kept and averages those predictions
finally, with these new "predictions", it will compute the metric.eval metrics you requested, so here: TSS
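For illustration only, here is a minimal base R sketch of the three steps above on a toy dataset. It is not biomod2's internal code: the predictions, the TSS values, the fixed cutoff of 500, and the helper `tss_at()` are all hypothetical (biomod2 searches for the cutoff itself, as the cutoff column in the evaluation table shows).

```r
# Toy illustration of select -> average -> evaluate. All numbers are made up.
obs <- c(1, 1, 1, 0, 0, 0)  # observed presences (1) and absences (0)

# Hypothetical single-model predictions on a 0-1000 scale
preds <- data.frame(
  GAM    = c(900, 800, 700, 200, 100, 300),
  GBM    = c(850, 750, 200, 300, 200, 100),
  MAXENT = c(600, 300, 200, 400, 100, 500)
)

# Hypothetical calibration TSS of each single model
tss_calib <- c(GAM = 0.90, GBM = 0.86, MAXENT = 0.55)

# Step 1: keep only the models whose TSS reaches the selection threshold
kept <- names(tss_calib)[tss_calib >= 0.6]

# Step 2: EMmean-style combination, i.e. the per-point average of the kept models
em_mean <- rowMeans(preds[, kept, drop = FALSE])

# Step 3: evaluate the averaged predictions with TSS at a given cutoff
# (hypothetical helper; TSS = sensitivity + specificity - 1)
tss_at <- function(pred, obs, cutoff) {
  pred_bin <- as.integer(pred >= cutoff)
  sensitivity <- sum(pred_bin == 1 & obs == 1) / sum(obs == 1)
  specificity <- sum(pred_bin == 0 & obs == 0) / sum(obs == 0)
  sensitivity + specificity - 1
}

em_tss <- tss_at(em_mean, obs, cutoff = 500)
```

In this toy case the third presence is predicted well by GAM (700) but poorly by GBM (200), so the average (450) falls below the cutoff and the ensemble misses a point that GAM alone would have caught. That is one way an averaged model can score below its best member.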

So what you get here is an ensemble model that returns the average predictions of your different single models. If you want to emphasize the fact that, indeed, GAM and GBM seem to give better results than MAXENT models, you can give it a try with em.algo = 'EMwmean' to weight single models based on their evaluation values.
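To give a rough idea of what weighting changes, here is a small base R sketch comparing a plain mean with a mean weighted proportionally to each algorithm's TSS. The mean TSS values are the ones reported earlier in this thread; the predictions at a single location are hypothetical, and proportional weighting is an assumption about how the weights are derived, so biomod2's exact numbers may differ.

```r
# Mean calibration TSS per algorithm, as reported earlier in this thread
mean_tss <- c(GAM = 0.897, GBM = 0.865, RF = 0.802, MAXENT = 0.767)

# Hypothetical predictions of the four algorithms at one location (0-1000 scale)
pred <- c(GAM = 800, GBM = 700, RF = 500, MAXENT = 300)

# Plain average: every model counts equally
em_mean <- mean(pred)

# Weighted average: weights proportional to each model's TSS (assumed scheme)
w <- mean_tss / sum(mean_tss)
em_wmean <- sum(w * pred[names(w)])
```

Because the four TSS values are close to each other, the weights are all near 0.25 (roughly 0.27, 0.26, 0.24 and 0.23), so the weighted prediction moves only slightly towards GAM and GBM. With scores this similar, a weighted mean may not differ much from the plain mean.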

More details can be found in the documentation of the BIOMOD_EnsembleModeling function.

Hope it helps, Maya

ShreePoudel0 commented 2 months ago

I am always getting a TSS for the ensemble that is greater than that of the individual models. I don't know whether the problem is with the code, the variables, or the presence points. I am modeling with 116 points, and the study area is small at 9500 sq km. The variable data is given above. Could you please look into it?

MayaGueguen commented 2 months ago

I think there is no problem with your code :slightly_smiling_face:

As I mentioned, there are several ways of combining single models into ensemble models, all described in the documentation of the BIOMOD_EnsembleModeling function. Depending on the one you choose, you will get different results, and the evaluation of this "new supra model" might not necessarily be better than that of any given single model.

If one idea of doing ensemble models is to get the best out of every model, it is also a way of getting an average model or to explore the variability between the different single models and algorithms.

So it really depends on the method you choose to combine your single models.

If your goal is to obtain the best predictions and evaluation metrics, I advise you once again to try with em.algo = 'EMwmean'. :eyes:

Hope it helps, Maya

joshikp01 commented 1 month ago

Hi !!

I think you are evaluating with the calibration dataset. Have you checked your TSS scores with the validation dataset?

bm_PlotEvalMean(myBiomodModelOut, metric.eval = c("TSS"), dataset = "validation", group.by = "algo", do.plot = TRUE)

Maybe this will help? The difference between validation and calibration can be very high in some cases.
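If a table is easier to read than a plot, something like the following base R sketch could be used to compare the two datasets per algorithm. It assumes the evaluation table has `algo`, `metric.eval`, `calibration` and `validation` columns. The data frame here is a toy stand-in for the output of get_evaluations(myBiomodModelOut): the calibration values are TSS rows from PA1 in this thread, and the validation values are invented.

```r
# Toy stand-in for the evaluation table. Calibration values come from the
# thread (PA1, RUN1 and RUN2); validation values are invented for illustration.
evals <- data.frame(
  algo        = c("RF", "GAM", "GBM", "MAXENT", "RF", "GAM", "GBM", "MAXENT"),
  metric.eval = "TSS",
  calibration = c(0.783, 0.920, 0.850, 0.759, 0.795, 0.874, 0.859, 0.759),
  validation  = c(0.610, 0.700, 0.690, 0.650, 0.640, 0.660, 0.710, 0.630)
)

# Keep the TSS rows only (the real table also contains ROC rows)
tss <- evals[evals$metric.eval == "TSS", ]

# Mean TSS per algorithm on each dataset
summary_tss <- aggregate(cbind(calibration, validation) ~ algo, data = tss, FUN = mean)

# Gap between calibration and validation; a large gap hints at overfitting
summary_tss$gap <- summary_tss$calibration - summary_tss$validation
summary_tss
```

With the real table in place of `evals`, an algorithm that looks best on calibration but shows the largest gap would be a candidate for overfitting rather than a genuinely better model.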

biomodhub / biomod2

Help needed with Modeling process #447
