biomodhub / biomod2

BIOMOD is a computer platform for ensemble forecasting of species distributions, enabling the treatment of a range of methodological uncertainties in models and the examination of species-environment relationships.

Help needed with Modeling process #447

Closed ShreePoudel0 closed 1 month ago

ShreePoudel0 commented 2 months ago

myBiomodData <- BIOMOD_FormatingData(  # Gathers all input data needed to run biomod2 models
  resp.var = data["Gaur"],             # Response variable (column containing presence points)
  resp.xy = data[, c('Y', 'X')],       # Coordinates of the response variable (longitude and latitude columns)
  expl.var = myExpl,                   # Predictor variables
  resp.name = "Gaur",                  # Name of the response variable
  PA.nb.rep = 3,                       # Number of pseudo-absence datasets
  PA.nb.absences = 500,                # Number of pseudo-absence (background) points to generate
  PA.strategy = 'sre',                 # Strategy for selecting background points
  PA.sre.quant = 0.025,                # Quantile used by the 'sre' strategy
  filter.raster = TRUE
)

myBiomodOption <- BIOMOD_ModelingOptions(
  GAM = list(algo = 'GAM_mgcv', k = 4),
  RF = list(nodesize = 15, maxnodes = 5),
  GBM = list(n.trees = 500, interaction.depth = 5, n.minobsinnode = 3)
)

14 explanatory variables

    bio2            bio3           bio6            bio13          bio14           bio18
Min.   : 9.917  Min.   :43.33  Min.   : 0.500  Min.   :337.0  Min.   : 3.000  Min.   : 675
1st Qu.:10.942  1st Qu.:44.28  1st Qu.: 7.400  1st Qu.:458.0  1st Qu.: 5.000  1st Qu.: 856
Median :11.242  Median :44.88  Median : 8.800  Median :497.0  Median : 7.000  Median : 945
Mean   :11.323  Mean   :44.95  Mean   : 7.863  Mean   :499.2  Mean   : 6.958  Mean   : 972
3rd Qu.:11.817  3rd Qu.:45.57  3rd Qu.: 9.100  3rd Qu.:536.0  3rd Qu.: 9.000  3rd Qu.:1040
Max.   :12.175  Max.   :47.73  Max.   :10.000  Max.   :661.0  Max.   :11.000  Max.   :1451

    slope              aspect            water             settlement         road
Min.   :0.0002031  Min.   :0.003986  Min.   :0.000000  Min.   :0.0007376  Min.   :0.000000
1st Qu.:0.0041213  1st Qu.:2.735603  1st Qu.:0.000000  1st Qu.:0.0353538  1st Qu.:0.000000
Median :0.0303112  Median :3.453528  Median :0.000000  Median :0.0684819  Median :0.000000
Mean   :0.0673542  Mean   :3.319681  Mean   :0.003274  Mean   :0.0736617  Mean   :0.006716
3rd Qu.:0.1008327  3rd Qu.:4.117206  3rd Qu.:0.008333  3rd Qu.:0.1045744  3rd Qu.:0.008333
Max.   :0.4268092  Max.   :6.283185  Max.   :0.044876  Max.   :0.2150039  Max.   :0.095015

    population        livestock       landcover
Min.   :    0.00  Min.   :  0.00  Min.   : 1.757
1st Qu.:   59.74  1st Qu.: 70.73  1st Qu.: 4.000
Median :  138.26  Median :101.26  Median : 4.807
Mean   :  660.39  Mean   :134.62  Mean   : 5.334
3rd Qu.:  723.22  3rd Qu.:193.35  3rd Qu.: 7.000
Max.   :93462.73  Max.   :757.23  Max.   :10.637

3 Pseudo Absences dataset available ( PA1, PA2, PA3 ) with 500 (PA1, PA2, PA3) pseudo absences

myBiomodModelOut <- BIOMOD_Modeling(
  bm.format = myBiomodData,
  models = c('RF', 'GAM', 'GBM', 'MAXENT'),
  bm.options = myBiomodOption,
  var.import = 1,
  CV.strategy = 'random',
  CV.nb.rep = 3,
  CV.perc = 0.7,
  prevalence = NULL,
  metric.eval = c('TSS')
)

myBiomodModelEval <- get_evaluations(myBiomodModelOut)
myBiomodModelEval

This is my evaluation result:

   full.name            PA  run  algo   metric.eval cutoff sensitivity specificity calibration
 1 Gaur_PA1_RUN1_RF     PA1 RUN1 RF     TSS          326.0      88.608      89.714       0.783
 2 Gaur_PA1_RUN1_RF     PA1 RUN1 RF     ROC          333.0      88.608      90.857       0.963
 3 Gaur_PA1_RUN1_GAM    PA1 RUN1 GAM    TSS          475.0     100.000      92.000       0.920
 4 Gaur_PA1_RUN1_GAM    PA1 RUN1 GAM    ROC          478.0     100.000      92.286       0.981
 5 Gaur_PA1_RUN1_GBM    PA1 RUN1 GBM    TSS          496.0      92.405      92.571       0.850
 6 Gaur_PA1_RUN1_GBM    PA1 RUN1 GBM    ROC          496.0      92.405      92.571       0.975
 7 Gaur_PA1_RUN1_MAXENT PA1 RUN1 MAXENT TSS          816.0      75.949     100.000       0.759
 8 Gaur_PA1_RUN1_MAXENT PA1 RUN1 MAXENT ROC          817.5      75.949     100.000       0.876
 9 Gaur_PA1_RUN2_RF     PA1 RUN2 RF     TSS          290.0      92.405      87.143       0.795
10 Gaur_PA1_RUN2_RF     PA1 RUN2 RF     ROC          294.0      92.405      87.714       0.958
11 Gaur_PA1_RUN2_GAM    PA1 RUN2 GAM    TSS          413.0     100.000      87.429       0.874
12 Gaur_PA1_RUN2_GAM    PA1 RUN2 GAM    ROC          413.0     100.000      87.429       0.972
13 Gaur_PA1_RUN2_GBM    PA1 RUN2 GBM    TSS          456.0      96.203      89.714       0.859
14 Gaur_PA1_RUN2_GBM    PA1 RUN2 GBM    ROC          457.5      96.203      89.714       0.975
15 Gaur_PA1_RUN2_MAXENT PA1 RUN2 MAXENT TSS          655.0      75.949     100.000       0.759
16 Gaur_PA1_RUN2_MAXENT PA1 RUN2 MAXENT ROC          656.5      75.949     100.000       0.878
17 Gaur_PA1_RUN3_RF     PA1 RUN3 RF     TSS          273.0      89.873      87.429       0.773
18 Gaur_PA1_RUN3_RF     PA1 RUN3 RF     ROC          250.0      91.139      86.286       0.949
19 Gaur_PA1_RUN3_GAM    PA1 RUN3 GAM    TSS          535.0      94.937      91.714       0.867
20 Gaur_PA1_RUN3_GAM    PA1 RUN3 GAM    ROC          537.5      94.937      91.714       0.974
21 Gaur_PA1_RUN3_GBM    PA1 RUN3 GBM    TSS          473.0      93.671      90.000       0.837
22 Gaur_PA1_RUN3_GBM    PA1 RUN3 GBM    ROC          483.0      92.405      91.714       0.970
23 Gaur_PA1_RUN3_MAXENT PA1 RUN3 MAXENT TSS          247.0      74.684      99.714       0.744
24 Gaur_PA1_RUN3_MAXENT PA1 RUN3 MAXENT ROC          250.0      74.684      99.714       0.872
25 Gaur_PA2_RUN1_RF     PA2 RUN1 RF     TSS          205.0      94.937      84.000       0.789
26 Gaur_PA2_RUN1_RF     PA2 RUN1 RF     ROC          203.0      94.937      84.000       0.961
27 Gaur_PA2_RUN1_GAM    PA2 RUN1 GAM    TSS          515.0      97.468      90.571       0.880
28 Gaur_PA2_RUN1_GAM    PA2 RUN1 GAM    ROC          515.5      97.468      90.571       0.980
29 Gaur_PA2_RUN1_GBM    PA2 RUN1 GBM    TSS          459.0      94.937      91.143       0.861
30 Gaur_PA2_RUN1_GBM    PA2 RUN1 GBM    ROC          460.5      94.937      91.143       0.976
31 Gaur_PA2_RUN1_MAXENT PA2 RUN1 MAXENT TSS          490.0      79.747     100.000       0.797
32 Gaur_PA2_RUN1_MAXENT PA2 RUN1 MAXENT ROC          492.5      79.747     100.000       0.898
33 Gaur_PA2_RUN2_RF     PA2 RUN2 RF     TSS          189.5      93.671      85.143       0.788
34 Gaur_PA2_RUN2_RF     PA2 RUN2 RF     ROC          209.0      92.405      86.857       0.954
35 Gaur_PA2_RUN2_GAM    PA2 RUN2 GAM    TSS          560.0      97.468      94.000       0.915
36 Gaur_PA2_RUN2_GAM    PA2 RUN2 GAM    ROC          559.5      97.468      94.000       0.981
37 Gaur_PA2_RUN2_GBM    PA2 RUN2 GBM    TSS          492.0      92.405      94.286       0.867
38 Gaur_PA2_RUN2_GBM    PA2 RUN2 GBM    ROC          492.5      92.405      94.286       0.971
39 Gaur_PA2_RUN2_MAXENT PA2 RUN2 MAXENT TSS          653.5      77.215     100.000       0.772
40 Gaur_PA2_RUN2_MAXENT PA2 RUN2 MAXENT ROC          657.0      77.215     100.000       0.884
41 Gaur_PA2_RUN3_RF     PA2 RUN3 RF     TSS          238.0      91.139      86.857       0.780
42 Gaur_PA2_RUN3_RF     PA2 RUN3 RF     ROC          252.0      91.139      87.143       0.955
43 Gaur_PA2_RUN3_GAM    PA2 RUN3 GAM    TSS          297.0     100.000      85.714       0.857
44 Gaur_PA2_RUN3_GAM    PA2 RUN3 GAM    ROC          439.5      96.203      90.000       0.977
45 Gaur_PA2_RUN3_GBM    PA2 RUN3 GBM    TSS          501.0      89.873      93.143       0.830
46 Gaur_PA2_RUN3_GBM    PA2 RUN3 GBM    ROC          502.0      89.873      93.143       0.972
47 Gaur_PA2_RUN3_MAXENT PA2 RUN3 MAXENT TSS          656.5      65.823     100.000       0.658
48 Gaur_PA2_RUN3_MAXENT PA2 RUN3 MAXENT ROC          660.0      65.823     100.000       0.828
49 Gaur_PA3_RUN1_RF     PA3 RUN1 RF     TSS          237.5      96.203      88.857       0.851
50 Gaur_PA3_RUN1_RF     PA3 RUN1 RF     ROC          245.0      96.203      89.143       0.976
51 Gaur_PA3_RUN1_GAM    PA3 RUN1 GAM    TSS          586.0      98.734      96.000       0.947
52 Gaur_PA3_RUN1_GAM    PA3 RUN1 GAM    ROC          589.5      98.734      96.000       0.993
53 Gaur_PA3_RUN1_GBM    PA3 RUN1 GBM    TSS          488.5      96.203      94.286       0.905
54 Gaur_PA3_RUN1_GBM    PA3 RUN1 GBM    ROC          488.5      96.203      94.286       0.987
55 Gaur_PA3_RUN1_MAXENT PA3 RUN1 MAXENT TSS          396.0      79.747      99.714       0.795
56 Gaur_PA3_RUN1_MAXENT PA3 RUN1 MAXENT ROC          396.5      79.747      99.714       0.897
57 Gaur_PA3_RUN2_RF     PA3 RUN2 RF     TSS          184.0      97.468      86.857       0.846
58 Gaur_PA3_RUN2_RF     PA3 RUN2 RF     ROC          188.0      97.468      87.143       0.977
59 Gaur_PA3_RUN2_GAM    PA3 RUN2 GAM    TSS          556.0      98.734      94.857       0.936
60 Gaur_PA3_RUN2_GAM    PA3 RUN2 GAM    ROC          552.5      98.734      94.857       0.992
61 Gaur_PA3_RUN2_GBM    PA3 RUN2 GBM    TSS          484.0      97.468      94.000       0.915
62 Gaur_PA3_RUN2_GBM    PA3 RUN2 GBM    ROC          489.5      97.468      94.286       0.987
63 Gaur_PA3_RUN2_MAXENT PA3 RUN2 MAXENT TSS          258.0      83.544      99.714       0.833
64 Gaur_PA3_RUN2_MAXENT PA3 RUN2 MAXENT ROC          259.5      83.544      99.714       0.916
65 Gaur_PA3_RUN3_RF     PA3 RUN3 RF     TSS          181.0      93.671      86.000       0.797
66 Gaur_PA3_RUN3_RF     PA3 RUN3 RF     ROC          181.0      93.671      86.000       0.954
67 Gaur_PA3_RUN3_GAM    PA3 RUN3 GAM    TSS          485.0      96.203      91.429       0.876
68 Gaur_PA3_RUN3_GAM    PA3 RUN3 GAM    ROC          484.0      96.203      91.429       0.979
69 Gaur_PA3_RUN3_GBM    PA3 RUN3 GBM    TSS          467.0      93.671      92.000       0.860
70 Gaur_PA3_RUN3_GBM    PA3 RUN3 GBM    ROC          467.5      93.671      92.286       0.978
71 Gaur_PA3_RUN3_MAXENT PA3 RUN3 MAXENT TSS          695.5      78.481     100.000       0.785
72 Gaur_PA3_RUN3_MAXENT PA3 RUN3 MAXENT ROC          697.5      78.481     100.000       0.891

From the above table, I found the mean TSS:

  algo   meanTSS
1 GAM      0.897
2 GBM      0.865
3 RF       0.802
4 MAXENT   0.767

myBiomodEM <- BIOMOD_EnsembleModeling(
  bm.mod = myBiomodModelOut,
  models.chosen = 'all',
  em.by = "all",
  em.algo = c('EMmean'),
  metric.select = c('TSS'),
  metric.select.thresh = c(0.6),
  metric.select.dataset = 'calibration',
  metric.eval = c('TSS'),
  var.import = 1
)

This is the evaluation result of the ensemble model:

  full.name                                        merged.by.PA merged.by.run merged.by.algo
1 Gaur_EMmeanByTSS_mergedData_mergedRun_mergedAlgo mergedData   mergedRun     mergedAlgo
  filtered.by algo   metric.eval cutoff sensitivity specificity calibration validation evaluation
1 TSS         EMmean TSS            267       96.46       87.15       0.838         NA         NA

Here the GAM and GBM scores are much higher than the ensemble score, and I don't know what the reason is. My study area is 9500 sq. km. There were 245 presence points, but I have thinned them to 116 points. The resolution of my variables is 0.008333333 x 0.008333333. I will be using 10 individual models, but here I've only shown four. Can you please check my entire process and explain where there might be a mistake?

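For reference, the per-algorithm means quoted above can be reproduced in base R from the calibration column of the evaluation table. This is only a sketch: the data frame `evals` is typed in by hand from the GAM and MAXENT TSS rows of the table (it stands in for, and is not, the object returned by `get_evaluations`). It also checks one row against the usual definition TSS = sensitivity + specificity - 1.

```r
# Hand-typed subset of the evaluation table above (TSS rows, GAM and MAXENT only).
# `evals` is a stand-in for the data frame returned by get_evaluations().
evals <- data.frame(
  algo = rep(c("GAM", "MAXENT"), each = 9),
  metric.eval = "TSS",
  calibration = c(
    0.920, 0.874, 0.867, 0.880, 0.915, 0.857, 0.947, 0.936, 0.876,  # GAM
    0.759, 0.759, 0.744, 0.797, 0.772, 0.658, 0.795, 0.833, 0.785   # MAXENT
  )
)

# Mean calibration TSS per algorithm, rounded to 3 decimals
tss_only <- evals[evals$metric.eval == "TSS", ]
mean_tss <- aggregate(calibration ~ algo, data = tss_only, FUN = mean)
mean_tss$calibration <- round(mean_tss$calibration, 3)
print(mean_tss)

# TSS = sensitivity + specificity - 1, with both expressed as proportions.
# Row 1 of the table: Gaur_PA1_RUN1_RF, sensitivity 88.608, specificity 89.714.
tss_row1 <- round(88.608 / 100 + 89.714 / 100 - 1, 3)
print(tss_row1)
```

On the real object, the same `aggregate` call should work directly on the output of `get_evaluations(myBiomodModelOut)`, assuming it has the `algo`, `metric.eval` and `calibration` columns shown in the table.
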
MayaGueguen commented 2 months ago

Hello Shree,

Here are a few highlights of the functioning of the BIOMOD_EnsembleModeling function :

So what you get here is an ensemble model that returns the average of the predictions of your different single models. If you want to emphasize the fact that GAM and GBM do seem to give better results than the MAXENT models, you can try em.algo = 'EMwmean' to weight single models based on their evaluation values.
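To make the difference between a plain mean and a score-weighted mean concrete, here is a toy sketch in base R. It does not call biomod2 and is not meant to reproduce its internals: the prediction values are invented, and taking weights proportional to the mean TSS scores quoted in the question is just one simple assumption about how such weights can be built.

```r
# Toy illustration only: plain mean vs score-weighted mean of single-model predictions.
# The predictions are INVENTED suitability values (0-1000 scale) for three sites.
preds <- rbind(
  GAM    = c(850, 400, 120),
  GBM    = c(800, 450, 150),
  RF     = c(700, 500, 200),
  MAXENT = c(600, 300, 100)
)
colnames(preds) <- c("site1", "site2", "site3")

# Mean calibration TSS per algorithm, as reported in the question
tss <- c(GAM = 0.897, GBM = 0.865, RF = 0.802, MAXENT = 0.767)

# Unweighted average: every model counts the same
em_mean <- colMeans(preds)

# Weighted average: weights proportional to each model's score, summing to 1
w <- tss / sum(tss)
em_wmean <- colSums(preds * w[rownames(preds)])

print(round(rbind(em_mean, em_wmean), 1))
```

With scores as close together as these (0.767 to 0.897), the weights are nearly equal, so in this toy example the weighted mean moves only slightly toward the better-scoring models.
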

More details can be found in the documentation of the BIOMOD_EnsembleModeling function.

Hope it helps, Maya

ShreePoudel0 commented 2 months ago

I am always getting a TSS for the ensemble that is greater than that of the individual models. I don't know if the problem is with the code, the variables, or the presence points. I am modeling with 116 points, and the study area is small (9500 sq km). The variable data is given above. Can you look into it?

MayaGueguen commented 2 months ago

I think there is no problem with your code :slightly_smiling_face:

As I mentioned, there are several ways of combining single models into ensemble models, all described in the documentation of the BIOMOD_EnsembleModeling function. Depending on the one you choose, you will get different results, and the evaluation of this "new supra model" might not necessarily be better than that of a given single model.

While one idea behind ensemble models is to get the best out of every model, they are also a way of getting an average model or of exploring the variability between the different single models and algorithms.

So it really depends on the method you choose to combine your single models.

If your goal is to obtain the best predictions and evaluation metrics, I advise you once again to try with em.algo = 'EMwmean'. :eyes:

Hope it helps, Maya

joshikp01 commented 1 month ago

Hi !!

I think you are evaluating with the calibration dataset. Have you checked your TSS scores with the validation dataset?

bm_PlotEvalMean(myBiomodModelOut, metric.eval = c("TSS"), dataset = "validation", group.by = "algo", do.plot = TRUE)

Maybe this will help? The difference between validation and calibration scores can be very high in some cases.
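The same comparison can also be made numerically. The sketch below is a hypothetical example in base R: `evals` mimics the shape of a `get_evaluations` table, the calibration values are the PA1 RUN1 TSS scores from the question, and the validation values are invented placeholders, since the validation column was not posted in this thread.

```r
# Hypothetical table shaped like get_evaluations() output.
# Calibration TSS: PA1 RUN1 values from the question.
# Validation TSS: INVENTED placeholders (not posted in the thread).
evals <- data.frame(
  algo        = c("RF", "GAM", "GBM", "MAXENT"),
  metric.eval = "TSS",
  calibration = c(0.783, 0.920, 0.850, 0.759),
  validation  = c(0.700, 0.750, 0.740, 0.720)
)

# Gap between calibration and validation scores; a large gap can point to overfitting
evals$gap <- evals$calibration - evals$validation

# Rank algorithms by validation score rather than by calibration score
ranked <- evals[order(-evals$validation), c("algo", "calibration", "validation", "gap")]
print(ranked)
```

On real output, the `validation` column would come from the held-out part of each cross-validation run (30% of the data with CV.perc = 0.7).
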