biomodhub / biomod2

BIOMOD is a computer platform for ensemble forecasting of species distributions, enabling the treatment of a range of methodological uncertainties in models and the examination of species-environment relationships.
89 stars 22 forks source link

Help with BIOMOD Ensemble Model Future Projections #518

Open kalebbanks opened 1 month ago

kalebbanks commented 1 month ago

package version:

Hello all,

I am having some issues with projecting ensemble models into future climatic conditions. I think the source of the issue is that variables with extremely low variable important score are drastically influencing the projection.

I am working with a frog species native to Oklahoma, US and have selected 4 bioclim variables and 2 landscape variables. I have 272 presence points with an equal number of absence points.

cf_data<- biomod2::BIOMOD_FormatingData(resp.var = rep(1, nrow(cf_occ)) , expl.var = bioclim_cf_sub , resp.xy = cf_occ[, c('Longitude', 'Latitude')] , filter.raster = TRUE , resp.name = 'frog' , PA.nb.rep = 3 , PA.nb.absences = 272 , PA.strategy = 'random')

myBiomodModelOut <- BIOMOD_Modeling(bm.format = cf_data, models = c('MAXENT','SRE', 'ANN', 'FDA', 'GAM', 'GLM', 'GBM', 'MARS', 'RF'), CV.strategy = 'random', CV.nb.rep = 5, CV.perc = 0.7, OPT.strategy = 'bigboss', metric.eval = c('TSS','ROC'), var.import = 3, CV.do.full.models = FALSE)

From this I am left with 135 models, and I select models with TSS over .75 (n=6)

Information about the remaining models:

image

cf_ensemble_models <- biomod2::BIOMOD_EnsembleModeling( bm.mod = myBiomodModelOut, models.chosen = 'all', em.by = 'all', metric.select = c('TSS'), metric.select.thresh = 0.75, metric.eval = c('TSS', 'ROC'), em.algo = c('EMwmean'), var.import = 3 ) Ensemble model metric.eval cutoff sensitivity specificity calibration 1 TSS 494 94.853 80.755 0.756
2 ROC 427 98.529 77.107 0.924

Variable Importance Scores expl.var rand var.imp 1 bio_12 1 0.666353 2 bio_9 1 0.005643 3 bio_5 1 0.045998 4 bio_8 1 0.042818 5 p_prairie 1 0.100442 6 P_clay 1 0.012926 7 bio_12 2 0.573479 8 bio_9 2 0.006491 9 bio_5 2 0.047868 10 bio_8 2 0.043144 11 p_prairie 2 0.099518 12 P_clay 2 0.014944 13 bio_12 3 0.678557 14 bio_9 3 0.006760 15 bio_5 3 0.043273 16 bio_8 3 0.045576 17 p_prairie 3 0.104418 18 P_clay 3 0.015108

As you can see, bio_12 (annual rainfall) is my most important variable by far about 0.66, followed by percent prairie about 0.10, then bio_5 and bio 8 ) about 0.04, and lastly percent clay and bio_9 less than 0.01. Heres the projection: image

followed by the binary presence/absence prediction using the TSS threshold, 494. Red is suitable, yellow is unsuitable:

image

As a regional expert in this species, this looks accurate to me.

Then I go to project it into future climatic conditions. I got my conditions by averaging 3 GCMs for 245 ssp for 2070. I keep the two land use variables the exact same, and use these new future climatic conditions. I kept the rasterlayers order the same and the file names of the rasters are the same (I know this has been the source of error for a few people).

cf_models_ensemble_proj_current <- biomod2::BIOMOD_EnsembleForecasting( bm.em = cf_ensemble_models , proj.name = 'Future' , new.env = future_stack , models.chosen = 'crawfish.frog_EMwmeanByTSS_mergedData_mergedRun_mergedAlgo' , metric.binary = 'all' , metric.filter = 'all' ) `

This is the projection: image

and this is the binary projection using the same TSS threshold as above:

image

I am very confused about this reduction in range size, mainly because the environmental variables (especially bio_12 which I feel is the most important for the species and has the highest variable importance score) are not very different in the future GCMs. Especially this different to reduce suitable habitat to almost zero. To try and figure out if one variable or another is affecting the projection, I create four new projections, leaving one climatic variable as current conditions for each projection.

If bio_8 is current conditions image If bio_5 is current conditions image If bio_12 is current conditions image if bio_9 is current conditions image

As you can see, when bio_5 is set at its current condition suitability goes up which suggests to me that this is the variable negatively affecting my projections. My questions is, why do variables that have such low variable importance scores affect the model so much? Is there any way to correct or adjust how the models are projected to take more consideration of the variable of importance score?

Thanks for any help or suggestions you can provide.

HeleneBlt commented 1 month ago

Hi Kaleb,

My first guess will be to look at the range of bio5 in the current and future environment. Could you share the output of show(bioclim_cf_sub) and show(future_stack) ? You can also look at the response curves of bio5 with bm_PlotResponseCurves.

responseCurves_current <- bm_PlotResponseCurves(bm.out = cf_ensemble_models)
responseCurves_future <- bm_PlotResponseCurves(bm.out = cf_ensemble_models, 
                                               new.env = future_stack)

It can help to understand what happened for the different variables.

I would also need to know the version of biomod2 you use 🙂

Thanks a lot. Hélène

kalebbanks commented 1 month ago

Thanks for the response.

Here are the response curves: Current climatic conditions: image

Future conditions: image

The differences between the two raster stacks: show(bioclim_cf_sub) class : RasterStack dimensions : 81, 206, 16686, 6 (nrow, ncol, ncell, nlayers) resolution : 0.04166667, 0.04166667 (x, y) extent : -103, -94.41667, 33.625, 37 (xmin, xmax, ymin, ymax) crs : +proj=longlat +datum=WGS84 +no_defs names : bio_12, bio_9, bio_5, bio_8, p_prairie, P_clay min values : 390.00000, 1.40600, 30.69600, 14.05333, 0.00000, 0.00000 max values : 1416.00000, 26.77133, 36.62000, 24.99733, 99.92204, 52.49200

show(future_stack) class : RasterStack dimensions : 81, 206, 16686, 6 (nrow, ncol, ncell, nlayers) resolution : 0.04166667, 0.04166667 (x, y) extent : -103, -94.41667, 33.625, 37 (xmin, xmax, ymin, ymax) crs : +proj=longlat +datum=WGS84 +no_defs names : bio_12, bio_9, bio_5, bio_8, p_prairie, P_clay min values : 377.666656, 2.566667, 34.000000, 16.733334, 0.000000, 0.000000 max values : 1447.00000, 27.13333, 39.73333, 28.43333, 99.92204, 52.49200

packageVersion("biomod2") [1] ‘4.2.5.2’

kalebbanks commented 1 month ago

In cases like this would it be beneficial to adjust the extent of my study area for the SDM? The western portion of the state is drastically different than the eastern half (more drier and hotter) and the frogs are not present. I wonder if that could be affecting how the model projects into future, hotter climates.

HeleneBlt commented 1 month ago

Hello !

I think the problem here comes more from bio12 with a response curve amplitude really reduced with the future projection. Indeed, reducing the extent can be a good solution: the models will be more adapted to "subtle" variations of bio12 and bio5.

Let me know if it helps! Hélène

kalebbanks commented 1 month ago

Hello,

Nope it is still not working properly. I reduced the extent, westward. Had to change some variables to avoid correlation. I used the bio_12 (annual rainfall), bio_9 (mean temp driest quarter), bio_5 (max temp of warmest month), bio_18 (precip of warmest quarter), and my land scape variables.

This is what the SDM looks like with reduced extent: image

again, this looks good to me.

However, when I project it to future conditions the suitability is the same places, but it is way reduced, and when you filter by the binary TSS threshold, there is almost no suitable habitat left: image

Here's the response curve of the current climate:
image

Here's the response curve of the future climate: image

Again the response curve shows way decreased suitability with almost the same range of precip. I am not sure what the issue is and why it keeps occurring but any help would be appreciated.

kalebbanks commented 1 month ago

And here is what the projection looks like when bio_5 (max temp of warmest month) is kept the same and all future variables are kept at their future projections:

image Which looks expected to me. The variable importance score of bio_5 is less than 0.05, which makes me think that the issue is how the algorithms are responding to extrapolation of this variable as it does change considerably (see above where i show different variable stacks). So I guess my question is, does anyone know how these algorithms extrapolates variables? and if there is different settings in biomod to control it?

HeleneBlt commented 1 month ago

Hi Kaleb,

Sorry for my late answer, I've been called on other projects!

Could you look at the response curves for the 6 selected models so we can see if one of the algorithms reacts badly to the extrapolation of the variables? You can have a look at how the different algorithms extrapolate variables in the documentation of each algorithm. You can after play with the different options of each algorithm with bm_ModelingOptions

Can you also look at the distribution of PA3 (you have 5/6 models with PA3)? You can use: plot(cf_data, calib.lines = get_calib_lines(myBiomodModelOut))

As the importance of bio5 is really low, have you tried to redo your modelling without this variable?

Hélène