biomodhub / biomod2

BIOMOD is a computer platform for ensemble forecasting of species distributions, enabling the treatment of a range of methodological uncertainties in models and the examination of species-environment relationships.

Help with BIOMOD_4.2-6.1 - ensemble forecasting with different # of PA #542


amythor commented 2 days ago

Hey biomod2 team,

I am struggling a bit to get the ensemble forecasting working. I had originally run my models using the same PA approach for all algorithms and found the ensemble process easy: with em.by = 'all' I got a single output map when projecting for current and future periods. However, I am now trying to use a different number of PAs per algorithm and am having trouble getting just one final output map from the ensemble forecasting function. I understand I need to combine by PA+run first, but is there a way to combine these again into one? And if I do this, does the code also first filter out models that do not meet the threshold of TSS > 0.7, or are all models included (something I do not want)?

When I use em.by = 'all' I get this message in the console (and then it only uses RF and GBM for the ensemble):

! all models available will be included in ensemble.modeling

Evaluation & Weighting methods summary : TSS over 0.7

!!! Removed models using the Full dataset as ensemble models cannot merge repetition dataset (RUN1, RUN2, ...) with Full dataset unless em.by = 'PA+run'.

mergedData_mergedRun_mergedAlgo ensemble modeling
! Additional projection required for ensemble models merging several pseudo-absence dataset...
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Do Single Models Projection -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Projecting PoweshiekSkipperling2_PA2_RUN1_RF ... Projecting PoweshiekSkipperling2_PA2_RUN1_GBM ... Projecting PoweshiekSkipperling2_PA2_RUN2_RF ... Projecting PoweshiekSkipperling2_PA2_RUN2_GBM ... Projecting PoweshiekSkipperling2_PA2_RUN3_GBM ...

..and so on...

When I use em.by = 'PA+run' I get this message in the console:

! all models available will be included in ensemble.modeling

Evaluation & Weighting methods summary : TSS over 0.7

 !! Ensemble Model PA1_allRun_mergedAlgo PA2_RUN7_mergedAlgo PA2_RUN11_mergedAlgo PA2_allRun_mergedAlgo PA3_RUN7_mergedAlgo PA3_allRun_mergedAlgo PA4_allRun_mergedAlgo PA5_RUN13_mergedAlgo PA5_allRun_mergedAlgo selected by TSS have no model selected and will fail.
 !! Ensemble Model PA2_RUN3_mergedAlgo PA2_RUN8_mergedAlgo PA3_RUN1_mergedAlgo PA3_RUN15_mergedAlgo PA5_RUN8_mergedAlgo PA5_RUN12_mergedAlgo selected by TSS have only one single model selected.
 !! Please make sure this is intended or review your selection metrics and threshold.

PA1_RUN1_mergedAlgo ensemble modeling Mean of probabilities by TSS ... Evaluating Model stuff... Evaluating Predictor Contributions...

...And so on until....

PA1_allRun_mergedAlgo ensemble modeling
! No models kept due to threshold filtering... Ensemble Modeling will fail!
! Note : PoweshiekSkipperling2_EMmeanByTSS_PA1_allRun_mergedAlgo failed!

PA2_RUN1_mergedAlgo ensemble modeling Mean of probabilities by TSS ... Evaluating Model stuff... Evaluating Predictor Contributions...

Any help is much appreciated, and please let me know if you need any further code or output. I tried to see whether there was another question to this effect but could not find one; if this has already been addressed, please feel free to just link that GitHub issue.

Thank you in advance,

Amy

See code below:

format data

POSK_formatted <- BIOMOD_FormatingData(
  resp.name = "Species2",
  resp.var = POSK_FILTERED2,
  expl.var = var_det_clipped,
  dir.name = "Species1/Outputs_PowSki",
  resp.xy = POSK_projected[, c('X', 'Y')],
  na.rm = TRUE,
  filter.raster = TRUE,
  PA.nb.rep = 5,
  PA.nb.absences = c(10000, 1000, 1000, 1000, 1000),
  PA.strategy = 'disk',
  PA.dist.min = 5000,
  PA.dist.max = 100000
)

Perform cross-validation with both presences and absences

table_cv_POSK <- bm_CrossValidation(
  bm.format = POSK_formatted,   # Your formatted biomod2 data
  strategy = "kfold",           # Cross-validation strategy
  nb.rep = 3,                   # Number of repetitions
  k = 5,                        # Number of folds (0 for random cross-validation)
  do.full.models = TRUE         # Whether to build full models
)

library(dplyr)  # needed for %>% and filter()
calib_summary_POSK <- summary(POSK_formatted, calib.lines = table_cv_POSK) %>%
  filter(dataset == "calibration")

make model

POSK_ModelOutput <- BIOMOD_Modeling(
  bm.format = POSK_formatted,
  modeling.id = "AllModels",
  models = c("GAM", "GBM", "GLM", "MAXENT", "RF"),
  models.pa = list(RF = c("PA2", "PA3", "PA4", "PA5"),
                   GBM = c("PA2", "PA3", "PA4", "PA5"),
                   GLM = "PA1",
                   GAM = "PA1",
                   MAXENT = "PA1"),
  CV.strategy = "user.defined",
  CV.user.table = table_cv_POSK,
  CV.do.full.models = TRUE,
  OPT.user = model_parameters,
  OPT.strategy = "user.defined",
  weights = NULL,
  prevalence = NULL,
  metric.eval = c("KAPPA", "TSS", "ROC", "BOYCE"),
  var.import = 5,
  scale.models = FALSE,
  nb.cpu = 1,
  seed.val = 42,
  do.progress = TRUE
)

Ensemble

POSK_Ensemble <- BIOMOD_EnsembleModeling(
  bm.mod = POSK_ModelOutput,
  models.chosen = 'all',
  em.by = "PA+run",
  em.algo = "EMmean",
  metric.select = "TSS",
  metric.select.thresh = c(0.7),
  metric.select.table = NULL,
  metric.select.dataset = 'validation',
  metric.eval = c("KAPPA", "TSS", "ROC", "BOYCE"),
  var.import = 5,
  EMci.alpha = 0.05,
  EMwmean.decay = "proportional",
  nb.cpu = 1,
  seed.val = NULL,
  do.progress = TRUE
)

2011-2040 SSP1

POSK_Proj1 <- BIOMOD_Projection(
  bm.mod = POSK_ModelOutput,
  proj.name = 'SSP1_2011-2040',
  new.env = proj1_clipped,
  models.chosen = 'all',
  metric.binary = 'all',
  metric.filter = 'all',
  build.clamping.mask = TRUE
)

Project ensemble

Project ensemble models (building single projections) - Projecting current?

POSK_Ensemble_Proj <- BIOMOD_EnsembleForecasting(
  bm.em = POSK_Ensemble,
  new.env = var_det_clipped,
  models.chosen = 'all',
  metric.binary = 'all',
  metric.filter = 'all',
  proj.name = 'CurrentEM_SSP1_2011_2040'
)
plot(POSK_Ensemble_Proj)

Project ensemble models (from single projections) - Projecting future?

POSK_Ensemble_Proj1_Future <- BIOMOD_EnsembleForecasting(
  bm.em = POSK_Ensemble,
  bm.proj = POSK_Proj1,
  models.chosen = 'all',
  metric.binary = 'all',
  metric.filter = 'all',
  proj.name = 'Ensemble_Proj_SSP1_2011_2040'
)
plot(POSK_Ensemble_Proj1_Future)

HeleneBlt commented 2 days ago

Hi Amy,

Thanks a lot for this very detailed issue!

If I understand correctly, you want one ensemble model for each pseudo-absence dataset. In this case, you can use em.by = "PA". The models will first be filtered with the metric and threshold you gave (TSS > 0.7 in your case) and then grouped by PA dataset.
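As a minimal sketch (reusing your POSK_ModelOutput object from above; the output name POSK_Ensemble_byPA and the trimmed argument list are only illustrative), the call could look like:

POSK_Ensemble_byPA <- BIOMOD_EnsembleModeling(
  bm.mod = POSK_ModelOutput,
  models.chosen = 'all',
  em.by = "PA",                          # one ensemble model per pseudo-absence dataset
  em.algo = "EMmean",
  metric.select = "TSS",
  metric.select.thresh = 0.7,            # single models are filtered on TSS > 0.7 first
  metric.select.dataset = 'validation',
  metric.eval = c("TSS", "ROC")
)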

🖼️ If you want an illustration of biomod2's different merging possibilities, you can have a look at this presentation (2.b ensemble models, slide 33) 👀

If this is not what you want, feel free to ask! Hélène

amythor commented 1 day ago

Hi Hélène,

Thank you so much for your prompt response. I really appreciate the biomod2 team's work on GitHub and elsewhere; it has made this a much easier process.

I guess my main question is whether I can still merge using em.by = 'all' if I assigned different PA approaches to different algorithms. I would like my end product to be one final ensemble model (and an ensemble forecast projected to future conditions), and I am struggling to do that after assigning different PA numbers to different algorithms (see my formatting and modelling functions).

Is it possible to combine everything at once and get one final output using this PA approach? After working on it more this morning, I think I may be misunderstanding the console output, but it looked to me like it was only using the RF and GBM algorithms, for which I assigned multiple PA datasets, and leaving out GLM, GAM, and MAXENT, for which I assigned a single PA dataset.

What function can I use to see which models made it into the ensemble after the filtering process? This will probably help answer my question; I think I may just be confused by what I was seeing in the console while it was processing.

Thanks again, Amy

HeleneBlt commented 1 day ago

Hello Amy,

First, to see which models made it through the filtering process, you can use the function get_kept_models(POSK_Ensemble).
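For instance, a minimal sketch reusing the POSK_Ensemble object from your code (the kept_models name is just illustrative):

kept_models <- get_kept_models(POSK_Ensemble)  # single models retained for each ensemble model
kept_models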

It is entirely possible to use em.by = "all" with different PA datasets for different algorithms. 💪 In your case, first look at the kept models with the function above. If the GLM, GAM and MAXENT models are missing, my first guess is that the filter is a little too selective and is rejecting them. You can look at the results with get_evaluations(POSK_ModelOutput) to see how to adjust your filter. 📑
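A small sketch of that check (the evals_POSK name is illustrative, and the column names metric.eval, algo, full.name and validation are assumed from recent biomod2 versions; adjust if yours differ):

evals_POSK <- get_evaluations(POSK_ModelOutput)
# Validation TSS scores for the algorithms that seem to be dropped by the filter
subset(evals_POSK,
       metric.eval == "TSS" & algo %in% c("GLM", "GAM", "MAXENT"),
       select = c(full.name, algo, validation))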

Let me know if this corresponds to your case, Hélène