biomodhub / biomod2

BIOMOD is a computer platform for ensemble forecasting of species distributions, enabling the treatment of a range of methodological uncertainties in models and the examination of species-environment relationships.
87 stars 22 forks source link

Obtaining binary (0-1) maps from ensemble models #193

Closed Lisa-Tedeschi closed 1 year ago

Lisa-Tedeschi commented 1 year ago

Hi, I am trying to produce binary maps from an ensemble model. This is my code:

Ensemble modeling

myBiomodEM <- BIOMOD_EnsembleModeling(bm.mod = myBiomodModelOut,
                                      models.chosen = "all",
                                      em.by = "all",
                                      metric.select = c("TSS"), 
                                      metric.select.thresh = c(0.70), 
                                      metric.eval = c("KAPPA", "TSS", "ROC"), 
                                      var.import = 3,
                                      prob.mean.weight = T) 

Project the ensemble models

myBiomodEMProj <- BIOMOD_EnsembleForecasting(bm.em = myBiomodEM,
                                             bm.proj = myBiomodProj,
                                             models.chosen = "all",
                                             metric.binary = c("KAPPA", "TSS", "ROC"),
                                             metric.filter = c("KAPPA", "TSS", "ROC"))

Load and stack the binary projections

current_EMproj <- raster::stack(c(ca = "./SDM/Models/Ammotragus.lervia/proj_Current/individual_projections/Ammotragus.lervia_EMmeanByTSS_mergedAlgo_mergedRun_mergedData.grd",
                                         wm = "./SDM/Models/Ammotragus.lervia/proj_Current/individual_projections/Ammotragus.lervia_EMwmeanByTSS_mergedAlgo_mergedRun_mergedData.grd"))

I have few questions: 1) current_EMproj values range is 6-991. Shouldn't it be 0-1? If I load the single models' binary projection (current_binary_proj <- raster::stack("./SDM/Models/Ammotragus.lervia/proj_Current/proj_Current_Ammotragus.lervia_TSSbin.grd"), based on TSS' values, the values (of each raster) are 0-1.
2) I tried to produce binary maps with: current_binary_EMproj <- biomod2::bm_BinaryTransformation(data = current_EMproj, threshold = 700, do.filtering = F) However, I am using a pre-defined threshold (700), while I would like to use the optimal threshold for my data. I checked both biomo2d::bm_FindOptimStat and PresenceAbsence::optimal.thresholds, but in both cases, I couldn't understand the input data required. For instance, biomod2::bm_FindOptimStat requires: obs = a vector of observed values (binary, 0 or 1) fit = a vector of fitted values (continuous) Is obs what is usually named myResp (in my case a vector of presences, only 1)? But if fit represent values(current_EMproj[[1]]) (which I am not sure about), clearly the length will be different, as current_EMproj[[1]] is a huge raster.

I think I am missing some basic understanding of the input data... This is my first question here, so apologies in advance for any error.

MayaGueguen commented 1 year ago

Hello there,

Here are some answers to your questions :

  1. As it takes less place in memory to save integer rather than decimal values, biomod2 predictions are automatically saved between 0 and 1000 instead of 0 and 1. There is an option in BIOMOD_Projection and BIOMOD_EnsembleForecasting functions that allows you to change that by setting on_0_1000 = FALSE. TSSbin files are between 0 and 1 because it is probability values (between 0 and 1000, or 0 and 1) that have been transformed into 0 or 1 finding the best threshold optimizing TSS.

  2. Binary maps are automatically produced (_using internally the bm_BinaryTransformation function_) when using the metric.binary parameter within the BIOMOD_Projection and BIOMOD_EnsembleForecasting functions. If you set for example metric.binary = 'TSS', then it will find the optimal threshold to transform probabilities into 0/1 while maximising the TSS.

So my guess is that you already have what you want, which is the ./SDM/Models/Ammotragus.lervia/proj_Current/proj_Current_Ammotragus.lervia_TSSbin.grd file.

I hope it is clearer that way, Maya

Lisa-Tedeschi commented 1 year ago

Hello there,

Here are some answers to your questions :

  1. As it takes less place in memory to save integer rather than decimal values, biomod2 predictions are automatically saved between 0 and 1000 instead of 0 and 1. There is an option in BIOMOD_Projection and BIOMOD_EnsembleForecasting functions that allows you to change that by setting on_0_1000 = FALSE. TSSbin files are between 0 and 1 because it is probability values (between 0 and 1000, or 0 and 1) that have been transformed into 0 or 1 finding the best threshold optimizing TSS.
  2. Binary maps are automatically produced (_using internally the bm_BinaryTransformation function_) when using the metric.binary parameter within the BIOMOD_Projection and BIOMOD_EnsembleForecasting functions. If you set for example metric.binary = 'TSS', then it will find the optimal threshold to transform probabilities into 0/1 while maximising the TSS.

So my guess is that you already have what you want, which is the ./SDM/Models/Ammotragus.lervia/proj_Current/proj_Current_Ammotragus.lervia_TSSbin.grd file.

I hope it is clearer that way, Maya

Dear Maya, thanks for your answer! I tried loading ./SDM/Models/Ammotragus.lervia/proj_Current/proj_Current_Ammotragus.lervia_TSSbin.grd, which indeed is exactly what I need, but aren't these the binary predictions of all my models? I would need a single binary prediction of my ensemble model (like current_binary_proj[[1]]), which would be the result of BIOMOD_EnsembleForecasting(), where I have set metric.binary = c("KAPPA", "TSS", "ROC") (same as in BIOMOD_Projection() for the single models). Nevertheless, the two outputs I get with BIOMOD_EnsembleForecasting() (which I loaded as current_EMproj) do not have only 0 or 1 values. Now that you explained why the values range between 0 and 1000 and are not either 0 or 1, I guess I can manually set a threshold and assign a 0 to each cell below the threshold, and a 1 to each cell above the threshold. But it's still unclear to me why for the single models I get what I need (while for the ensemble model I don't), and how to feed biomo2d::bm_FindOptimStat or PresenceAbsence::optimal.thresholds (in case I need to compute an optimal threshold). Probably I'm just overlooking something extremely simple or clear here... Thanks.

MayaGueguen commented 1 year ago

Hello,

I'm glad that things are getting clearer :sun_with_face: I see that you did not set a proj.name argument when calling BIOMOD_EnsembleForecasting function. In this case, the outputs should have been put into your proj_Current folder. Could you try running the Ensemble Forecasting one more time, setting for example proj.name = 'CurrentEM' ? Just to see if like that you do get a file containing a pattern like _ensembleTSSbin in its name.

Maya

Lisa-Tedeschi commented 1 year ago

Hello,

I'm glad that things are getting clearer 🌞 I see that you did not set a proj.name argument when calling BIOMOD_EnsembleForecasting function. In this case, the outputs should have been put into your proj_Current folder. Could you try running the Ensemble Forecasting one more time, setting for example proj.name = 'CurrentEM' ? Just to see if like that you do get a file containing a pattern like _ensembleTSSbin in its name.

Maya

Hi Maya, I have re-run BIOMOD_EnsembleForecasting() with the argument proj.name = "CurrentEM", and I get the same outputs as before (i.e., a proj_CurrentEM_Ammotragus.lervia_ensemble.gri and .grd, an Ammotragus.lervia.CurrentEM.ensemble.projection.out, and a individual_projections folder with the same files as before - in the object I called current_EMproj). There, I get only two (one for the mean and one for the weighted mean) ByTSS files (instead of a KAPPA and ROC as well, as you explained here https://r-forge.r-project.org/forum/forum.php?max_rows=50&style=nested&offset=23&forum_id=995&group_id=302 ) because I put metric.select = c("TSS") in BIOMOD_EnsembleModeling(), right?

However, when running again BIOMOD_EnsembleForecasting(), I get a warning (maybe I overlooked that in the first run, but I don't remember), which I guess it's the reason why no _ensembleTSSbin is produced: Warning messages: 1: In .BIOMOD_EnsembleForecasting.check.args(bm.em, bm.proj, proj.name, : KAPPA, TSS, ROC Binary Transformation were switched off because no corresponding evaluation method found 2: In .BIOMOD_EnsembleForecasting.check.args(bm.em, bm.proj, proj.name, : KAPPA, TSS, ROC Filtered Transformation were switched off because no corresponding evaluation method found As said, I had those arguments in BIOMOD_EnsembleModeling() before: metric.select = c("TSS"), metric.select.thresh = c(0.70), metric.eval = c("KAPPA", "TSS", "ROC") so at least an _ensembleTSSbin should have been produced, right? Thanks.

MayaGueguen commented 1 year ago

Hello,

Could you tell me which versions of R and biomod2 package you are currently using please ? If you do not have the 4.2-3, could you try and install it from github ?

library(devtools)
devtools::install_github('biomodhub/biomod2')

If it does not solve it, could you try once more running the BIOMOD_EnsembleForecasting function, but with the argument do.stack = FALSE ?

Maya

Lisa-Tedeschi commented 1 year ago

Hello,

Could you tell me which versions of R and biomod2 package you are currently using please ? If you do not have the 4.2-3, could you try and install it from github ?

library(devtools)
devtools::install_github('biomodhub/biomod2')

If it does not solve it, could you try once more running the BIOMOD_EnsembleForecasting function, but with the argument do.stack = FALSE ?

Maya

Hi Maya, here it is:

R version 4.2.2 (2022-10-31 ucrt)
biomod2_4.1-2  

I will try to update biomod2 and also use do.stack = FALSE. In the meanwhile, thanks a lot!

Lisa-Tedeschi commented 1 year ago

Hello,

Could you tell me which versions of R and biomod2 package you are currently using please ? If you do not have the 4.2-3, could you try and install it from github ?

library(devtools)
devtools::install_github('biomodhub/biomod2')

If it does not solve it, could you try once more running the BIOMOD_EnsembleForecasting function, but with the argument do.stack = FALSE ?

Maya

Hi Maya, just an update: I couldn't install the github version of biomod2, but I switched to an older version (3.3.7) on a remote Linux machine and, with the same data, the binaries are produced. I will try later on with the most recent version tho. In the meanwhile, thanks a lot for your time!