biomodhub / biomod2

BIOMOD is a computer platform for ensemble forecasting of species distributions, enabling the treatment of a range of methodological uncertainties in models and the examination of species-environment relationships.

trouble to generate single and ensemble model binary files (current/future) to analyze range size difference #145

Closed MichaelMoto closed 1 year ago

MichaelMoto commented 1 year ago

Hello, first of all, thanks a lot for the development of the great package and for any help!

I am having trouble generating, loading, or finding the binary files for the single and ensemble models, and later for the ensemble forecasts, which I need in order to analyse range size differences. With the earlier biomod2 version 3.3-7, single binary files (one per run and algorithm) and ensemble binary files (for evaluation metrics such as TSS and ROC) were also saved in the folder "individual projections".

I applied the code from "chen", who had a similar problem (see https://github.com/biomodhub/biomod2/issues/134), to the GuloGulo example data used in the biomod2 manual. Still, I do not get single and ensemble binary files in the folder "individual_projections" (e.g. GuloGulo_AllData_RUN1_RF_TSSbin.grd, GuloGulo_EMwmeanByTSS_mergedAlgo_mergedRun_mergedData_TSSbin.grd).

---------------------------------------------------------

In addition, I get the following warning message:

Warning message: In .BIOMOD_EnsembleForecasting.check.args(bm.em, bm.proj, proj.name, : TSS Binary Transformation were switched off because no corresponding evaluation method found

To avoid the warning message, I need to change metric.binary = 'TSS' to metric.binary = 'all'.

---------------------------------------------------------

Best regards, Michael

I am using R version 4.2.1 (2022-06-23 ucrt) and biomod version 4.1.2.

The code is:

Load species occurrences (6 species available)

myFile <- system.file('external/species/mammals_table.csv', package = 'biomod2')
DataSpecies <- read.csv(myFile, row.names = 1)
head(DataSpecies)

Select the name of the studied species

myRespName <- 'GuloGulo'

Get corresponding presence/absence data

myResp <- as.numeric(DataSpecies[, myRespName])

Get corresponding XY coordinates

myRespXY <- DataSpecies[, c('X_WGS84', 'Y_WGS84')]

Load environmental variables extracted from BIOCLIM (bio_3, bio_4, bio_7, bio_11 & bio_12)

myFiles <- paste0('external/bioclim/current/bio', c(3, 4, 7, 11, 12), '.grd')
myExpl <- raster::stack(system.file(myFiles, package = 'biomod2'))

---------------------------------------------------------------

Format Data with true absences

myBiomodData <- BIOMOD_FormatingData(resp.var = myResp, expl.var = myExpl, resp.xy = myRespXY, resp.name = myRespName)

Create default modeling options

myBiomodOptions <- BIOMOD_ModelingOptions()

Model single models

myBiomodModelOut <- BIOMOD_Modeling(
  bm.format = myBiomodData,
  bm.options = myBiomodOptions,
  modeling.id = 'AllModels',
  models = c('RF', 'GLM'),
  nb.rep = 2,
  data.split.perc = 80,
  var.import = 3,
  metric.eval = c('TSS', 'ROC'),
  do.full.models = FALSE,
  seed.val = 123)

Model ensemble models

myBiomodEM <- BIOMOD_EnsembleModeling(
  bm.mod = myBiomodModelOut,
  models.chosen = 'all',
  em.by = 'all',
  metric.select = c('TSS'),
  metric.select.thresh = c(0.7),
  metric.eval = c('TSS', 'ROC'),
  var.import = 3,
  prob.mean = FALSE,
  prob.median = FALSE,
  prob.cv = TRUE,
  prob.ci = FALSE,
  committee.averaging = TRUE,
  prob.mean.weight = TRUE)

Project single models

myBiomodProj <- BIOMOD_Projection(
  bm.mod = myBiomodModelOut,
  proj.name = 'Current',
  new.env = myExpl,
  models.chosen = 'all',
  metric.binary = 'TSS',
  output.format = '.grd',
  do.stack = TRUE,
  binary = TRUE)

Project ensemble models (from single projections)

myBiomodEMProj <- BIOMOD_EnsembleForecasting(
  bm.em = myBiomodEM,
  bm.proj = myBiomodProj,
  output.format = '.grd',
  models.chosen = 'all',
  metric.binary = 'TSS',
  do.stack = TRUE,
  binary = TRUE)

rpatin commented 1 year ago

Hello Michael, Thank you for reporting on github :pray:

Unfortunately, biomod2 version 4.1-2, which was released on CRAN, had several issues with binary transformation. Most of them have been corrected in the current GitHub version (4.2-1), so if you update your package with devtools::install_github('biomodhub/biomod2'), your issue should hopefully resolve itself.
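Before reinstalling, it can help to check which version is currently installed. The following is only a minimal sketch in base R: the helper name `needs_update` is hypothetical, and the 4.2.1 threshold is simply the GitHub version mentioned above.

```r
# Hypothetical helper: TRUE if `pkg` is missing or older than `min_version`.
needs_update <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(TRUE)  # package is not installed at all
  }
  utils::packageVersion(pkg) < package_version(min_version)
}

# Threshold assumed from the version quoted in this thread.
if (needs_update("biomod2", "4.2.1")) {
  message("Consider running: devtools::install_github('biomodhub/biomod2')")
}
```

After updating, restart the R session so that the new version is the one actually loaded.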

Note that loading the GuloGulo dataset has changed slightly in versions > 4.2-0:

data("DataSpecies")
data("bioclim_current")
myExpl <- terra::rast(bioclim_current)
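Once the projections have been re-run, one way to confirm that binary files were actually written is to list them from disk. This is a rough sketch in base R only: the helper name `find_binary_files` is hypothetical, and the folder layout `GuloGulo/proj_Current` is an assumption based on the species and projection names used in this thread, so adjust the path to your own working directory.

```r
# Hypothetical helper: list files whose names contain "<metric>bin"
# anywhere below a projection folder.
find_binary_files <- function(proj_dir, metric = "TSS") {
  list.files(proj_dir,
             pattern = paste0(metric, "bin"),
             recursive = TRUE,
             full.names = TRUE)
}

# Assumed layout: <species name>/proj_<proj.name>
bin_files <- find_binary_files(file.path("GuloGulo", "proj_Current"))

if (length(bin_files) == 0) {
  message("No TSS binary files found under the assumed folder.")
} else {
  print(basename(bin_files))
}
```

An empty result does not say why the files are missing; it only shows that nothing matching the pattern exists under that folder.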

If your problem is not solved, feel free to let us know of the additional issues encountered. Best regards, Rémi

MichaelMoto commented 1 year ago

Hello Rémi,

thanks a lot for your help and advice! I updated biomod2 and ran the code above again. It worked after changing do.stack = T to do.stack = F: I received binary files for my single runs and the ensemble models. However, a new error occurred when running the last part of the script above.

Project ensemble models (from single projections)

myBiomodEMProj <- BIOMOD_EnsembleForecasting(
  bm.em = myBiomodEM,
  bm.proj = myBiomodProj,
  output.format = '.grd',
  models.chosen = 'all',
  metric.binary = 'all',
  do.stack = FALSE)

-=-=-=-=-=-=-=-=-=-=-=-=-=-= Do Ensemble Models Projection -=-=-=-=-=-=-=-=-=-=-=-=-=-=

Creating suitable Workdir...

> Projecting GuloGulo_EMcvByTSS_mergedAlgo_mergedRun_mergedData ...
    Writing projection on hard drive...
> Projecting GuloGulo_EMcaByTSS_mergedAlgo_mergedRun_mergedData ...
    Writing projection on hard drive...
> Projecting GuloGulo_EMwmeanByTSS_mergedAlgo_mergedRun_mergedData ...
    Writing projection on hard drive...

> Building ROC binaries
> Building TSS binaries

Error in (function (cl, name, valueClass) : assignment of an object of class “SpatRaster” is not valid for @‘val’ in an object of class “BIOMOD.stored.SpatRaster”; is(value, "PackedSpatRaster") is not TRUE

Do you have experience with this kind of error?

By contrast, when I run the following code, which generates a new folder only for the ensemble models, no error occurs.

Project ensemble models (building single projections)

myBiomodEMProj <- BIOMOD_EnsembleForecasting(
  bm.em = myBiomodEM,
  proj.name = 'CurrentEM',
  new.env = myExpl,
  models.chosen = 'all',
  metric.binary = 'all',
  output.format = '.grd',
  do.stack = FALSE)

Again, thanks a lot for any help and advice!

Best regards, Michael

rpatin commented 1 year ago

Hello Michael, I could reproduce the error, and it should be fixed now if you update to the current GitHub version. Sorry for the inconvenience, and please let me know if that does not solve your issue. Best regards, Rémi

MichaelMoto commented 1 year ago

Hello Rémi,

again, thanks a lot for your support! After the update, the GuloGulo example (see the script above) runs smoothly. Now I will test my own data; hopefully it behaves the same. Sorry, but I have two more questions. First, after running the following part of the script, I receive the warning message shown below it. Is it a serious warning?

Model single models

myBiomodModelOut <- BIOMOD_Modeling(
  bm.format = myBiomodData,
  bm.options = myBiomodOptions,
  modeling.id = 'AllModels',
  models = c('RF', 'GLM'),
  nb.rep = 2,
  data.split.perc = 80,
  var.import = 3,
  metric.eval = c('TSS', 'ROC'),
  do.full.models = FALSE,
  seed.val = 123)

Warning message: executing %dopar% sequentially: no parallel backend registered

Second: I have never used "seed.val" before. I understand it is optional, but do you advise using it or not, and what happens if I change the seed value?

Thanks a lot and best regards, Michael

rpatin commented 1 year ago

Hello Michael,

  1. No worries, the warning

    Warning message:
    executing %dopar% sequentially: no parallel backend registered

    is totally harmless; it just means that parallelization was not activated. As you did not ask for parallelization, you can safely ignore it.

  2. seed.val is an optional argument that allows setting a seed for the random values that may be sampled. In other words, if you set seed.val before sampling the pseudo-absences or running the model, you should be able to reproduce your results later by using the same data and the same seed.val. Not setting it is also fine, but you will no longer be able to reproduce the results.

If you change the number, then all random operations (sampling numbers, etc.) will be slightly different, although you should end up with highly similar (but not identical) results.
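The effect of a seed can be illustrated with a small base R sketch. It does not call biomod2 at all: `draw_sample` is a hypothetical stand-in for any random step (such as sampling pseudo-absences), and it only shows the general behaviour of `set.seed()`, not how seed.val is handled internally.

```r
# Hypothetical stand-in for a random step, e.g. drawing pseudo-absence rows.
draw_sample <- function(seed) {
  set.seed(seed)             # fix the state of the random number generator
  sample(1:1000, size = 10)  # draw 10 values without replacement
}

same_a <- draw_sample(123)
same_b <- draw_sample(123)
other  <- draw_sample(456)

identical(same_a, same_b)  # same seed: the two draws match exactly
identical(same_a, other)   # different seed: the draws will almost surely differ
```

In practice, this is why it is worth recording the seed value alongside the results of a given run.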

Let me know if something is still unclear.

Best regards, Rémi