biomodhub / biomod2

BIOMOD is a computer platform for ensemble forecasting of species distributions, enabling the treatment of a range of methodological uncertainties in models and the examination of species-environment relationships.
87 stars 22 forks source link

Error with multispecies wrapper from ENM 2020 course #227

Closed rpatin closed 1 year ago

rpatin commented 1 year ago

I was able to run the updated biomod2 4.2 tutorial with the single species (Gulo gulo) after:

  1. restarting R
  2. Opening the R script directly from the working directory folder (i.e., starting in the workdir rather than routing to it...which seemed to cause issues)
  3. Not having separate subfolders (e.g., the 3 subfolders from the 2020 tutorial)

With such, I was successfully able to run through the (updated) sample tutorial "main functions" (e.g., single model, projection, ensemble, range size).

However, I cannot get multi-species --via the wrapper code from 2020 -- to run. I get "NULL value passed as symbol address" errors whether I try coding verbatim from the ca. 2020 script (Larus dataset) or manually updating function nomenclature from v4.2 (attached .txt). ...I suspect I should open a new issue thread for that though? Unless there is an updated tutorial for multi-species wrappers somewhere?

Grateful for any assistance, -Brian

biomod_multspecies_error.txt

Originally posted by @brblais in https://github.com/biomodhub/biomod2/issues/226#issuecomment-1492754024

rpatin commented 1 year ago

Hi Brian, There are two issue with the code you shared:

  1. The SpatRaster were not named properly, you need to explicitly rename the layers:

    stk_current <- 
    rast(
    c(
      bio_1 =  "worldclim_EU_bio_1.tif",
      bio_12 = "worldclim_EU_bio_12.tif",
      bio_8 =  "worldclim_EU_bio_8.tif")
    )
    names(stk_current) <- c("bio_1","bio_8", "bio12")

    You did not see errors related to that as the code bugged before projecting to future environment.

  2. For yet unknown reason, when defining an external cluster with

    library(doParallel)
    cl <- makeCluster(4)
    doParallel::registerDoParallel(cl)

    biomod2 function do not (always) work properly. It is better (but also generally easier) to use the internal argument nb.cpu. This is what was causing the error NULL value passed as symbol address.

If you change both of those you can get the following code:

## load the required packages
#from 2020 tutorial
library(biomod2)
library(raster)
library(rasterVis)
library(gridExtra)
library(reshape2)

#from biomod2 v4.2
library(terra)
library(tidyterra)

## read data ----     #Note: moved all working data (occ, bioclim) to wd()
## species occurences data
data <- read.csv('Larus_occ.csv', stringsAsFactors = FALSE)
head(data)
table(data$species)
spp_to_model <- unique(data$species)[1:2]

## curent climatic variables
stk_current <- 
  rast(
    c(
      bio_1 =  "worldclim_EU_bio_1.tif",
      bio_12 = "worldclim_EU_bio_12.tif",
      bio_8 =  "worldclim_EU_bio_8.tif")
    )
names(stk_current) <- c("bio_1","bio_8", "bio12")
## 2050 climatic variables
stk_2050_BC_45 <- 
  rast(
    c(
      bio_1 =  "worldclim_EU_2050_BC45_bio_1.tif",
      bio_12 = "worldclim_EU_2050_BC45_bio_12.tif",
      bio_8 =  "worldclim_EU_2050_BC45_bio_8.tif"
    )
  )
names(stk_2050_BC_45) <- c("bio_1","bio_8", "bio12")

## 2070 climatic variables
stk_2070_BC_45 <- 
  rast(
    c(
      bio_1 =  "worldclim_EU_2070_BC45_bio_1.tif",
      bio_12 = "worldclim_EU_2070_BC45_bio_12.tif",
      bio_8 =  "worldclim_EU_2070_BC45_bio_8.tif"
    )
  )
names(stk_2070_BC_45) <- c("bio_1","bio_8", "bio12")

sp <- spp_to_model[1]
## build species modelling wrapper ----
biomod2_wrapper <- function(sp){
  cat("\n> species : ", sp)

  ## get occurrences points
  sp_dat <- data[data$species == sp, ]

  ## formating the data --------------------------------------
  sp_format <- 
    BIOMOD_FormatingData(
      resp.var = rep(1, nrow(sp_dat)), 
      expl.var = unwrap(stk_current),
      resp.xy = sp_dat[, c("long", "lat")],
      resp.name = sp,
      PA.strategy = "random", 
      PA.nb.rep = 2, 
      PA.nb.absences = 1000
    )

  ## print formatting summary
  sp_format

  ## save image of input data summary
  if(!exists(sp)) dir.create(sp)
  pdf(paste(sp, "/", sp ,"_data_formated.pdf", sep="" ))
  try(plot(sp_format))
  dev.off()

  ## define models options ---------------------------------------
  sp_opt <- BIOMOD_ModelingOptions()

  ## model species -------------------------------------
  sp_model <- BIOMOD_Modeling( 
    bm.format = sp_format, 
    models = c('GLM', 'FDA', 'RF'), 
    bm.options = sp_opt, 
    nb.rep = 2,
    data.split.perc = 80, 
    #weights = NULL, 
    var.import = 3, 
    metric.eval = c('TSS', 'ROC'),
    save.output = TRUE,
    #scale.models = FALSE,
    do.full.models = FALSE,
    modeling.id = "demo2",
    nb.cpu = 4
  )

  ## save some graphical outputs ---------------
  #### models scores
  pdf(paste0(sp, "/", sp , "_models_scores.pdf"))
  try(gg1 <- models_scores_graph(sp_model, metrics = c("TSS", "ROC"), by = 'models', plot = FALSE))
  try(gg2 <- models_scores_graph(sp_model, metrics = c("TSS", "ROC"), by = 'data_set', plot = FALSE))
  try(gg3 <- models_scores_graph(sp_model, metrics = c("TSS", "ROC"), by = 'cv_run', plot = FALSE))
  try(grid.arrange(gg1, gg2, gg3))
  dev.off()

  ## build ensemble models ---------------------------
  sp_ens_model <- 
    BIOMOD_EnsembleModeling(
      bm.mod = sp_model,
      models.chosen = 'all',
      em.by = 'all',
      metric.select = c('TSS'),
      metric.select.thresh = c(0.4),
      metric.eval = c('TSS','ROC'),
      em.algo = c('EMcv', 'EMci',  'EMca', 'EMwmean'),
      EMci.alpha = 0.05,
      EMwmean.decay = 'proportional',
      var.import = 3,
      nb.cpu = 4
    )

  ## do projections ---------------------------
  proj_scen <- c("current", "2050_BC_45", "2070_BC_45")

  for(scen in proj_scen){
    cat("\n> projections of ", scen)

    ## single model projections ------------------------
    sp_proj <- 
      BIOMOD_Projection(
        bm.mod = sp_model,
        new.env = get(paste0("stk_", scen)),
        proj.name = scen,
        models.chosen = 'all',
        metric.binary = "TSS",
        metric.filter = NULL,
        # compress = TRUE,
        build.clamping.mask = FALSE,
        do.stack = TRUE,
        output.format = ".img",
        nb.cpu = 4
      )

    ## ensemble model projections
    sp_ens_proj <- 
      BIOMOD_EnsembleForecasting(
        bm.em = sp_ens_model,
        bm.proj = sp_proj,
        binary.meth = "TSS",
        compress = TRUE,
        do.stack = TRUE,
        output.format = ".img",
        nb.cpu = 4
      )
  }

  return(paste0(sp," modelling completed !"))
}

################################################################
## launch the spiecies modelling wrapper over species list ----

# library(doParallel)
# cl <- makeCluster(4)
# doParallel::registerDoParallel(cl)
system.time(for (sp in spp_to_model){
  biomod2_wrapper(sp)
})

which should now work properly. Except for the models_scores_graph that need to be replaced by the new bm_PlotEvalMean or bm_PlotEvalBoxplot, you can look at the tutorial for the syntax.

Best, Rémi

brblais commented 1 year ago

Thanks again @rpatin Rémi,

Your help has resolved...I was able to overcome the errors/issues and generate data for the multi-species modeling.

If I wanted to assess range size/change over time (e.g., current, 2050, 2070) for each of the n species (ensemble output), it is most logical to insert respective code inside the biomod2_wrapper rather than outside of it separately, right?

Best, -Brian

rpatin commented 1 year ago

Hi Brian, Good that you could overcome the errors/issues :+1:

If you want to repeat the same analysis for all your species (e.g. range size assessment or change over time) it is indeed logical to insert the code inside the wrapper to be able to repeat it.

Best, Rémi