biomodhub / biomod2

BIOMOD is a computer platform for ensemble forecasting of species distributions, enabling the treatment of a range of methodological uncertainties in models and the examination of species-environment relationships.
85 stars 22 forks source link

biomod2 with categorical variables #255

Closed Bellerophon92 closed 1 year ago

Bellerophon92 commented 1 year ago

Hi all,

I am quite new to biomod2 I want to create a model with several variables for one species. My variables are all spatial data (tif raster files). I have only presence data for my species occ.

Here is my code:

#occ data 
sp

# Select the name of the studied species
myRespName <- 'species'

# Get corresponding presence/absence data
myResp <- as.numeric(sp[, myRespName])

# Get corresponding XY coordinates
myRespXY <- sp[, c('lon', 'lat')]

#variables
#bioclim layers from worldclim

bio
#categroical raster layers

cat1
cat2

So now to my question how do I tell the BIOMOD_FormatingData function to use my categorical variables as such?

I tried to set the rasters to categ with as.factor() and then stack it with bio, but I am not too sure if that is correct. And if i look into my resp curves the variables cat1/cat2 seem to be more like continuous var

cat1<- as.factor(cat1)
cat2<- as.factor(cat2)

bio<- stack(bio, cat1, cat2)
#here is the code
data<-BIOMOD_FormatingData(resp.var = myResp,
  resp.xy = myRespXY,
  expl.var = bio,
  resp.name = myRespName,
  PA.nb.rep = 2,
  PA.nb.absences = 100,
  PA.strategy ='random')

#define model parameters
model_opt <-BIOMOD_ModelingOptions()

# run the models

models <- BIOMOD_Modeling(bm.format = data,
  modeling.id = 'AllModels',
  models = c("RF"),
  bm.options = model_opt, 
  nb.rep = 10, 
  data.split.perc = 80, 
  var.import = 3, 
  do.full.models = TRUE, 
  metric.eval = c('TSS','ROC'),
  seed.val = 42)

Has anybody experience with that?

Cheers Michael

MayaGueguen commented 1 year ago

Hello Michael,

Thank you for trying biomod2 :pray:

To use categorical variables, you need to transform them as such by yourself before giving them to BIOMOD_FormatingData.

:exclamation: please be careful while naming your object in R : you should not name them with existing function names _Here you named your output of BIOMOD_FormatingData function as data but this an existing function name._ I renamed it as myBiomodData below.

You can check that the transformation has been correctly taken into account at 2 steps :

Hope it helps, Maya

Bellerophon92 commented 1 year ago

Thank you for your answer, Maya.

Both cat1 and cat2 are TRUE, when I check them with is.factor() in my stack object.

This is my output with show(myBiomodData)

dir.name = .

sp.name = species

 53 presences,  0 true absences, and  193 undefined points in dataset

 11 explanatory variables
     bio_1            bio_3           bio_7           bio_8             bio_9             bio_15          bio_18      NH4
 Min.   : 4.958   Min.   :28.09   Min.   :22.10   Min.   :-0.4333   Min.   :-0.7833   Min.   :10.56   Min.   :184.0   Min.   :0.6321  
 1st Qu.: 6.997   1st Qu.:31.75   1st Qu.:25.30   1st Qu.:12.6083   1st Qu.: 0.8333   1st Qu.:13.63   1st Qu.:247.0   1st Qu.:1.4666  
 Median : 7.923   Median :32.96   Median :25.80   Median :14.7167   Median : 3.6750   Median :16.88   Median :292.5   Median :2.0326  
 Mean   : 7.952   Mean   :33.06   Mean   :25.63   Mean   :11.8301   Mean   : 5.1633   Mean   :18.40   Mean   :296.7   Mean   :2.3137  
 3rd Qu.: 8.989   3rd Qu.:34.30   3rd Qu.:26.30   3rd Qu.:16.1167   3rd Qu.:10.3500   3rd Qu.:22.13   3rd Qu.:329.8   3rd Qu.:2.8429  
 Max.   :10.933   Max.   :37.85   Max.   :27.80   Max.   :18.2000   Max.   :16.5000   Max.   :33.93   Max.   :492.0   Max.   :7.4604  
 N                   cat1                  cat2
 Min.   : 9.60   Min.   : 1.000    Min.   :1.000      
 1st Qu.:14.20   1st Qu.: 2.000    1st Qu.:1.000      
 Median :16.05   Median : 4.000    Median :1.000      
 Mean   :16.67   Mean   : 3.935    Mean   :1.488      
 3rd Qu.:18.27   3rd Qu.: 6.000    3rd Qu.:2.000      
 Max.   :35.60   Max.   :14.000    Max.   :2.000

I can't see that the categoricals have levels in the object. What could the problem be?

MayaGueguen commented 1 year ago

Could you please copy / paste all your code from the beginning till the show(myBiomodData) command ? :pray:

Bellerophon92 commented 1 year ago
library(biomod2)
library(raster)
library(terra)
library(tidyterra)
library(ggtext)
library(usdm)

## occ data
sp_bw <- read.csv("species/occurence_data/occ.csv")
sp_bw <- rename(sp_bw, "lon" = "x")
sp_bw <- rename(sp_bw, "lat" = "y")
sp_bw <- rename(sp_bw, "id" = "ï..id")

sp <- sp_bw
sp$species <- 1

summary(sp)

# Select the name of the studied species
myRespName <- 'species'

# Get corresponding presence/absence data
myResp <- as.numeric(sp[, myRespName])

# Get corresponding XY coordinates
myRespXY <- sp[, c('lon', 'lat')]

shape <- shapefile("BaW_shp/BaW.shp")

# load in layers/variables
bio1 <- raster("wc0.5/bio_bawu/bio_1.tif")
bio2 <- raster("wc0.5/bio_bawu/bio_2.tif")
bio3 <- raster("wc0.5/bio_bawu/bio_3.tif")
bio4 <- raster("wc0.5/bio_bawu/bio_4.tif")
bio5 <- raster("wc0.5/bio_bawu/bio_5.tif")
bio6 <- raster("wc0.5/bio_bawu/bio_6.tif")
bio7 <- raster("wc0.5/bio_bawu/bio_7.tif")
bio8 <- raster("wc0.5/bio_bawu/bio_8.tif")
bio9 <- raster("wc0.5/bio_bawu/bio_9.tif")
bio10 <- raster("wc0.5/bio_bawu/bio_10.tif")
bio11 <- raster("wc0.5/bio_bawu/bio_11.tif")
bio12 <- raster("wc0.5/bio_bawu/bio_12.tif")
bio13 <- raster("wc0.5/bio_bawu/bio_13.tif")
bio14 <- raster("wc0.5/bio_bawu/bio_14.tif")
bio15 <- raster("wc0.5/bio_bawu/bio_15.tif")
bio16 <- raster("wc0.5/bio_bawu/bio_16.tif")
bio17 <- raster("wc0.5/bio_bawu/bio_17.tif")
bio18 <- raster("wc0.5/bio_bawu/bio_18.tif")
bio19 <- raster("wc0.5/bio_bawu/bio_19.tif")

bio <- stack(bio1, bio2, bio3, bio4, bio5, bio6, bio7, bio8, bio9, bio10,
             bio11, bio12, bio13, bio14, bio15, bio16, bio17, bio18, bio19)

remove(bio1, bio2, bio3, bio4, bio5, bio6, bio7, bio8, bio9, bio10,
       bio11, bio12, bio13, bio14, bio15, bio16, bio17, bio18, bio19)

#crop layers into same extent

bio_bw <- raster::crop(bio,shape)
bio_bw <- raster::mask(bio_bw, shape) 

#add other variables
#-------------------------------------------
NH3 <- raster("species/layers/NH3_value_crs84.tif")
N<- raster("species/layers/N_crs84.tif")
cat1 <- raster("cat1.tif")
cat2 <- raster("cat2.tif")

rs1 <- raster::crop(bio_bw,shape)
rs1 <- mask(x= rs1, mask=shape)

rs2 <- raster::crop(NH3,shape)
rs2 <- mask(x= rs2, mask=shape)

rs3 <- raster::crop(N,shape)
rs3 <- mask(x= rs3, mask=shape)

rs4 <- raster::crop(cat1,shape)
rs4 <- mask(x= rs4, mask=shape)

rs5 <- raster::crop(cat2shape)
rs5 <- mask(x= rs5, mask=shape)

#Conversion of rasters
NH3 <- resample(rs2, rs1, method='ngb')
N <- resample(rs3, rs1, method='ngb')
cat1 <- resample(rs4, rs1, method='ngb')
cat2 <- resample(rs5, rs1, method='ngb')

#set categorical variables
cat1 <-as.factor(cat1)
cat2 <-as.factor(cat2)

#stack all layers to raster-brick
biom <- stack(rs1, NH3, N)

v_1 <- vifstep(biom)
v_1

biom_1 <- exclude(biom, v_1)

biom_1.1 <- stack(biom_1, cat1, cat2)

is.factor(biom_1.1$cat1)
is.factor(biom_1.1$cat2)
is.factor(biom_1.1$bio_1)

# format model data
myBiomodData <- BIOMOD_FormatingData(resp.var = myResp,
  resp.xy = myRespXY,
  expl.var = biom_1.1,
  resp.name = myRespName,
  PA.nb.rep = 2,
  PA.nb.absences = 100,
  PA.strategy ='random')

show(myBiomodData)
MayaGueguen commented 1 year ago

Thank you Michael for your code :pray:

My mistake, the print of rasterStack summary with several variables is not clearly explicit for categorical variables. However, if your raster layer is returning TRUE when calling the is.factor function, it should be fine. And you should see it when running the BIOMOD_FormatingData function when pseudo-absences are selected :

> myBiomodData <- BIOMOD_FormatingData(resp.var = myResp,
  resp.xy = myRespXY,
  expl.var = stk,
  resp.name = myRespName,
  PA.nb.rep = 2,
  PA.nb.absences = 100,
  PA.strategy ='random')

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= toto Data Formating -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

      ! No data has been set aside for modeling evaluation
      ! No data has been set aside for modeling evaluation

Checking Pseudo-absence selection arguments...

      ! No data has been set aside for modeling evaluation
   > random pseudo absences selection
   > Pseudo absences are selected in explanatory variables
    > fact.level for HF :    1:0    2:1 3:2 4:3 5:4 6:5 7:6 8:7 9:8 10:9    11:10
     - according to mask.out levels 0 1 3 4 have already been sampled
     - levels 2 5 6 7 8 9 10 11 will be sampled in the original raster
    > fact.level for HF :    1:0    2:1 3:2 4:3 5:4 6:5 7:6 8:7 9:8 10:9    11:10
     - according to mask.out levels 0 1 3 4 have already been sampled
     - levels 2 5 6 7 8 9 10 11 will be sampled in the original raster

 ! Some NAs have been automatically removed from your data
      ! No data has been set aside for modeling evaluation
      ! No data has been set aside for modeling evaluation
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

I made a test above with 2 variables in my stk object : one layer continuous (bio2) and one categorical with values between 0 and 11 (HF). You can see that at the pseudo absences selection step, the function takes care to sample all levels of this factorial variable.

Hope it helps, Maya

Bellerophon92 commented 1 year ago

Hello Maya,

Thank you for the tip to look at the BIOMOD_FormatingData() console output!

When I run my code, biomod classifies the categorical layers as such according to the pseudoabsence selection.

Is there an option to plot the response curves for categorical variables with bmPlotResponseCurves() not in a line plot, but in a barplot, where the response of every level is one bar? Example: https://www.researchgate.net/figure/Response-curves-for-the-most-significant-variables-in-the-Maxent-model-for-the_fig5_329379636

Cheers Michael

MayaGueguen commented 1 year ago

Cool :+1:

As for plotting response curves with catagorical variables as barplot, I'm sorry, this is not implemented yet... But do not hesitate to use the table rendered by the bm_PlotResponseCurves function to create directly your own graphic !

Maya