biomodhub / biomod2

BIOMOD is a computer platform for ensemble forecasting of species distributions, enabling the treatment of a range of methodological uncertainties in models and the examination of species-environment relationships.

Help with BIOMOD_FormatingData - error: number of items to replace is not a multiple of replacement length #339

Closed · lveldhuisen closed this issue 11 months ago

lveldhuisen commented 11 months ago

Context and question

I am trying to format my data using the BIOMOD_FormatingData() function and am running into an error I don't know how to fix. I'm using a few of the bioclim variables, as well as soil variables from this site: https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=1242, which may be the root of the issue. The error I get when I run the code is:

Error in pa.tab.tmp[, j] <- SR : number of items to replace is not a multiple of replacement length.
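
For context, this message is a generic base R error: it is raised when a vector is assigned into a matrix column and the column length is not a multiple of the vector length. It therefore points to a length mismatch somewhere inside biomod2 rather than to a typo in the call. A minimal sketch reproducing it with a toy matrix; the names `pa_tab` and `sampled` are hypothetical stand-ins, not biomod2 internals:

```r
# Toy 3-row matrix standing in for a table of pseudo-absence selections
pa_tab <- matrix(FALSE, nrow = 3, ncol = 2)

# A replacement vector of length 2 cannot fill a column of length 3
sampled <- c(TRUE, FALSE)

# Capture the error message instead of stopping the script
msg <- tryCatch({
  pa_tab[, 1] <- sampled
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)
```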

I have tried different numbers of pseudo-absences and get the same error, so I am not sure where to go from here. Thanks so much!

Code used

AgoAur_data <- BIOMOD_FormatingData(
  resp.var = rep(1,nrow(AgoAur_occ)), 
  expl.var = env_vars_NA_sub_test,
  resp.xy = AgoAur_occ[,c('decimalLongitude','decimalLatitude')],
  resp.name = "Agoseris.aurantiaca",
  PA.nb.rep = 3,
  PA.nb.absences = 500, PA.strategy = 'random'
  )

# output 
Error in pa.tab.tmp[, j] <- SR : 
  number of items to replace is not a multiple of replacement length

# show(myExpl) # if using an environment raster

> show(env_vars_NA_sub_test)
class       : SpatRaster 
dimensions  : 900, 2160, 26  (nrow, ncol, nlyr)
resolution  : 0.1666667, 0.1666667  (x, y)
extent      : -180, 180, -60, 90.00001  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
sources     : bio_1  
              bio_2  
              bio_3  
              ... and 23 more source(s)
names       : bio_1, bio_2, bio_3, bio_4, bio_5, bio_6, ... 
min values  :  -269,     9,     8,    72,   -59,  -547, ... 
max values  :   314,   211,    95, 22673,   489,   258, ... 

> show(AgoAur_occ)
# A tibble: 1,173 × 3
   species             decimalLatitude decimalLongitude
   <chr>                         <dbl>            <dbl>
 1 Agoseris.aurantiaca            54.1            -120.
 2 Agoseris.aurantiaca            54.2            -120.
 3 Agoseris.aurantiaca            54.0            -120.
 4 Agoseris.aurantiaca            54.2            -120.
 5 Agoseris.aurantiaca            54.1            -120.
 6 Agoseris.aurantiaca            54.1            -120.
 7 Agoseris.aurantiaca            52.9            -117.
 8 Agoseris.aurantiaca            48.5            -121.
 9 Agoseris.aurantiaca            47.0            -122.
10 Agoseris.aurantiaca            48.5            -121.
# ℹ 1,163 more rows

Environment Information

version   R version 4.3.1 (2023-06-16)
os        macOS Ventura 13.5.2
system    aarch64, darwin20
ui        RStudio
language  (EN)
collate   en_US.UTF-8
ctype     en_US.UTF-8
tz        America/Phoenix
date      2023-10-03
rstudio   2023.09.0+463 Desert Sunflower (desktop)
pandoc    NA

MayaGueguen commented 11 months ago

Hello Leah,

Thank you for your issue 🙏 Could you please tell me which biomod2 package version you are using?

packageVersion("biomod2")

Also, I'm not sure tibble objects are correctly handled by biomod2... Could you try converting your AgoAur_occ object into a data.frame?

Maya

lveldhuisen commented 11 months ago

Hi Maya,

> packageVersion("biomod2")
[1] ‘4.2.5’

I tried converting the AgoAur_occ object to a data.frame and got exactly the same error message. Here is the code I used:

AgoAur_occ_df <- as.data.frame(AgoAur_occ)

AgoAur_data <- BIOMOD_FormatingData(
  resp.var = rep(1, nrow(AgoAur_occ_df)),
  expl.var = env_vars_NA_sub_test,
  resp.xy = AgoAur_occ_df[, c('decimalLongitude', 'decimalLatitude')],
  resp.name = "Agoseris.aurantiaca",
  PA.nb.rep = 3,
  PA.nb.absences = 500,
  PA.strategy = 'random'
)

This gave me the same error:

Error in pa.tab.tmp[, j] <- SR : number of items to replace is not a multiple of replacement length

Thank you!

MayaGueguen commented 11 months ago

Hello Leah,

Thank you for trying. I'll try to reproduce your error.

Meanwhile, I see that your raster has 26 layers... As you are trying to sample pseudo-absences through the SRE method, it very likely fails to find combinations matching all your variables... Moreover, many of these variables are probably correlated with each other; you should check that first and give fewer variables to biomod2.
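
The correlation check suggested above can be sketched with terra by extracting cell values and computing a Pearson correlation matrix. This is a minimal sketch on a small synthetic raster; the layer names, the random data and the |r| > 0.7 threshold are illustrative assumptions, not biomod2 defaults:

```r
library(terra)

set.seed(1)
# Small synthetic 3-layer raster standing in for the environmental stack;
# var_b is a near-linear function of var_a, var_c is independent
env <- rast(nrows = 20, ncols = 20, nlyrs = 3)
a <- runif(ncell(env))
values(env) <- cbind(a, 2 * a + rnorm(ncell(env), sd = 0.05), runif(ncell(env)))
names(env) <- c("var_a", "var_b", "var_c")

# Cell values as a data.frame (rows with NA dropped), then Pearson correlations
vals <- as.data.frame(env, na.rm = TRUE)
cor_mat <- cor(vals, method = "pearson")

# Pairs above an illustrative |r| > 0.7 threshold (upper triangle only)
threshold <- 0.7
high <- which(abs(cor_mat) > threshold & upper.tri(cor_mat), arr.ind = TRUE)
high_pairs <- data.frame(var1 = rownames(cor_mat)[high[, "row"]],
                         var2 = colnames(cor_mat)[high[, "col"]],
                         r = cor_mat[high])
print(high_pairs)
```

On a full 900 x 2160 stack, `spatSample(env, size = 10000, na.rm = TRUE)` would give a random subset of cells that is usually enough for this check. The sketch assumes all layers are numeric; categorical layers would come back as factors and `cor()` would refuse them.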

I also see that you seem to have duplicated point coordinates. You may want to set filter.raster = TRUE in BIOMOD_FormatingData to keep only one observation per grid cell.
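
To gauge how many occurrences share a grid cell (the situation filter.raster = TRUE addresses), one can map each point to its cell with terra::cellFromXY and count duplicated cells. A minimal sketch with an empty raster at the same 1/6 degree resolution and extent as the stack above and four made-up coordinates; the names `grid` and `occ_xy` are hypothetical, and this only approximates whatever biomod2 does internally:

```r
library(terra)

# Empty raster at 1/6 degree resolution, same extent as the environmental stack
grid <- rast(xmin = -180, xmax = 180, ymin = -60, ymax = 90, resolution = 1/6)

# Made-up occurrence coordinates (longitude, latitude); the first two are
# roughly 1 km apart and therefore fall in the same cell
occ_xy <- data.frame(decimalLongitude = c(-120.01, -120.02, -117.30, -121.50),
                     decimalLatitude  = c(54.11, 54.12, 52.90, 48.50))

# Cell index of each point, then count points sharing a cell with an earlier one
cells <- cellFromXY(grid, as.matrix(occ_xy))
n_dup <- sum(duplicated(cells))
n_kept <- length(unique(cells))
cat("duplicated:", n_dup, "- kept with one point per cell:", n_kept, "\n")
```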

Maya

lveldhuisen commented 11 months ago

Hi Maya,

Even with fewer environmental variables (3 bioclim variables and 4 soil variables that are not correlated), I get the same error. I also tried adding filter.raster = TRUE and still get the same error.

Here is the code I just tried:

AgoAur_data <- BIOMOD_FormatingData(
  resp.var = rep(1, nrow(AgoAur_occ_df)),
  expl.var = expl.var,
  resp.xy = AgoAur_occ_df[, c('decimalLongitude', 'decimalLatitude')],
  resp.name = "Agoseris.aurantiaca",
  PA.nb.rep = 3,
  PA.nb.absences = 500,
  PA.strategy = 'random',
  filter.raster = TRUE
)

Error in pa.tab.tmp[, j] <- SR : number of items to replace is not a multiple of replacement length

> show(expl.var)
class       : SpatRaster 
dimensions  : 900, 2160, 7  (nrow, ncol, nlyr)
resolution  : 0.1666667, 0.1666667  (x, y)
extent      : -180, 180, -60, 90.00001  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
sources     : bio_4  
              bio_10  
              bio_12  
              ... and 4 more source(s)
names       : bio_4, bio_10, bio_12, Unifi~ntent, Unifi~ction, Unifi~acity, ... 
min values  :    72,    -97,      0, -5.332865,           0,           0, ... 
max values  : 22673,    380,   9916, 78.674273,          94,         190, ... 

MayaGueguen commented 11 months ago

Hello Leah,

Thank you so much for trying 🙏 It can only help you further in the analysis anyway :)

Would you mind sharing your AgoAur_occ_df and expl.var objects with me? It would definitely help figure out what's going on. You can either share them here or send them to me at maya.gueguen [at] univ-grenoble-alpes.fr

Maya

lveldhuisen commented 11 months ago

Hi Maya,

I just sent you an email with access to all the files you should need. Thank you so much!

Leah

MayaGueguen commented 11 months ago

Thank you, Leah, for your script and data 🙏 Unfortunately, the expl.var object does not load properly:

Error in .Primitive(".C")(<pointer: (nil)>, n = as.integer(n), x = as.double(x)) : NULL value passed as symbol address

Could you try wrapping your object before saving it, please?

library(terra)
expl.var <- wrap(expl.var)
save(expl.var, file = "yourfile.rda")
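
Why wrapping helps: a SpatRaster is essentially a pointer to data managed by terra's C++ code, and that pointer is not restored when the object is saved with save() and loaded again, which is consistent with the "NULL value passed as symbol address" error above. wrap() turns it into a serializable PackedSpatRaster, and unwrap() rebuilds the SpatRaster after loading. A minimal round-trip sketch with a small synthetic raster and a temporary file; the names `r`, `packed` and `f` are illustrative:

```r
library(terra)

# Small synthetic raster standing in for expl.var
r <- rast(nrows = 5, ncols = 5, vals = 1:25)
names(r) <- "bio_4"

# Pack the raster into a serializable object and save it to a temporary .rda
packed <- wrap(r)
f <- tempfile(fileext = ".rda")
save(packed, file = f)

# Simulate a fresh session: drop the objects, reload, and restore the SpatRaster
rm(packed, r)
load(f)
restored <- unwrap(packed)
print(restored)
```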

lveldhuisen commented 11 months ago

Hi Maya - I just saved the new wrapped object in the Box folder I shared with you! It's called "expl.var1.rda".

MayaGueguen commented 11 months ago

Hello Leah,

Thank you so much for the updated data 🙏 The problem comes from the fact that all your raster layers are defined as categorical. biomod2 is therefore trying to draw pseudo-absences matching every combination of variable values, which is a huge number since these are normally numerical values. The message printed at the beginning of BIOMOD_FormatingData tells you about this. Converting the layers back to numeric works:

library(data.table)
library(terra)
library(biomod2)

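# Read the occurrences as a plain data.frame (not a data.table)
AgoAur_occ = fread("AgoAur_occ.csv", data.table = FALSE)
# Load the wrapped raster and restore it as a SpatRaster
load("expl.var1.rda")
expl.var = unwrap(expl.var)
# Convert the categorical bioclim layers back to numeric values
expl.var$bio_4 = as.numeric(expl.var$bio_4)
expl.var$bio_10 = as.numeric(expl.var$bio_10)
expl.var$bio_12 = as.numeric(expl.var$bio_12)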

AgoAur_data <- BIOMOD_FormatingData(
  resp.var = rep(1, nrow(AgoAur_occ)), 
  expl.var = expl.var,
  resp.xy = AgoAur_occ[,c('decimalLongitude','decimalLatitude')],
  resp.name = "Agoseris.aurantiaca",
  PA.nb.rep = 3,
  PA.nb.absences = 500, 
  PA.strategy = 'random',
  filter.raster = TRUE
)
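
Instead of converting layers one by one, one could detect every categorical layer with terra's is.factor() and convert only those with as.numeric(), which, as in the fix above, replaces each cell with the numeric value of its category label. A minimal sketch on a synthetic two-layer stack; the helper `to_numeric_layers` is hypothetical and not part of biomod2 or terra:

```r
library(terra)

# Hypothetical helper: convert every categorical layer of a SpatRaster to numeric
to_numeric_layers <- function(x) {
  for (i in which(is.factor(x))) {
    # as.numeric() is applied to one layer at a time, hence the loop
    x[[i]] <- as.numeric(x[[i]])
  }
  x
}

# Synthetic stack: one categorical layer whose labels are numbers stored as text,
# and one ordinary numeric layer
cat_layer <- rast(nrows = 2, ncols = 2, vals = c(1, 2, 3, 1))
levels(cat_layer) <- data.frame(ID = 1:3, bio_4 = c("72", "80", "95"))
num_layer <- rast(nrows = 2, ncols = 2, vals = c(0.5, 1.5, 2.5, 3.5))
env <- c(cat_layer, num_layer)
names(env) <- c("bio_4", "soil")

# Which layers are categorical before and after the conversion
print(is.factor(env))
env_num <- to_numeric_layers(env)
print(is.factor(env_num))
```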

Maya

lveldhuisen commented 11 months ago

Ah, it worked! Thank you so much for all your help!