biomodhub / biomod2

BIOMOD is a computer platform for ensemble forecasting of species distributions, enabling the treatment of a range of methodological uncertainties in models and the examination of species-environment relationships.

Formatting data with user-defined pseudo-absences #236

Closed estellebruni closed 1 year ago

estellebruni commented 1 year ago

Hello,

I am modelling a single species distribution at world scale with user-defined pseudo-absences (the pseudo-absences are in the southern hemisphere only). I have 401 occurrences of my species and >13'000 PA. I want to format my data with biomod2 using my own PA.

Is it correct to indicate PA.nb.rep=1, PA.nb.absences=10000 if I want to have 1 single run of 10'000 PA selected within the PA I generated myself? I hope the code is clear enough.

Thank you very much for your help.

Kind regards, Estelle

myResp <- c(rep(1, nrow(apo_occ)), rep(0, nrow(bkg_coord))) 
# apo_occ: df with species occurrences coordinates (=1)
# bkg_coord: df with pseudo-absences coordinates (=0)

myResp
   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  [53] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [105] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [157] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [209] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [261] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [313] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [365] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [417] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [469] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [521] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [573] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [625] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [677] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [729] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [781] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [833] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [885] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [937] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [989] 0 0 0 0 0 0 0 0 0 0 0 0
 [ reached getOption("max.print") -- omitted 12953 entries ]

myResp.PA <- ifelse(myResp==1, 1, NA)
myResp.PA
   [1]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
  [36]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
  [71]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 [106]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 [141]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 [176]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 [211]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 [246]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 [281]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 [316]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 [351]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 [386]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [421] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [456] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [491] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [526] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [561] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [596] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [631] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [666] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [701] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [736] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [771] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [806] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [841] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [876] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [911] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [946] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [981] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [ reached getOption("max.print") -- omitted 12953 entries ]

# generate PA.user.table
PAtable <- data.frame(PA1 = ifelse(myResp ==1, TRUE, FALSE))

# XY coordinates of species data
apo.xy <- apo_occ[,c('Longitude', 'Latitude')] %>% 
  rename(x = Longitude) %>% 
  rename(y = Latitude) %>% 
  as.data.frame

# bind with bkg_coord (=PA coordinates)
myResp.xy <- rbind(apo.xy, bkg_coord) %>% 
  as.data.frame

# output raster(curr_st)
curr_st
class       : SpatRaster 
dimensions  : 105, 216, 6  (nrow, ncol, nlyr)
resolution  : 1.666667, 1.666667  (x, y)
extent      : -180.0001, 179.9999, -91.00014, 83.99986  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source(s)   : memory
names       :     bio10,     bio15,    bio18,    bio2,       bio4,       TPI 
min values  : -38.08873,  10.22358,    0.000,  1.8818,   19.17403, -1370.017 
max values  :  38.38527, 211.97893, 2387.064, 15.9555, 2171.36617,  2137.092

ApoData_200 <- BIOMOD_FormatingData(resp.name = "Species.xyz", 
                                resp.var = myResp.PA, 
                                resp.xy = myResp.xy,
                                expl.var = curr_st,  
                                PA.strategy = "user.defined",
                                PA.user.table = PAtable,
                                filter.raster = TRUE,
                                PA.nb.rep=1,
                                PA.nb.absences=10000)

My session info:

R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS

Matrix products: default
BLAS:   /cluster/software/bioinformatic/R/4.2.2/lib/R/lib/libRblas.so
LAPACK: /cluster/software/bioinformatic/R/4.2.2/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] readxl_1.4.1    ggtext_0.1.2    tidyterra_0.3.2 terra_1.6-47    biomod2_4.2-2   Hmisc_4.7-1    
 [7] Formula_1.2-4   survival_3.4-0  lattice_0.20-45 forcats_0.5.2   stringr_1.4.1   dplyr_1.0.10   
[13] purrr_0.3.5     readr_2.1.3     tidyr_1.2.1     tibble_3.1.8    ggplot2_3.4.0   tidyverse_1.3.2
[19] plyr_1.8.7     

loaded via a namespace (and not attached):
  [1] googledrive_2.0.0      colorspace_2.0-3       deldir_1.0-6           ellipsis_0.3.2        
  [5] class_7.3-20           rgdal_1.6-3            htmlTable_2.4.1        markdown_1.1          
  [9] base64enc_0.1-3        fs_1.5.2               gridtext_0.1.5         proxy_0.4-26          
 [13] rstudioapi_0.14        farver_2.1.1           bit64_4.0.5            earth_5.3.1           
 [17] fansi_1.0.3            lubridate_1.9.0        xml2_1.3.3             codetools_0.2-18      
 [21] splines_4.2.2          knitr_1.40             ade4_1.7-20            jsonlite_1.8.3        
 [25] mda_0.5-3              pROC_1.18.0            broom_1.0.1            cluster_2.1.4         
 [29] dbplyr_2.2.1           png_0.1-7              compiler_4.2.2         httr_1.4.4            
 [33] adegraphics_1.0-18     backports_1.4.1        assertthat_0.2.1       Matrix_1.5-1          
 [37] fastmap_1.1.0          gargle_1.2.1           cli_3.4.1              htmltools_0.5.3       
 [41] tools_4.2.2            gtable_0.3.1           glue_1.6.2             reshape2_1.4.4        
 [45] Rcpp_1.0.8.3           PresenceAbsence_1.1.10 cellranger_1.1.0       raster_3.6-11         
 [49] vctrs_0.5.0            nlme_3.1-160           iterators_1.0.14       xfun_0.34             
 [53] rvest_1.0.3            maxnet_0.1.4           timechange_0.1.1       lifecycle_1.0.3       
 [57] googlesheets4_1.0.1    MASS_7.3-58.1          scales_1.2.1           vroom_1.6.0           
 [61] hms_1.1.2              parallel_4.2.2         RColorBrewer_1.1-3     yaml_2.3.6            
 [65] gridExtra_2.3          TeachingDemos_2.12     rpart_4.1.19           reshape_0.8.9         
 [69] latticeExtra_0.6-30    stringi_1.7.8          foreach_1.5.2          plotrix_3.8-2         
 [73] randomForest_4.7-1.1   e1071_1.7-9            checkmate_2.1.0        rlang_1.0.6           
 [77] pkgconfig_2.0.3        evaluate_0.18          sf_1.0-9               htmlwidgets_1.5.4     
 [81] bit_4.0.4              tidyselect_1.2.0       gbm_2.1.8              magrittr_2.0.3        
 [85] R6_2.5.1               generics_0.1.2         DBI_1.1.3              mgcv_1.8-41           
 [89] pillar_1.8.1           haven_2.5.1            foreign_0.8-83         withr_2.5.0           
 [93] units_0.8-0            abind_1.4-5            sp_1.4-7               nnet_7.3-18           
 [97] modelr_0.1.9           crayon_1.5.2           interp_1.1-3           KernSmooth_2.23-20    
[101] utf8_1.2.2             tzdb_0.3.0             rmarkdown_2.17         jpeg_0.1-9            
[105] grid_4.2.2             data.table_1.14.4      plotmo_3.6.1           classInt_0.4-3        
[109] reprex_2.0.2           digest_0.6.30          munsell_0.5.0 

estellebruni commented 1 year ago

Then, when trying to create the models with BIOMOD_Modeling(), I get this error message:

> ApoModel <- BIOMOD_Modeling(bm.format = ApoData_200,
+                             modeling.id = 'spe.xyz',
+                             models = c('GLM', 'GBM'), 
+                             nb.rep = 10, 
+                             data.split.perc = 80, 
+                             var.import = 3, 
+                             metric.eval = c('TSS','ROC'),
+                             do.full.models = F)

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Build Single Models -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Checking Models arguments...
Warning in .BIOMOD_Modeling.check.args(bm.format, modeling.id, models, bm.options,  :
  Models will run with 'defaults' parameters

Creating suitable Workdir...
Error in `[<-`(`*tmp*`, bm.format@PA.table[, pa], , value = sampled.mat) : 
  (subscript) logical subscript too long

rpatin commented 1 year ago

Hello Estelle, thank you for reporting :pray:

The pseudo-absence table given to BIOMOD_FormatingData must have TRUE for all presences and for every pseudo-absence you want to keep in the given pseudo-absence dataset. With the following line, however, you set TRUE for all presences and FALSE for all the potential pseudo-absences:

PAtable <- data.frame(PA1 = ifelse(myResp ==1, TRUE, FALSE))

Thus the dataset has no pseudo-absences selected. You can check this in the output of BIOMOD_FormatingData with show(ApoData_200), in which you should see something like this:

1 Pseudo Absences dataset available ( PA1 ) with  0 absences in each (true abs + pseudo abs)

With no pseudo-absences selected the model cannot run (although the error you saw was not very clear - sorry for that).
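The same check can be made on the table itself before calling BIOMOD_FormatingData. The snippet below is only a minimal sketch in base R: it rebuilds a toy response vector with the sizes mentioned in this thread (401 presences, 13552 candidate pseudo-absences) rather than using the real data, and `n_selected_pa` is a hypothetical helper name.

```r
# Toy stand-in for the response vector from the first post:
# 401 presences (1) followed by 13552 candidate pseudo-absences (0).
myResp <- c(rep(1, 401), rep(0, 13552))

# The table as originally built: TRUE for presences only.
PAtable <- data.frame(PA1 = ifelse(myResp == 1, TRUE, FALSE))

# Count the rows that are flagged TRUE but are not presences,
# i.e. the pseudo-absences actually selected for PA1.
n_selected_pa <- sum(PAtable$PA1 & myResp != 1)
print(n_selected_pa)
```

A count of zero here means that no pseudo-absence is selected in that column.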

As a solution, if you want to keep all the generated pseudo-absences, you just have to fill PAtable with only TRUE:

PAtable <- data.frame(PA1 = rep(TRUE, length(myResp)))

If you want to keep only 10000 pseudo-absences, you can use BIOMOD_FormatingData to randomly subsample:

ApoData_200 <- BIOMOD_FormatingData(resp.name = "Species.xyz", 
                                resp.var = myResp.PA, 
                                resp.xy = myResp.xy,
                                expl.var = curr_st,  
                                PA.strategy = "random",
                                filter.raster = TRUE,
                                PA.nb.rep=1,
                                PA.nb.absences=10000)

But then you have to set PA.strategy = 'random' and you do not have to give any PA table. On a side note, when you use PA.strategy = 'user.defined', the arguments PA.nb.rep and PA.nb.absences are ignored.

I hope this is clearer now. If not, feel free to clarify your question.
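Another way to get a single draw of 10000 points while staying with PA.strategy = 'user.defined' would be to do the subsampling in the table itself. This is only a sketch in base R under stated assumptions: the response vector is a toy stand-in with the sizes from this thread, the seed is arbitrary, and `is_pa` and `keep_pa` are hypothetical helper names.

```r
set.seed(42)  # arbitrary seed, only to make the draw reproducible

# Toy stand-in for myResp.PA: 401 presences (1), then 13552 candidates (NA).
myResp.PA <- c(rep(1, 401), rep(NA, 13552))

# Rows that are candidate pseudo-absences.
is_pa <- is.na(myResp.PA)

# Draw 10000 of the candidate rows without replacement.
keep_pa <- sample(which(is_pa), size = 10000)

# TRUE for every presence and for the drawn pseudo-absences, FALSE otherwise.
PAtable <- data.frame(PA1 = !is_pa)
PAtable$PA1[keep_pa] <- TRUE

print(table(PAtable$PA1))
```

The resulting table has one row per entry of the response vector, which is the shape expected for PA.user.table. Adding further columns (PA2, PA3, ...) with independent draws would give several pseudo-absence repetitions.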

Best, Rémi

estellebruni commented 1 year ago

Hello Rémi,

Thank you for your very fast reply and help!

Unfortunately, even when changing the TRUE and FALSE table for PA, I still have the same error message when running BIOMOD_Modeling().

I used the following code that is slightly modified compared to the first one I posted:

### get pseudo-absences in a defined region of the world + occurrences of the species
bkg_coord <- read.csv("data/bkg_coordinates_13552.csv") %>% 
  dplyr::rename("long"="x", "lat" = "y") # change to long and lat so that it is similar to the occurrences table
bkg_coord$species.xyz <- c(rep(NA, nrow(bkg_coord)))

# species occurrences table with 3 columns: long, lat, occ
apo_occ <- read.table("data/ApoVas_occurrences_only.txt", header=T) %>%  
  dplyr::rename("species.xyz" = "occ")

### Formating the data
apoAll <- rbind(apo_occ, bkg_coord) # 1= species occurrences, NA= generated pseudo-absences 

myRespName <- 'species.xyz'

myResp <- as.numeric(apoAll[, myRespName])

myRespXY <- apoAll[, c("long", "lat")] 

PAtable <- data.frame(PA1 = rep(TRUE, length(myResp)))

# output env variables raster (curr_st)
curr_st
class       : SpatRaster 
dimensions  : 105, 216, 6  (nrow, ncol, nlyr)
resolution  : 1.666667, 1.666667  (x, y)
extent      : -180.0001, 179.9999, -91.00014, 83.99986  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source(s)   : memory
names       :     bio10,     bio15,    bio18,    bio2,       bio4,       TPI 
min values  : -38.08873,  10.22358,    0.000,  1.8818,   19.17403, -1370.017 
max values  :  38.38527, 211.97893, 2387.064, 15.9555, 2171.36617,  2137.092

### BIOMOD2 formating
ApoData_200 <- BIOMOD_FormatingData(resp.name = myRespName, 
                                resp.var = myResp, 
                                resp.xy = myRespXY,
                                expl.var = curr_st, 
                                PA.strategy = "user.defined",
                                PA.user.table = PAtable,
                                filter.raster = TRUE
                                )

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= species.xyz Data Formating -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

> Pseudo absences used will be user defined ones !
      ! No data has been set aside for modeling evaluation
 !!! Some data are located in the same raster cell. 
          Only the first data in each cell will be kept as `filter.raster = TRUE`.

Checking Pseudo-absence selection arguments...

   > User defined pseudo absences selection
 ! Some NAs have been automatically removed from your data
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

> summary(ApoData_200)
      dataset run   PA Presences True_Absences Pseudo_Absences Undefined Total_Absences
1     initial  NA <NA>        32             0               0      1099              0
2 calibration  NA  PA1        32             0           13349        NA          13349

### create model
> ApoModel <- BIOMOD_Modeling(bm.format = ApoData_200,
+                             modeling.id = 'AllModels',
+                             models = c('GLM', 'GBM', 'MAXENT'), 
+                             nb.rep = 10, 
+                             data.split.perc = 80, 
+                             var.import = 3, 
+                             metric.eval = c('TSS','ROC'),
+                             do.full.models = FALSE)

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Build Single Models -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Checking Models arguments...
Warning in .BIOMOD_Modeling.check.args(bm.format, modeling.id, models, bm.options,  :
  Models will run with 'defaults' parameters

Creating suitable Workdir...
Error in `[<-`(`*tmp*`, bm.format@PA.table[, pa], , value = sampled.mat) : 
  (subscript) logical subscript too long

Looking at the summary results, I was wondering if the lower number of undefined points in the initial dataset (1'099) compared to the pseudo-absences in the calibration set (13'349) might be problematic?

Thank you for your help.

Best regards, Estelle

rpatin commented 1 year ago

Hello Estelle, thank you for the additional information :pray:

This is quite puzzling: I do not think I have ever seen a biomod.formated.data object with more pseudo-absences than undefined points. However, I could not reproduce it yet. It may have something to do with filter.raster = TRUE, but when I try to do the same, BIOMOD_FormatingData does not succeed (which we will have to fix).

Anyway, can you try with filter.raster = FALSE? This will help to identify whether this is indeed related to the filtering.

Then, if you want to keep using filter.raster = TRUE you can:

  1. do the filtering yourself and feed BIOMOD_FormatingData with the filtered dataset (with only one point per cell)
  2. let me try to fix the bug; however, I would likely need your data (occurrences, pseudo-absences and environment raster) to reproduce your issue. If that is fine for you, you can send them to remi.patin@univ-grenoble-alpes.fr.
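
The manual filtering in option 1 could look roughly like the following. It is a sketch under assumptions, not the code biomod2 runs internally: it uses terra::cellFromXY on a small toy raster instead of curr_st, the coordinates and response values are made up, and `cell_id` and `keep` are hypothetical names. It keeps the first point found in each raster cell, as the filter.raster = TRUE message describes.

```r
library(terra)

# Toy raster standing in for curr_st: 10 x 10 cells of size 1.
r <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10, ymin = 0, ymax = 10)
values(r) <- seq_len(ncell(r))

# Made-up coordinates and response (1 = presence, NA = pseudo-absence).
xy <- data.frame(long = c(0.2, 0.7, 5.5, 5.6, 9.9),
                 lat  = c(9.8, 9.6, 4.5, 4.4, 0.1))
resp <- c(1, NA, 1, 1, NA)
PAtable <- data.frame(PA1 = rep(TRUE, length(resp)))

# Raster cell number of each point (NA if the point is outside the raster).
cell_id <- cellFromXY(r, as.matrix(xy))

# Keep the first point in each cell and drop points outside the raster.
keep <- !duplicated(cell_id) & !is.na(cell_id)

# Apply the same filter to coordinates, response and PA table,
# so that all three keep the same number of rows.
xy_filtered <- xy[keep, ]
resp_filtered <- resp[keep]
PAtable_filtered <- PAtable[keep, , drop = FALSE]

print(nrow(xy_filtered))
```

Because the first point in each cell wins, putting the presences before the pseudo-absences (as rbind(apo_occ, bkg_coord) does) means a presence is kept whenever it shares a cell with a pseudo-absence. The filtered objects can then be passed to BIOMOD_FormatingData with filter.raster = FALSE.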

Best regards, Rémi

estellebruni commented 1 year ago

Hello Rémi,

Thank you for your fast reply - it is highly appreciated :)

When setting filter.raster = FALSE, I get the exact same number of undefined points (in the initial dataset) and PA (in the calibration set). So it seems filter.raster = TRUE doesn't apply to the PA table defined with PA.user.table=PAtable.

However, when I set filter.raster = FALSE to format the data, I can then run BIOMOD_Modeling() and the error message no longer appears...

As it would be ideal to keep filter.raster = TRUE, I'll send you my code and data asap.

Kind regards, Estelle

rpatin commented 1 year ago

Bonjour Estelle, Thank you for the data and script, this makes our life so much easier to help in debugging and improving biomod2 :pray:

So, as you hinted, there was an oversight when adding the filter.raster option: it was not applied to the PA.user.table when using PA.strategy = 'user.defined'. The weird summary with more pseudo-absences than undefined points in the dataset was, however, just a superficial data summary problem, which was easily corrected.
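
For readers hitting the same message, the mismatch can be illustrated in base R. This is only a plausible reading of the error, not a trace of the biomod2 internals: the sizes are toy values and `filtered_data`, `unfiltered_selection`, `outcome` and `same_length` are hypothetical names. The idea is that a logical selection with one entry per original point is used to index data that has fewer rows after filtering.

```r
# Toy stand-ins: 3 points survive the filtering, but the selection
# vector still has one entry per original point (5).
filtered_data <- matrix(0, nrow = 3, ncol = 2)
unfiltered_selection <- rep(TRUE, 5)

# Indexing the rows with a logical vector longer than the number of rows
# fails; the error message is captured instead of stopping the script.
outcome <- tryCatch({
  filtered_data[unfiltered_selection, ] <- 1
  "assignment succeeded"
}, error = function(e) conditionMessage(e))
print(outcome)

# A guard that could be run before modelling: both sizes should match.
same_length <- length(unfiltered_selection) == nrow(filtered_data)
print(same_length)
```

Checking that the PA table and the formatted data have the same number of rows is a quick way to spot this situation on versions without the fix.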

In summary, I pushed a new version, which hopefully should fix the issue and let you use filter.raster = TRUE. If not, please let me know by updating the issue. You can install the new version with devtools::install_github('biomodhub/biomod2')

Kind regards, Rémi

estellebruni commented 1 year ago

Salut Rémi,

Thank you for debugging this issue so fast. It is working now, I could format my dataset using user defined PA and create the models using BIOMOD_Modeling().

Kind regards, Estelle