Closed estellebruni closed 1 year ago
Then, when trying to create the models with `BIOMOD_Modeling()`, I get this error message:
> ApoModel <- BIOMOD_Modeling(bm.format = ApoData_200,
+ modeling.id = 'spe.xyz',
+ models = c('GLM', 'GBM'),
+ nb.rep = 10,
+ data.split.perc = 80,
+ var.import = 3,
+ metric.eval = c('TSS','ROC'),
+ do.full.models = F)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Build Single Models -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Checking Models arguments...
Warning in .BIOMOD_Modeling.check.args(bm.format, modeling.id, models, bm.options, :
Models will run with 'defaults' parameters
Creating suitable Workdir...
Error in `[<-`(`*tmp*`, bm.format@PA.table[, pa], , value = sampled.mat) :
(subscript) logical subscript too long
Hello Estelle,
Thank you for reporting :pray:
The pseudo-absence table given to `BIOMOD_FormatingData` must have `TRUE` for all presences and for the pseudo-absences kept in the given pseudo-absence dataset.
With the following line, `PAtable <- data.frame(PA1 = ifelse(myResp ==1, TRUE, FALSE))`, you however set `TRUE` for all presences and `FALSE` for all the potential pseudo-absences. Thus the dataset has no pseudo-absences selected. You can check that in the output of `BIOMOD_FormatingData` with `show(ApoData_200)`, in which you should see something like this:
1 Pseudo Absences dataset available ( PA1 ) with 0 absences in each (true abs + pseudo abs)
With no pseudo-absences selected the model cannot run (although the error you saw was not very clear - sorry for that).
As a solution, if you want to keep all the generated pseudo-absences, you just have to fill `PAtable` with only `TRUE`:
PAtable <- data.frame(PA1 = rep(TRUE, length(myResp)))
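To make the difference concrete, here is a small toy sketch in base R. The vector `resp` and the table names are made up for illustration and are not taken from your script; it assumes that pseudo-absence candidates are coded as `NA` in the response, and it simply counts how many candidates each table keeps:

```r
# Toy response: 1 = presence, NA = pseudo-absence candidate (illustrative only)
resp <- c(1, 1, NA, NA, NA)
is_presence <- !is.na(resp) & resp == 1

# Table flagging only the presences: no pseudo-absence candidate is kept
PA_only_presences <- data.frame(PA1 = is_presence)

# Table flagging every row: all pseudo-absence candidates are kept
PA_all <- data.frame(PA1 = rep(TRUE, length(resp)))

# Number of pseudo-absence candidates kept by each table
n_kept_only_presences <- sum(PA_only_presences$PA1 & !is_presence)
n_kept_all <- sum(PA_all$PA1 & !is_presence)
print(c(only_presences = n_kept_only_presences, all = n_kept_all))
```

The first table keeps none of the candidates, which matches the "0 absences" reported by `show()`.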
If you want to keep only 10000 pseudo-absences, you can use `BIOMOD_FormatingData` to randomly subsample:
ApoData_200 <- BIOMOD_FormatingData(resp.name = "Species.xyz",
resp.var = myResp.PA,
resp.xy = myResp.xy,
expl.var = curr_st,
PA.strategy = "random",
filter.raster = TRUE,
PA.nb.rep=1,
PA.nb.absences=10000)
Note that this requires `PA.strategy = 'random'`, and you do not have to give any PA table. On a side note, when you use `PA.strategy = 'user.defined'`, the arguments `PA.nb.rep` and `PA.nb.absences` are ignored.
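If you would rather stay with `PA.strategy = 'user.defined'` and still use only a subset of your candidates, one option is to draw the subset yourself when building the table. Below is a minimal sketch in base R, assuming the candidates are coded `NA` in the response; `make_pa_table` is a hypothetical helper written for this example, not a biomod2 function:

```r
# Hypothetical helper (not part of biomod2): build a one-column PA table that
# keeps every presence and a random subset of the pseudo-absence candidates.
make_pa_table <- function(resp, n_keep, seed = 42) {
  is_candidate <- is.na(resp)            # assumption: candidates are coded NA
  candidate_idx <- which(is_candidate)
  stopifnot(n_keep <= length(candidate_idx))
  set.seed(seed)                         # reproducible draw
  # Index through sample.int() so that a single candidate is handled correctly
  kept <- candidate_idx[sample.int(length(candidate_idx), n_keep)]
  pa <- !is_candidate                    # TRUE for every non-candidate row (here: presences)
  pa[kept] <- TRUE                       # TRUE for the retained pseudo-absences
  data.frame(PA1 = pa)
}

# Toy data: 3 presences followed by 10 candidates, keep 4 of them
toy_resp <- c(rep(1, 3), rep(NA, 10))
toy_table <- make_pa_table(toy_resp, n_keep = 4)
print(table(toy_table$PA1))
```

The resulting table could then be passed as `PA.user.table`, in the same way as `PAtable` in your call.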
I hope this is clearer now. If not, feel free to clarify your question.
Best, Rémi
Hello Rémi,
Thank you for your very fast reply and help!
Unfortunately, even after changing the `TRUE` and `FALSE` table for PA, I still get the same error message when running `BIOMOD_Modeling()`.
I used the following code that is slightly modified compared to the first one I posted:
### get pseudo-absences in a defined region of the world + occurrences of the species
bkg_coord <- read.csv("data/bkg_coordinates_13552.csv") %>%
dplyr::rename("long"="x", "lat" = "y") # change to long and lat so that it is similar to the occurrences table
bkg_coord$species.xyz <- c(rep(NA, nrow(bkg_coord)))
# species occurrences table with 3 columns: long, lat, occ
apo_occ <- read.table("data/ApoVas_occurrences_only.txt", header=T) %>%
dplyr::rename("species.xyz" = "occ")
### Formating the data
apoAll <- rbind(apo_occ, bkg_coord) # 1= species occurrences, NA= generated pseudo-absences
myRespName <- 'species.xyz'
myResp <- as.numeric(apoAll[, myRespName])
myRespXY <- apoAll[, c("long", "lat")]
PAtable <- data.frame(PA1 = rep(TRUE, length(myResp)))
# output env variables raster (curr_st)
curr_st
class : SpatRaster
dimensions : 105, 216, 6 (nrow, ncol, nlyr)
resolution : 1.666667, 1.666667 (x, y)
extent : -180.0001, 179.9999, -91.00014, 83.99986 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
source(s) : memory
names : bio10, bio15, bio18, bio2, bio4, TPI
min values : -38.08873, 10.22358, 0.000, 1.8818, 19.17403, -1370.017
max values : 38.38527, 211.97893, 2387.064, 15.9555, 2171.36617, 2137.092
### BIOMOD2 formating
ApoData_200 <- BIOMOD_FormatingData(resp.name = myRespName,
resp.var = myResp,
resp.xy = myRespXY,
expl.var = curr_st,
PA.strategy = "user.defined",
PA.user.table = PAtable,
filter.raster = TRUE
)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= species.xyz Data Formating -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Pseudo absences used will be user defined ones !
! No data has been set aside for modeling evaluation
!!! Some data are located in the same raster cell.
Only the first data in each cell will be kept as `filter.raster = TRUE`.
Checking Pseudo-absence selection arguments...
> User defined pseudo absences selection
! Some NAs have been automatically removed from your data
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> summary(ApoData_200)
dataset run PA Presences True_Absences Pseudo_Absences Undefined Total_Absences
1 initial NA <NA> 32 0 0 1099 0
2 calibration NA PA1 32 0 13349 NA 13349
### create model
> ApoModel <- BIOMOD_Modeling(bm.format = ApoData_200,
+ modeling.id = 'AllModels',
+ models = c('GLM', 'GBM', 'MAXENT'),
+ nb.rep = 10,
+ data.split.perc = 80,
+ var.import = 3,
+ metric.eval = c('TSS','ROC'),
+ do.full.models = FALSE)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Build Single Models -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Checking Models arguments...
Warning in .BIOMOD_Modeling.check.args(bm.format, modeling.id, models, bm.options, :
Models will run with 'defaults' parameters
Creating suitable Workdir...
Error in `[<-`(`*tmp*`, bm.format@PA.table[, pa], , value = sampled.mat) :
(subscript) logical subscript too long
Looking at the summary results, I was wondering if the lower number of undefined points in the initial dataset (1'099) compared to the pseudo-absences in the calibration set (13'349) might be problematic?
Thank you for your help.
Best regards, Estelle
Hello Estelle, Thank you for the additional information :pray:
This is quite puzzling, I do not think I have ever seen a `biomod.formated.data` object with more pseudo-absences than undefined points. However, I could not reproduce it yet. It may have something to do with `filter.raster = TRUE`, but when I try to do the same, `BIOMOD_FormatingData` does not succeed (which we will have to fix).
Anyway, can you try with `filter.raster = FALSE`? This will help to identify whether this is indeed related to the filtering.
Then, if you want to keep using `filter.raster = TRUE`, you can run `BIOMOD_FormatingData` with the filtered dataset (with only one point per cell).
Best regards, Rémi
Hello Rémi,
Thank you for your fast reply - it is highly appreciated :)
When setting `filter.raster = FALSE`, I get the exact same number of undefined points (in the initial dataset) and PA (in the calibration set). So it seems `filter.raster = TRUE` doesn't apply to the PA table defined with `PA.user.table=PAtable`.
However, when setting `filter.raster = FALSE` to format the data, I can run `BIOMOD_Modeling()` afterwards and the error message doesn't appear anymore...
As it would be ideal to keep `filter.raster = TRUE`, I'll send you my code and data asap.
Kind regards, Estelle
Bonjour Estelle,
Thank you for the data and script, this makes it so much easier for us to help in debugging and improving `biomod2` :pray:
So as you hinted, there was an oversight when adding the `filter.raster` option, which was not applied to the `PA.user.table` when using `PA.strategy = 'user.defined'`. The weird summary with more pseudo-absences than undefined points in the dataset was, however, just a superficial data summary problem that was easily corrected.
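For anyone landing here with the same message: in base R, this error appears when a logical row index has more elements than the matrix it indexes has rows. The toy reproduction below is not biomod2 code; it only assumes that an unfiltered PA column ended up longer than the filtered data, and all object names are made up:

```r
# Toy stand-ins: 3 rows survive the raster filtering, but the PA column
# still has 5 entries because it was never filtered (assumption).
filtered_data <- matrix(0, nrow = 3, ncol = 2)
pa_column <- rep(TRUE, 5)

# Assigning through the too-long logical index fails; capture the message
err_msg <- tryCatch({
  filtered_data[pa_column, ] <- 1
  NULL
}, error = function(e) conditionMessage(e))
print(err_msg)

# With an index of matching length, the same assignment works
pa_column_ok <- pa_column[seq_len(nrow(filtered_data))]
filtered_data[pa_column_ok, ] <- 1
```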
In summary, I pushed a new version, which hopefully should fix the issue and let you use `filter.raster = TRUE`. If not, please let me know by updating the issue.
You can install the new version with devtools::install_github('biomodhub/biomod2')
Kind regards, Rémi
Salut Rémi,
Thank you for debugging this issue so fast. It is working now, I could format my dataset using user defined PA and create the models using `BIOMOD_Modeling()`.
Kind regards, Estelle
Hello,
I am modelling a single species distribution at world scale with user defined pseudo-absences (pseudo-absences are in the southern hemisphere only). I have 401 occurrences of my species and >13'000 PA. I want to format my data with biomod2 using my own PA.
Is it correct to indicate
PA.nb.rep=1, PA.nb.absences=10000
if I want to have 1 single run of 10'000 PA selected within the PA I generated myself? I hope the code is clear enough. Thank you very much for your help.
Kind regards, Estelle
My session info: