Closed tongruiju closed 1 year ago
Dear Ruiju,
Thank you for reporting :pray:
If the R session crashed, it is likely a memory issue. Did it used to work with the same data and previous biomod2
version ?
Which quantity of data do you used ? Can you share the output of:
table(DataSpecies$SP, useNA = "ifany")
show(myExpl)
Thanks in advance, Best regards, Rémi
Dear Rémi,
Sorry, there is actually a pop-up window that says “R Session Aborted. R has encountered a fatal error, and the session has been terminated.”
Previously, sometimes the process would crash during the projection section using GBM with the previous version of biomod2, but the SP Data Formatting step completed quickly. Currently, it is stuck in that step. The amount of raster data is 110GB, with a maximum of approximately 6000 points containing presence and background data for each species. The process used up to 73GB of memory, which is the maximum for this step. The total available memory is 512GB.
> table(DataSpecies$SP, useNA = "ifany")
0 1
3108 3108
> show(myExpl)
class : RasterStack
dimensions : 29369, 69470, 2040264430, 15 (nrow, ncol, ncell, nlayers)
resolution : 500, 500 (x, y)
extent : -17367530, 17367470, -7342270, 7342230 (xmin, xmax, ymin, ymax)
crs : +proj=cea +lat_ts=30 +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs
names : bpi9, curvature, gebco2022PJ, gebco2022PJ_TRI, gebglo_oaWGS84PJ, gebglo_phWGS84PJ, gebphosWGS84PJ, gebposWGS84PJ, gebsalWGS84PJ, gebsilWGS84PJ, gebtempWGS84PJ, pro_ChlorophyllMean, pro_CurrentVelocityMean, Reproject//51_to_2000, slope
min values : -8.443740e+03, ?, -1.091900e+04, ?, ?, ?, ?, 5.448590e-01, ?, 3.321393e-01, -2.080507e+00, ?, 2.233892e-11, ?, ?
max values : 3717.81470, ?, 8520.00000, ?, ?, ?, ?, 140.64447, ?, 247.57033, 29.89326, ?, 1.55857, ?, ?
Best, Ruiju
Dear Rémi,
I tested the codes on another workstation. The SP Data Formating still takes 1-2hours, which is quite diffrent from the previous biomod2 version, but it does not crash. However, a new error occured in the modeling process, which has not happened before. I used the spatial cross-vaidation. Thank you very much.
> DataSpecies <- read.csv("D:\\PAPER9\\species\\PA_ex.csv")
>
> attach(DataSpecies)
> head(DataSpecies)
SP x y
1 1 -17363266 -5129765
2 1 -17309639 -5125333
3 1 -17304814 -5125333
4 1 -17303179 -5125333
5 1 -17242098 -5134500
6 1 -17237274 -5129918
> myRespName <- 'SP'
> myResp <- as.numeric(DataSpecies[,myRespName])
> myRespXY <- DataSpecies[,c("x", "y")]
> list <- list.files(path='D:\\PAPER9\\image', full.names=TRUE)
> rasterData <- stack(list)
> myExpl<-raster::stack(rasterData)
> myBiomodData <- BIOMOD_FormatingData(resp.var=myResp,
expl.var=myExpl,
resp.xy=myRespXY,
resp.name=myRespName,
na.rm=TRUE)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= SP Data Formating -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
! No data has been set aside for modeling evaluation
eling evaluation
!!! Some data are located in the same raster cell.
Please set `filter.raster = TRUE` if you want an automatic filtering.
! No data has been set aside for modeling evaluation
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> myBiomodCV <- read.csv("D:\\PAPER9\\species\\PA_exB.csv",header=FALSE)
> myBiomodCV <- as.matrix(myBiomodCV )
> myBiomodOptions <- BIOMOD_ModelingOptions()
> myBiomodModelOut<-BIOMOD_Modeling(bm.format = myBiomodData, bm.options = myBiomodOptions,models=c('GBM', 'RF'), data.split.table = myBiomodCV, prevalence=0.5, var.import=1, metric.eval=c('TSS', 'KAPPA', 'ROC'), do.full.models=FALSE, modeling.id=paste(myRespName, "FirstModeling", sep=""))
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Build Single Models -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Checking Models arguments...
> Automatic weights creation to rise a 0.5 prevalence
! ignored obsolete argument 'do.full.models' as 'CV.do.full.models' was also given
! ignored obsolete argument 'data.split.table' as 'CV.strategy' was not set to 'user.defined'
Creating suitable Workdir...
Checking Cross-Validation arguments...
Error in .bm_CrossValidation.check.args(bm.format = bm.format, strategy = strategy, :
perc (or CV.perc) is required when strategy = 'random'
Best, Ruiju
Dear Ruiju, Thank you for the update.
Concerning the crash, it is rather difficult to understand what is happening. I can advise:
library(terra) ; terraOptions(todisk = TRUE)
before running your script, this may help prevent crash from handling very large raster. Beware that if the disk is not an SSD this can then be quite slower.remi.patin@univ-grenoble-alpes.fr
, I can setup a cloud repo if need be.Concerning the argument error, the cross-validation module was updated in a recent version and argument changed. Here you need to adjust slightly your code (as advised by the error encountered):
myBiomodModelOut <- BIOMOD_Modeling(bm.format = myBiomodData,
bm.options = myBiomodOptions,
models=c('GBM', 'RF'),
prevalence=0.5, var.import=1,
metric.eval=c('TSS', 'KAPPA', 'ROC'),
CV.user.table = myBiomodCV,
CV.strategy = "user.defined",
modeling.id=paste(myRespName, "FirstModeling", sep=""))
Beware, however that the formatting of user provided CV table may have changed slightly and you may need to adapt the formatting of myBiomodCV
.
Best, Rémi
Dear Rémi,
I added the headline of myBiomodCV as following,
_allData_RUN1 _allData_RUN2 _allData_RUN3 _allData_RUN4 _allData_RUN5
FALSE TRUE TRUE TRUE TRUE
FALSE TRUE TRUE TRUE TRUE
The species data as following,
SP x y
1 -17363265.75 -5129765.334
1 -17309638.68 -5125332.961
And i got the error below,
> myBiomodModelOut <- BIOMOD_Modeling(bm.format = myBiomodData,
+ bm.options = myBiomodOptions,
+ models=c('GBM', 'RF'),
+ prevalence=0.5, var.import=1,
+ metric.eval=c('TSS', 'KAPPA', 'ROC'),
+ CV.user.table = myBiomodCV,
+ CV.strategy = "user.defined",
+ modeling.id=paste(myRespName, "FirstModeling", sep=""))
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Build Single Models -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Checking Models arguments...
> Automatic weights creation to rise a 0.5 prevalence
Creating suitable Workdir...
Checking Cross-Validation arguments...
Error in .bm_CrossValidation.check.args(bm.format = bm.format, strategy = strategy, :
user.table must have as many rows (dim1) than your species as data
I wonder if the deletion of multiple points within a single cell is done automatically and therefore causes the error. Is it possible to retain the multipoints?
Best, Ruiju
Dear Ruiju,
By default option filter.raster
in BIOMOD_FormattingData
is set to FALSE
so this should not be the issue here.
Maybe you can try with na.rm = FALSE
, this could be the issue here. (This may make BIOMOD_FormattingData
fail, in which case it means that you have to remove points with NA
in environmental variables beforehand)
Can you share the output of:
nrow(myBiomodCV)
show(myBiomodData)
Did you have a specific method for cross-validation ? If not you can also use the method integrated within biomod2
which will function more easily with NA
and/or filter.raster
.
Best, Rémi
Dear Rémi,
Thanks a lot. Using the setting na.rm=FALSE still does not work. The rows of myBiomodCV are shown as 3898, as seen below. however, there are actually 3897 lines of records in the data, and one headline. s 3898. You can see below. But actually it is 3897 lines of records, and one being the header. Yes,we do have specific method of spatial cross vallidation. BTW, is the difference between trained models predicted using different versions biomod2 4.2-1 and biomod2 4.2-4-1, larger than the difference between models predicted using same version biomod2 4.2-1 ?
myBiomodData <- BIOMOD_FormatingData(resp.var=myResp,expl.var=myExpl, resp.xy=myRespXY, resp.name=myRespName, na.rm=FALSE)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= SP Data Formating -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
! No data has been set aside for modeling evaluation
eling evaluation
!!! Some data are located in the same raster cell.
Please set `filter.raster = TRUE` if you want an automatic filtering.
! No data has been set aside for modeling evaluation
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> myBiomodCV <- read.csv("D:\\PAPER9\\species\\PA_exB.csv",header=FALSE)
> myBiomodCV <- as.matrix(myBiomodCV )
> myBiomodOptions <- BIOMOD_ModelingOptions()
> myBiomodModelOut <- BIOMOD_Modeling(bm.format = myBiomodData,
+ bm.options = myBiomodOptions,
+ models=c('GBM', 'RF'),
+ prevalence=0.5, var.import=1,
+ metric.eval=c('TSS', 'KAPPA', 'ROC'),
+ CV.user.table = myBiomodCV,
+ CV.strategy = "user.defined",
+ modeling.id=paste(myRespName, "FirstModeling", sep=""))
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Build Single Models -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Checking Models arguments...
> Automatic weights creation to rise a 0.5 prevalence
Creating suitable Workdir...
Checking Cross-Validation arguments...
Error in .bm_CrossValidation.check.args(bm.format = bm.format, strategy = strategy, :
user.table must have as many rows (dim1) than your species as data
>
> nrow(myBiomodCV)
[1] 3898
> show(myBiomodData)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= BIOMOD.formated.data -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
dir.name = .
sp.name = SP
789 presences, 3108 true absences and 0 undefined points in dataset
15 explanatory variables
bpi9 curvature gebco2022PJ gebco2022PJ_TRI
Min. :-257.1729 Min. :-1.26e-01 Min. :-8905 Min. : 0.000
1st Qu.: -3.7407 1st Qu.:-1.20e-03 1st Qu.:-2946 1st Qu.: 6.708
Median : 0.0864 Median : 0.00e+00 Median :-1144 Median : 17.464
Mean : -1.0328 Mean : 1.64e-06 Mean :-1761 Mean : 45.285
3rd Qu.: 3.5555 3rd Qu.: 1.20e-03 3rd Qu.: -313 3rd Qu.: 45.133
Max. : 442.3704 Max. : 1.36e-01 Max. : -1 Max. :795.627
gebglo_ocWGS84PJ gebglo_phWGS84PJ gebphosWGS84PJ gebposWGS84PJ
Min. :0.5354 Min. :7.560 Min. :0.04364 Min. : 5.731
1st Qu.:1.1274 1st Qu.:7.857 1st Qu.:0.85518 1st Qu.: 60.251
Median :1.8251 Median :7.974 Median :1.16032 Median : 80.841
Mean :2.0208 Mean :7.945 Mean :1.37619 Mean : 75.198
3rd Qu.:2.6954 3rd Qu.:8.035 3rd Qu.:2.10400 3rd Qu.: 92.027
Max. :6.2228 Max. :8.167 Max. :3.20775 Max. :104.556
gebsalWGS84PJ gebsilWGS84PJ gebtempWGS84PJ pro_ChlorophyllMean
Min. : 5.558 Min. : 0.9359 Min. :-1.532 Min. :0.004262
1st Qu.:34.686 1st Qu.: 6.4019 1st Qu.: 1.818 1st Qu.:0.004363
Median :34.911 Median : 13.4248 Median : 3.644 Median :0.006174
Mean :34.783 Mean : 39.8260 Mean : 5.150 Mean :0.055474
3rd Qu.:35.099 3rd Qu.: 56.5542 3rd Qu.: 7.141 3rd Qu.:0.056123
Max. :38.742 Max. :174.8513 Max. :28.940 Max. :1.499119
pro_CurrentVelocityMean Reproject_epc_av_1951_to_2000 slope
Min. :0.00314 Min. : 0.2631 Min. : 0.0000
1st Qu.:0.03791 1st Qu.: 3.4568 1st Qu.: 0.2641
Median :0.05817 Median : 8.1798 Median : 0.7193
Mean :0.07654 Mean :12.3095 Mean : 1.9046
3rd Qu.:0.09491 3rd Qu.:17.9012 3rd Qu.: 1.9449
Max. :0.77660 Max. :75.7245 Max. :31.3036
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Best regards, Ruiju
Dear Ruiju, Thank you for the additional explanation :pray: Can you try by changing:
> myBiomodCV <- read.csv("D:\\PAPER9\\species\\PA_exB.csv",header=FALSE)
into
> myBiomodCV <- read.csv("D:\\PAPER9\\species\\PA_exB.csv",header=TRUE)
I fear that your header was understood as data hence the mismatch in dimension.
If you call colnames(myBiomodCV)
this should give you:
_allData_RUN1 _allData_RUN2 _allData_RUN3 _allData_RUN4 _allData_RUN5
Concerning your other question:
BTW, is the difference between trained models predicted using different versions biomod2 4.2-1 and biomod2 4.2-4-1, larger than the difference between models predicted using same version biomod2 4.2-1 ?
I am not sure to understand what you mean. There has been quite a lots of changes since 4.2-1
(see here). Among the changes, one may be impacting your perception of model performance ( bugfix in 4.2-3
):
validation metric calculation now properly use the calibration threshold (i.e a threshold optimized on calibration data instead of validation data). This can lead to less optimistic threshold-dependent validation metric.
which can affect perceived model performance, but the less optimistic results in current biomod2
version are more reliable than the old one.
Best regards, Rémi
Dear the Team,
I use R 4.2.3 and have updated biomod2. The code became stuck at “SP Data Formatting” for hours, and then the R session crashed without any warnings or errors being reported.
Here comes the model run. Thanks a lot in advance.
Best regards, Ruiju