biomodhub / biomod2

BIOMOD is a computer platform for ensemble forecasting of species distributions, enabling the treatment of a range of methodological uncertainties in models and the examination of species-environment relationships.

Help with BIOMOD_xxx - [short question here] #494

Open saeedbehzadifard1376 opened 4 weeks ago

saeedbehzadifard1376 commented 4 weeks ago

Hello, recently, while working with this package, I encountered a problem. I randomly split my data into 80% training and 20% test sets. I also generated 10,000 separate pseudo-absence data points. I defined these settings in the BIOMOD_FormatingData call, but I received the message below.

I want the test data to be independent. Thanks in advance for your help.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= zisp Data Formating -=-=-=-=-=-=-=-=-=

!!! Some data are located in the same raster cell. Only the first data in each cell will be kept as filter.raster = TRUE.

Checking Pseudo-absence selection arguments...

User defined pseudo absences selection

! Some NAs have been automatically removed from your data

! No data has been set aside for modeling evaluation

! Some NAs have been automatically removed from your data

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=

Load current climate layers

now <- list.files(pattern='.tif$', path='mask/var', full.name=T)
layers1 <- stack(now)
layers1
names(layers1)

---------

Hierarchical Clustering (Multicollinearity)

b <- removeCollinearity( layers1, multicollinearity.cutoff = 0.7, select.variables = T, sample.points = TRUE, nb.points = 10000, plot = TRUE, method = "pearson")

------------------

------------------- "Train Data"---- 80% Data presence and pseudo-absence

TrainData <- 'Train'

myRespName <- 'zisp'

TrainData <- read.csv('Train.csv')

head(TrainData)
tail(TrainData)
nrow(TrainData)

myResp <- as.numeric(TrainData[,myRespName])
myResp
myRespXY <- TrainData[,c("longitude","latitude")]
myRespXY

----

------

---------- The custom pseudo-absence table defined.

----- Transform true absences into potential pseudo-absences

myResp.PA <- ifelse(myResp == 1, 1, NA)
myResp.PA

myResp.PA.vect <- vect(cbind(myRespXY, myResp.PA), geom = c("longitude","latitude"))
myResp.PA.vect

user.defined method

myPAtable <- data.frame(PA1 = ifelse(myResp == 1, TRUE, FALSE))
for (i in 1:ncol(myPAtable)) myPAtable[sample(which(myPAtable[, i] == FALSE), 8000), i] = TRUE

myPAtable
nrow(myPAtable)
head(myPAtable)
tail(myPAtable)

-------------------

-------------------- "Test Data"---- 20% Data presence and pseudo-absence

TestData <- 'Test'

myRespName <- 'zisp'

TestData <- read.csv('Test.csv')

head(TestData)
tail(TestData)
nrow(TestData)

evalmyResp <- as.numeric(TestData[,myRespName])

evalmyRespXY <- TestData[,c("longitude","latitude")]

---------------------------------------------------------

jar <- paste(system.file(package="dismo"), "maxent.jar", sep='')

----

-----------------RUN_1_Current !!!!

Format Data with pseudo-absences : user.defined method

Split randomly (80% calibration & 20% evaluation) : user.defined method

------------------------------------------------------------------------------

myBiomodData_1 <- BIOMOD_FormatingData(resp.var = myResp.PA,
                                       eval.resp.var = evalmyResp,
                                       expl.var = layers1,
                                       eval.expl.var = layers1,
                                       resp.xy = myRespXY,
                                       eval.resp.xy = evalmyRespXY,
                                       resp.name = myRespName,
                                       filter.raster = T,
                                       PA.strategy = 'user.defined',
                                       PA.user.table = myPAtable)

myBiomodData_1

-----------------------------------------

------------

AllModels <- c('ANN', 'CTA', 'FDA', 'GAM.mgcv.bam', 'GBM', 'GLM', 'MARS', 'MAXENT', 'MAXNET', 'RF', 'SRE', 'XGBOOST')

myBiomodOption_1 <- bm_ModelingOptions(data.type = 'binary',
                                       models = AllModels,
                                       strategy = 'default',
                                       bm.format = myBiomodData_1)

myBiomodOption_1

----------Build Modeling Options

myBiomodModelOut_1 <- BIOMOD_Modeling(bm.format = myBiomodData_1,
                                      AllModels,
                                      models = c('MAXNET',"XGBOOST"),
                                      CV.strategy = "user.defined",
                                      CV.user.table = myCVtable,
                                      prevalence = 0.5, var.import = 5,
                                      OPT.strategy = 'default',
                                      metric.eval = c('TSS','ROC'),
                                      scale.models = F,
                                      CV.do.full.models = T,
                                      modeling.id = paste(myRespName,"Now",sep=""))

myBiomodModelOut_1

---------------

Another question: since I have already separated the training and test data, how should I define the independent test data in BIOMOD_Modeling through CV.user.table = myCVtable?

When I run BIOMOD_Modeling, I encounter this error:

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Build Single Models -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Checking Models arguments...

Error in .fun_testIfInherits(TRUE, "models.pa", models.pa, "list") : models.pa must be a 'list' object

Best,

HeleneBlt commented 2 weeks ago

Hello there !

Which version of biomod2 are you using? You can check it with the command sessionInfo(). In version 4.2-5-2 of biomod2, there is a small problem with the message "! No data has been set aside for modeling evaluation": it is printed when the PA table is checked, independently of the evaluation data. We will correct it in the GitHub version 😄 In the meantime, you can check your object myBiomodData_1: it must contain a slot with the evaluation data (or myBiomodData_1@has.data.eval must be TRUE).

The argument models.pa is used to define which pseudo-absence datasets are used for each algorithm. As you have only one PA dataset, it is not really useful here, so you can remove it:

myBiomodModelOut_1 <- BIOMOD_Modeling(bm.format = myBiomodData_1,
                                      models = c('MAXNET',"XGBOOST"),
                                      CV.strategy = "user.defined",
                                      CV.user.table = myCVtable,
                                      prevalence = 0.5, var.import = 5,
                                      OPT.strategy = 'default',
                                      metric.eval = c('TSS','ROC'),
                                      scale.models = F,
                                      CV.do.full.models = T,
                                      modeling.id = paste(myRespName,"Now",sep=""))

But if you want to use it later, your object must be a list. Example:

AllModelsPA <- list("MAXNET" = "PA1", "XGBOOST" = "PA1")
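To illustrate why an unnamed argument can end up as models.pa, here is a minimal sketch in base R. The function toy_modeling is hypothetical and its argument order is an assumption made for illustration, not a copy of the biomod2 source. It only shows R's general rule: named arguments are matched first, and any leftover unnamed value fills the first formal argument that is still unmatched.

```r
# Hypothetical stand-in for a modelling function. The argument order is an
# assumption for illustration only; it is not taken from the biomod2 source.
toy_modeling <- function(bm.format, modeling.id = "id",
                         models = c("GLM", "RF"),
                         models.pa = NULL, CV.strategy = "random") {
  # Same kind of check as the one reported in the error message
  if (!is.null(models.pa) && !is.list(models.pa)) {
    stop("models.pa must be a 'list' object")
  }
  list(models = models, models.pa = models.pa)
}

AllModels <- c("MAXNET", "XGBOOST")

# Unnamed AllModels: bm.format, models and modeling.id are matched by name,
# so the leftover unnamed vector fills the next free formal, models.pa.
res_bad <- tryCatch(
  toy_modeling(bm.format = "data", AllModels, models = AllModels,
               modeling.id = "zispNow"),
  error = function(e) conditionMessage(e)
)

# Passing a named list explicitly satisfies the check.
res_ok <- toy_modeling(bm.format = "data", models = AllModels,
                       modeling.id = "zispNow",
                       models.pa = list(MAXNET = "PA1", XGBOOST = "PA1"))
```

In this toy version, res_bad holds the error message and res_ok holds the accepted list. Naming every argument in the call avoids this kind of silent positional match.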

Hope it helps ! Hélène
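On the second question in the original post (what myCVtable could look like), the following is only a sketch in base R under stated assumptions. It assumes that CV.user.table expects a logical matrix with one row per observation and one column per run, where TRUE marks calibration and FALSE marks validation, and that column names follow a pattern such as _PA1_RUN1; both points should be checked against the bm_CrossValidation documentation of the installed version. The response vector resp is made up for the example. Such a table only splits the training data; the independent test data are passed separately through eval.resp.var, as in the BIOMOD_FormatingData call above.

```r
set.seed(42)  # reproducible split

# Made-up response vector: 1 = presence, NA = pseudo-absence candidate
resp <- c(rep(1, 20), rep(NA, 80))
n <- length(resp)

n_runs <- 2  # number of cross-validation runs (illustrative)

# One logical column per run: TRUE = calibration, FALSE = validation
cv_table <- sapply(seq_len(n_runs), function(i) {
  calib <- rep(FALSE, n)
  calib[sample(n, size = round(0.8 * n))] <- TRUE
  calib
})

# Assumed naming convention: one column per PA dataset and run
colnames(cv_table) <- paste0("_PA1_RUN", seq_len(n_runs))

head(cv_table)
```

The number of rows should match the number of observations kept in the formatted data object, which can be smaller than the input if filter.raster = TRUE removed points falling in the same raster cell.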