biomodhub / biomod2

BIOMOD is a computer platform for ensemble forecasting of species distributions, enabling the treatment of a range of methodological uncertainties in models and the examination of species-environment relationships.
85 stars 22 forks source link

Issues with BIOMOD_FormatingData #70

Closed CatherineBuckland closed 2 years ago

CatherineBuckland commented 2 years ago

Hi BIOMOD team, I have been using the biomod2 package and connected libraries for a couple of years, but suddenly when I went to run some of my scripts yesterday (after maybe not using them for a fortnight / month), it kept crashing during the BIOMOD_FormatingData stage. Previously I have had no issues with this at all - so I am not sure what has changed. After reading some of the other posts, I saw that there was a new BIOMOD2 v4.0 available - so I have now downloaded this, updated all the commands and scripts to the new formats - and got most of the code working - but still have the same error message with the BIOMOD_FormatingData stage. I have managed to isolate the error is associated with the PA.strategy selection, but I do not know how to fix it. Previously I have used PA.strategy = 'sre', but now this will not work. The script will run if I change this to 'random' or 'disk'. But I receive the below error when using 'sre' image Please can you advise what the issue is and how I need to tweak the script to allow it to run as it was previously.

Many thanks, Catherine

MayaGueguen commented 2 years ago

Hello Catherine,

Thank you for your efforts to update your biomod2 version ! :pray: In order to be able to help you, could you provide me your script and maybe your files ro reproduce this error ?

Thanks in advance ! Maya

CatherineBuckland commented 2 years ago

Hi Maya, Thanks for getting back to me so quickly - please find below an example script I've been using. If you change the PA.Strategy it works for 'disk' and 'random', but fails for 'sre'. I tried to attach the R script file but it failed, so I've just copied the script below. Any advice would be amazing! Thanks, Catherine

load libraries

library(biomod2) library(dismo) library(rJava) library(raster) library(CoordinateCleaner) library(dplyr) library(ggplot2) library(rgbif) library(sp) library(countrycode) library(CoordinateCleaner) library(rnaturalearth) library(rnaturalearthdata) library(tidyverse) library(rlang) library(rasterVis) library(lattice)

obtain data from GBIF via rgbif

z <- occ_data(scientificName = "Euphorbia tirucalli", basisOfRecord = 'HUMAN_OBSERVATION', limit = 10000, hasCoordinate = T) dat <- data.frame(z$data)

names(dat) #a lot of columns

length(dat)

select columns of interest

dat <- dat %>% dplyr::select(species, decimalLongitude, decimalLatitude, countryCode, gbifID, family, taxonRank, coordinateUncertaintyInMeters, year, basisOfRecord, institutionCode, datasetName)

remove records without coordinates

dat <- dat%>% filter(!is.na(decimalLongitude))%>% filter(!is.na(decimalLatitude))

plot data to get an overview

wm <- borders("world", colour="gray50", fill="gray75") ggplot()+ coord_fixed()+ wm + geom_point(data = dat, aes(x = decimalLongitude, y = decimalLatitude), colour = "darkred", size = 0.5)+ theme_bw()

convert country code from ISO2c to ISO3c

dat$countryCode <- countrycode(dat$countryCode, origin = 'iso2c', destination = 'iso3c')

flag problems

dat <- data.frame(dat) flags <- clean_coordinates(x = dat, lon = "decimalLongitude", lat = "decimalLatitude", countries = "countryCode", species = "species", tests = c("capitals", "duplicates", "seas", "centroids", "equal","gbif", "institutions", "zeros")) summary(flags) plot(flags, lon = "decimalLongitude", lat="decimalLatitude")

dat_cl <- dat[flags$.summary,]

wm <- borders("world", colour="gray50", fill="gray75") ggplot()+ coord_fixed()+ wm + geom_point(data = flags, aes(x = decimalLongitude, y = decimalLatitude), colour = "black", size = 0.5)+ geom_point(data = dat_cl, aes(x = decimalLongitude, y = decimalLatitude), colour = "yellow", size = 0.5)+ theme_bw()

head(dat_cl)

data_subset <- dat_cl[,c(2,3)]

data_subset$EupTir <-rep(1,nrow(data_subset))

head(data_subset) names(data_subset)[1] <- "X_WGS84" names(data_subset)[2] <- "Y_WGS84" head(data_subset)

the name of studied species

myRespName <- 'EupTir'

the presence/absences data for our species

myResp <- as.numeric(data_subset[,c(3)]) myResp

the XY coordinates of species data

myRespXY <- data_subset[,c("X_WGS84","Y_WGS84")]

EupTirext = extent(-180, 180, -90, 90) r <- raster(EupTirext) res(r) <- 1 myRespXYsel <- gridSample(myRespXY,r,n=5) myRespXYsel

rownum <- nrow(myRespXYsel) rownum

plot(myRespXY) points(myRespXYsel,cex=1,col='red',pch='x')

wm <- borders("world", colour="gray50", fill="gray75") ggplot()+ coord_fixed()+ wm + geom_point(data = myRespXYsel, aes(x = X_WGS84, y = Y_WGS84), colour = "darkgreen", size = 0.5)+ theme_bw()

myRespCut <-head(myResp,n=rownum)

load the WorldClim bioclim rasters and extract each desired raster layer

WorldClimData <- getData('worldclim',var='bio',res=2.5) bio_02 <- subset(WorldClimData,2) bio_06 <- subset(WorldClimData,6) bio_12 <- subset(WorldClimData,12) bio_15 <- subset(WorldClimData,15)

load any additional environmental indices

create FULL EXTENT predictor stacks

predictors_v1 <- stack(bio_02, bio_06, bio_12, bio_15)

plot(predictors_v1)

crop extent for projection if more focused on smaller area (e.g. Africa)

ext = extent(-22, 52, -38, 38) bio_02c <- crop(bio_02,ext) bio_06c <- crop(bio_06,ext) bio_12c <- crop(bio_12,ext) bio_15c <- crop(bio_15,ext)

partial extent predictor stacks

predictors_v1_crop <- stack(bio_02c, bio_06c, bio_12c, bio_15c)

plot(predictors_v1_crop)

SDM version 1

format the data and add background pseudo-absence data points

myBiomodData <- BIOMOD_FormatingData(resp.var = myRespCut, expl.var = predictors_v1_crop, resp.xy = myRespXYsel, resp.name = myRespName, eval.resp.var = NULL, eval.expl.var = NULL, eval.resp.xy = NULL, PA.nb.rep = 3, PA.nb.absences = rownum, PA.strategy='sre', PA.dist.min = 20000, PA.dist.max = NULL, PA.sre.quant = 0.025,

PA.user.table = NULL,

                                 na.rm = TRUE)

myBiomodData plot(myBiomodData)

2. Defining Models Options using default options.

myBiomodOption <- BIOMOD_ModelingOptions(MAXENT.Phillips=list(path_to_maxent.jar = 'C:/MAXENT/maxent.jar'))

3. Computing the models

myBiomodModelOut <- BIOMOD_Modeling( bm.format = myBiomodData, models = c('RF'), bm.options = myBiomodOption, nb.rep=3, data.split.perc=60, metric.eval = c('TSS','ROC'), var.import=3,

SaveObj = TRUE,

compress = 'xz',

rescal.all.models = FALSE,

do.full.models = FALSE, modeling.id = paste(myRespName,"FirstModeling",sep=""))

myBiomodModelOut

get all models evaluation

myBiomodModelEval_SDM1 <- get_evaluations(myBiomodModelOut)

print the dimnames of this object

dimnames(myBiomodModelEval_SDM1)

print out the TSS scores for Randon Forest as an example

myBiomodModelEval_SDM1["TSS","Testing.data","RF",,]

mean(myBiomodModelEval_SDM1["TSS","Testing.data","RF",,])

myBiomodModelEval_SDM1["ROC","Testing.data","RF",,]

mean(myBiomodModelEval_SDM1["ROC","Testing.data","RF",,])

print variable importances

get_variables_importance(myBiomodModelOut,str.grep = 'RF') get_built_models(myBiomodModelOut)

calculate average variable importance metrics first by algorithm type and then by variable

variables_imp <- get_variables_importance(myBiomodModelOut) dimnames(variables_imp)

variables_imp[,,] mean(variables_imp["bio2",,]) mean(variables_imp["bio6",,]) mean(variables_imp["bio12",,]) mean(variables_imp["bio15",,])

building an ensemble model, excluding all models with TSS score < 0.7

myBiomodEM_SDM1 <- BIOMOD_EnsembleModeling( myBiomodModelOut, models.chosen = 'all', em.by='all', metric.select = 'all', metric.select.thresh = c(0.4,0.7), metric.eval = c('TSS','ROC'), prob.mean = F, prob.mean.weight = T, prob.cv = T, prob.ci = F, prob.ci.alpha = 0.05, prob.median = F, committee.averaging = F, prob.mean.weight.decay = 'proportional' )

print summary

myBiomodEM_SDM1

get evaluation scores

get_evaluations(myBiomodEM_SDM1) get_built_models(myBiomodEM_SDM1) get_kept_models(myBiomodEM_SDM1, model=1)

projection over the globe under current conditions

myBiomodProj_SDM1 <- BIOMOD_Projection( myBiomodModelOut, new.env = predictors_v1_crop, proj.name = 'Student_trial', selected.models = 'all', binary.meth = 'TSS', compress = 'xz', build.clamping.mask = F, output.format = '.img')

summary of created object

myBiomodProj_SDM1

make some plots sub-selected by str.grep argument

plot(myBiomodProj_SDM1, str.grep = 'RF')

4. Creating the ensemble projections

myBiomodEF_SDM1 <- BIOMOD_EnsembleForecasting( myBiomodEM_SDM1, bm.proj = myBiomodProj_SDM1, metric.binary = 'TSS', output.format = ".img")

plot(myBiomodEF_SDM1)

CatherineBuckland commented 2 years ago

Hi Maya, Just wondered if you had had any luck working out the issue with the PA strategy selection options? Any advice would be great! Thanks, Catherine

MayaGueguen commented 2 years ago

Hello Catherine,

I apologize for the delay, some other stuff went on the way...

Thank you so much for sharing your code :pray: I just made a commit with the correction :heavy_check_mark: Please, do not hesitate if something is still going wrong.

Maya