Closed rahulkgour closed 1 year ago
Hi Rahul, Thank you for reporting :pray: Unfortunately I did not manage to reproduce the issue yet. I did reproduce the modeling and projection but no projection had pixel greyed out. Although I must admit I have probably generated data - pseudo-absences and cross-validation table - differently than you did.
Regards, Rémi
Hi Rémi,
Thanks for your reply.
Its the same code I have mentioned above is used to generate the rasters, do you want the data preparation code as well?
I have shared the couple of those rasters through email as can't attach a .tif file here as the format of the file is not supported here.
Thanks & regards, Rahul
If nothing has changed since issue https://github.com/biomodhub/biomod2/issues/184, that is fine for the data preparation code.
No, nothing has changed since https://github.com/biomodhub/biomod2/issues/184.
Did you finally send the projected raster ? I did not get anything in my mail yet, so there might have been an issue somewhere.
Yes I have sent it earlier right after commenting here.
Now, I have sent it again.
Dear Rahul, Thank you for all the additional data and information :pray: I think I got what was happening. One of your layer have categorical information with levels ranging from 1 to 8 (LULC). At some points the levels got reformated from 0 to 7, which you can see with:
myExpl <- get_formal_data(myBiomodModel, subinfo = "expl.var")
levels(myExpl$category)
[1] "0" "1" "2" "3" "4" "5" "6" "7"
Therefore when using BIOMOD_Projection
with your original raster (labelled from 1 to 8), the models cannot predict on the 8, hence the missing data in your prediction rasters for GLM and RF.
Additionally, looking at get_built_models(myBiomodModel)
, I can see that only GLM
and RF
models were built. Hence the MAXENT
prediction rasters are probably obsolete one which did not have this issue (maybe because you used a different set of variables). I can imagine that MAXENT
was automatically deactivated if maxent.jar
was not found.
Concerning the issue of categorical raster having their level changing, I would suggest that you check (or share) the code that you use to load the raster. Categorical raster can be a bit cumbersome to manage, especially when mixing raster
and terra
packages (doing everything with terra
is generally better).
Regards, Rémi
Dear Rémi,
Thanks so much for your analysis and insights.
We actually loaded each raster individually from our working directory to avoid any issue and then created a stack, I have pasted the code below:
# import all the environmental rasters
LULC <- raster("LULC.tif")
Aspect <- raster("Aspect.tif")
DiffNDVI <- raster("diffNDVI.tif")
Dist_Road <- raster("Dist_Road.tif")
Dist_Settle <- raster("Dist_Settle.tif")
Elevation <- raster("elevation.tif")
Avg_Precip <- raster("Avg_Precip.tif")
TWI<- raster("TWI.tif")
Avg_WS<- raster("Avg_WS.tif")
Slope<- raster("Slope.tif")
Dist_Agri <- raster("Dist_Agri.tif")
Avg_RH <- raster("Avg_RH.tif")
# creating stack
rasters <- stack(LULC, NDVI, Dist_Agri, Dist_Settle, Dist_Road,
Avg_Precip, Avg_Temp, TWI,
Avg_VPD, Avg_WS, Aspect, Elevation, Slope, Avg_RH)
# Import occ points, blockCV
points <- read.csv(file.choose())
# We convert the `data.frame` to `sf` as follows:
pa_data <- sf::st_as_sf(points, coords = c("x", "y"), crs = 4326)
### Using `blockCV` in `biomod2` package
# species occurrences
DataSpecies <- points
# the name of studied species
myRespName <- "fireOcc"
# the presence/absences data for our species
myResp <- as.numeric(DataSpecies[,myRespName])
# the XY coordinates of species data
myRespXY <- DataSpecies[,c("x","y")]
# extract the raster values for the species points as a dataframe
RasterValues <- terra::extract(rasters, pa_data, df = TRUE, ID = FALSE)
myResp.PA <- ifelse(myResp == 1, 1, NA)
# Step 1. Defining pseudo absences
myBiomodData <- BIOMOD_FormatingData(resp.var = myResp.PA,
expl.var = rasters,
resp.xy = myRespXY,
resp.name = myRespName,
PA.nb.rep = 1,
PA.nb.absences = 1000,
PA.strategy = 'random',
na.rm = TRUE)
# Step 2. Presence+pseudo-absence from biomod.formated.data object
myPoints <- cbind(myBiomodData@data.species, myBiomodData@coord)
colnames(myPoints) <- c("fireOcc","x","y")
pa_data <- sf::st_as_sf(myPoints, coords = c("x", "y"), crs = 4326)
# Step 3. Run crossvalidation
###### Generating block CV folds
# Here, we generate two CV strategy, one k-fold CV using `cv_spatial`
scv1 <- cv_spatial(
x = pa_data,
column = "fireOcc", # the response column (binary or multi-class)
r = rasters_WoLULC,
k = 5, # number of folds
size = 1000, # size of the blocks in meters
selection = "random", # random blocks-to-fold
iteration = 50, # find evenly dispersed folds
progress = FALSE, # turn off progress bar
biomod2 = TRUE, # also create folds for biomod2
raster_colors = terrain.colors(10, rev = TRUE) # options from cv_plot for a better colour contrast
)
DataSplitTable <- scv1$biomod_table
# myBiomodOption
myBiomodOption <-
BIOMOD_ModelingOptions(
GLM = list(type = 'quadratic',
interaction.level = 1,
myFormula = NULL,
test = 'BIC',
family = 'binomial',
control = glm.control(epsilon = 1e-08,
maxit = 1000,
trace = FALSE)),
CTA = list( method = 'class',
parms = 'default',
cost = NULL,
control = list(xval = 5,
minbucket = 5,
minsplit = 5,
cp = 0.001,
maxdepth = 25)
),
ANN = list( NbCV = 5,
size = NULL,
decay = NULL,
rang = 0.1,
maxit = 200),
FDA = list( method = 'mars',
add_args = NULL),
MARS = list( type = 'simple',
interaction.level = 0,
myFormula = NULL,
nk = NULL,
penalty = 2,
thresh = 0.001,
nprune = NULL,
pmethod = 'backward'),
RF = list(do.classif = TRUE,
ntree = 500,
mtry = 'default',
nodesize = 5,
maxnodes = NULL),
GBM = list(distribution = 'bernoulli',
n.trees = 3000,
interaction.depth = 7,
n.minobsinnode = 5,
shrinkage = 0.001,
bag.fraction = 0.5,
train.fraction = 1,
cv.folds = 3,
keep.data = FALSE,
verbose = FALSE,
perf.method = 'cv'),
MAXENT = list(path_to_maxent.jar = getwd(),
maximumiterations = 10,
visible = FALSE,
linear = TRUE,
quadratic = TRUE,
product = FALSE,
threshold = FALSE,
hinge = FALSE,
lq2lqptthreshold = 80,
betamultiplier = 1,
defaultprevalence = 0.5)
)
# Step 4. Model fitting
myBiomodModelOut <- BIOMOD_Modeling(myBiomodData,
models = c('RF', 'MAXENT', 'GBM', 'MARS', 'ANN', 'GLM'),
bm.options = myBiomodOption,
data.split.table = DataSplitTable, # blocking folds
var.import = 3,
metric.eval = c('TSS', 'ROC', 'ACCURACY'),
save.output=TRUE,
scale.models=FALSE,
do.full.models=FALSE,
modeling.id=paste(myRespName,"Run_Test",sep="")
)
# Step 5. Ensemble models
BIOMOD_ensemble_models <- BIOMOD_EnsembleModeling(
bm.mod = myBiomodModelOut,
models.chosen = 'all',
em.by = 'all',
metric.select = 'TSS',
metric.select.thresh = 0.4,
metric.eval = c('TSS', 'ROC', 'ACCURACY'),
var.import = 3,
em.algo = c('EMmean', 'EMwmean'),
EMci.alpha = 0.05,
EMwmean.decay = 'proportional'
)
# Step 6. Model Projections
BIOMOD_models_proj_current <- BIOMOD_Projection(bm.mod = myBiomodModelOut,
proj.name = 'Current_Test',
new.env = rasters,
models.chosen = 'all',
metric.binary = NULL,
metric.filter = NULL,
build.clamping.mask = FALSE,
do.stack = FALSE,
output.format = '.tif'
)
# Step 7. Ensemble Projection
myBiomodEMProj <- BIOMOD_EnsembleForecasting(bm.em = BIOMOD_ensemble_models,
bm.proj = BIOMOD_models_proj_current,
models.chosen = 'all',
metric.binary = NULL,
metric.filter = NULL,
build.clamping.mask = FALSE,
do.stack = FALSE,
output.format = '.tif')
Thanks & regards, Rahul
Dear Rahul, Thank you for the update and additional information :pray: I would suggest some tweaks to the beginning of your code:
# import all the environmental rasters
# terra uses rast instead of raster
LULC <- rast("LULC.tif")
Aspect <- rast("Aspect.tif")
DiffNDVI <- rast("diffNDVI.tif")
Dist_Road <- rast("Dist_Road.tif")
Dist_Settle <- rast("Dist_Settle.tif")
Elevation <- rast("elevation.tif")
Avg_Precip <- rast("Avg_Precip.tif")
TWI<- rast("TWI.tif")
Avg_WS<- rast("Avg_WS.tif")
Slope<- rast("Slope.tif")
Dist_Agri <- rast("Dist_Agri.tif")
Avg_RH <- rast("Avg_RH.tif")
# creating stack
# terra uses 'c' instead of 'stack'
# Note that some of the element were missing (Avg_Temp for instance)
rasters <- c(LULC, DiffNDVI, Dist_Agri, Dist_Settle, Dist_Road,
Avg_Precip, Avg_Temp, TWI,
Avg_VPD, Avg_WS, Aspect, Elevation, Slope, Avg_RH)
# it will be easier if you can name the layer.
# Note that layer name cannot have underscore character '_'
names(rasters) <- c("LULC", "DiffNDVI", "DistAgri", "DistSettle", "DistRoad",
"AvgPrecip", "AvgTemp", "TWI",
"AvgVPD", "AvgWS", "Aspect", "Elevation", "Slope", "AvgRH")
# Manage categorical raster
# Ensure that categories are properly managed. It is not automatic as categories were stored as numbers.
catLULC <- categories(rasters$LULC,
value = data.frame(ID = seq(1,8),
value = paste0("landuse", seq(1,8))))
rasters$LULC <- catLULC
If you run that, you should have an easier time running the model and interpreting results. Additionally, during the BIOMOD_FormatingData
you should see such kind of message:
> fact.level for LULC : 1:landuse1 2:landuse2 3:landuse3 4:landuse4 5:landuse5 6:landuse6 7:landuse7 8:landuse8
- according to mask.out levels 6 1 8 5 have already been sampled
- levels 2 3 4 7 will be sampled in the original raster
This indicates that the sampling of pseudo-absences ensure that all factors levels are sampled.
You may still encounter similar troubles if you are unlucky and some factors levels disappear from the calibration dataset through the crossvalidation, but I never encountered the problem on my end.
You should also check that unique(myBiomodData@data.env.var$LULC)
return values from the 8 levels.
Let me know if that do not solve your issue. Best, Rémi
Dear Rémi,
Thanks for your valuable suggestions.
I will try it and let you know about the run, hope this time it works well. Fingers crossed.
Thanks again for all your prompt support, Rahul
Dear Rémi,
I am writing to express my sincere appreciation for the help you provided us during the issues which we faced. Your analysis and support was truly valuable and helped us achieve our goals in a timely and efficient manner.
We are able to get the predicted rasters as we were expecting.
Once again, thank you for your assistance, your efforts are truly appreciated.
Thanks & regards, Rahul
We are recieving the rasters with No data pixels scattered all over the projected rasters. We see this behavior for the 'GBM', 'MARS', 'ANN', 'GLM', 'RF' algorithms. But for 'Maxent' we can see the predicted rasters without any issues.
Model fitting
Ensemble model
Projections
Thanks and regards, Rahul