biomodhub / biomod2

BIOMOD is a computer platform for ensemble forecasting of species distributions, enabling the treatment of a range of methodological uncertainties in models and the examination of species-environment relationships.
87 stars 22 forks source link

NoData pixels in predicted rasters for all the algorithms other than Maxent #188

Closed rahulkgour closed 1 year ago

rahulkgour commented 1 year ago

We are recieving the rasters with No data pixels scattered all over the projected rasters. We see this behavior for the 'GBM', 'MARS', 'ANN', 'GLM', 'RF' algorithms. But for 'Maxent' we can see the predicted rasters without any issues.

Model fitting

myBiomodModelOut <- BIOMOD_Modeling(myBiomodData,
                                    models = c('MAXENT', 'GBM', 'MARS', 'ANN', 'GLM', 'RF'),
                                    bm.options = myBiomodOption,
                                    data.split.table = DataSplitTable, # blocking folds
                                    var.import = 3,
                                    metric.eval = c('TSS', 'ROC', 'ACCURACY'),
                                    save.output=TRUE,
                                    scale.models=FALSE,
                                    do.full.models=FALSE,
                                    modeling.id=paste(myRespName,"Run20",sep="")
                                    )

Ensemble model

BIOMOD_models_proj_current <- BIOMOD_Projection(bm.mod = myBiomodModelOut,
                                                proj.name = 'Current',
                                                new.env = rasters,
                                                models.chosen = 'all',
                                                metric.binary = NULL,
                                                metric.filter = NULL,
                                                build.clamping.mask = FALSE,
                                                do.stack = FALSE,
                                                output.format = '.tif'
                                                )

Projections

myBiomodEMProj <- BIOMOD_EnsembleForecasting(bm.em = BIOMOD_ensemble_models,
                                             bm.proj = BIOMOD_models_proj_current,
                                             models.chosen = 'all',
                                             metric.binary = NULL,
                                             metric.filter = NULL,
                                             build.clamping.mask = FALSE,
                                             do.stack = FALSE,
                                             output.format = '.tif')

Thanks and regards, Rahul

rpatin commented 1 year ago

Hi Rahul, Thank you for reporting :pray: Unfortunately I did not manage to reproduce the issue yet. I did reproduce the modeling and projection but no projection had pixel greyed out. Although I must admit I have probably generated data - pseudo-absences and cross-validation table - differently than you did.

  1. Could you share the output raster so that I may have a look at the pattern that you observe ?
  2. Can you share the code you used to generate the dataset ? Maybe there is something in there that I did not reproduce.

Regards, Rémi

rahulkgour commented 1 year ago

Hi Rémi,

Thanks for your reply.

Its the same code I have mentioned above is used to generate the rasters, do you want the data preparation code as well?

I have shared the couple of those rasters through email as can't attach a .tif file here as the format of the file is not supported here.

Thanks & regards, Rahul

rpatin commented 1 year ago

If nothing has changed since issue https://github.com/biomodhub/biomod2/issues/184, that is fine for the data preparation code.

rahulkgour commented 1 year ago

No, nothing has changed since https://github.com/biomodhub/biomod2/issues/184.

rpatin commented 1 year ago

Did you finally send the projected raster ? I did not get anything in my mail yet, so there might have been an issue somewhere.

rahulkgour commented 1 year ago

Yes I have sent it earlier right after commenting here.

Now, I have sent it again.

Screenshot 2023-02-22 at 12 12 27
rpatin commented 1 year ago

Dear Rahul, Thank you for all the additional data and information :pray: I think I got what was happening. One of your layer have categorical information with levels ranging from 1 to 8 (LULC). At some points the levels got reformated from 0 to 7, which you can see with:

myExpl <- get_formal_data(myBiomodModel, subinfo = "expl.var")
levels(myExpl$category)
[1] "0" "1" "2" "3" "4" "5" "6" "7"

Therefore when using BIOMOD_Projection with your original raster (labelled from 1 to 8), the models cannot predict on the 8, hence the missing data in your prediction rasters for GLM and RF.

Additionally, looking at get_built_models(myBiomodModel), I can see that only GLM and RF models were built. Hence the MAXENT prediction rasters are probably obsolete one which did not have this issue (maybe because you used a different set of variables). I can imagine that MAXENT was automatically deactivated if maxent.jar was not found.

Concerning the issue of categorical raster having their level changing, I would suggest that you check (or share) the code that you use to load the raster. Categorical raster can be a bit cumbersome to manage, especially when mixing raster and terra packages (doing everything with terra is generally better).

Regards, Rémi

rahulkgour commented 1 year ago

Dear Rémi,

Thanks so much for your analysis and insights.

We actually loaded each raster individually from our working directory to avoid any issue and then created a stack, I have pasted the code below:

# import all the environmental rasters
LULC <- raster("LULC.tif")
Aspect <- raster("Aspect.tif")
DiffNDVI <- raster("diffNDVI.tif")
Dist_Road <- raster("Dist_Road.tif")
Dist_Settle <- raster("Dist_Settle.tif")
Elevation <- raster("elevation.tif")
Avg_Precip <- raster("Avg_Precip.tif")
TWI<- raster("TWI.tif")
Avg_WS<- raster("Avg_WS.tif")
Slope<- raster("Slope.tif")
Dist_Agri <- raster("Dist_Agri.tif")
Avg_RH <- raster("Avg_RH.tif")

# creating stack
rasters <- stack(LULC, NDVI, Dist_Agri, Dist_Settle, Dist_Road, 
                 Avg_Precip, Avg_Temp, TWI,
                 Avg_VPD, Avg_WS, Aspect, Elevation, Slope, Avg_RH)

# Import occ points, blockCV
points <- read.csv(file.choose())

# We convert the `data.frame` to `sf` as follows:
pa_data <- sf::st_as_sf(points, coords = c("x", "y"), crs = 4326)

### Using `blockCV` in `biomod2` package
# species occurrences
DataSpecies <- points
# the name of studied species
myRespName <- "fireOcc"
# the presence/absences data for our species
myResp <- as.numeric(DataSpecies[,myRespName])
# the XY coordinates of species data
myRespXY <- DataSpecies[,c("x","y")]
# extract the raster values for the species points as a dataframe
RasterValues <- terra::extract(rasters, pa_data, df = TRUE, ID = FALSE)
myResp.PA <- ifelse(myResp == 1, 1, NA)

# Step 1. Defining pseudo absences
myBiomodData <- BIOMOD_FormatingData(resp.var = myResp.PA,
                                     expl.var = rasters,
                                     resp.xy = myRespXY,
                                     resp.name = myRespName,
                                     PA.nb.rep = 1,
                                     PA.nb.absences = 1000,
                                     PA.strategy = 'random',
                                     na.rm = TRUE)

# Step 2. Presence+pseudo-absence from biomod.formated.data object
myPoints <- cbind(myBiomodData@data.species, myBiomodData@coord)
colnames(myPoints) <- c("fireOcc","x","y")
pa_data <- sf::st_as_sf(myPoints, coords = c("x", "y"), crs = 4326)

# Step 3. Run crossvalidation
###### Generating block CV folds
# Here, we generate two CV strategy, one k-fold CV using `cv_spatial` 
scv1 <- cv_spatial(
  x = pa_data,
  column = "fireOcc", # the response column (binary or multi-class)
  r = rasters_WoLULC,
  k = 5, # number of folds
  size = 1000, # size of the blocks in meters
  selection = "random", # random blocks-to-fold
  iteration = 50, # find evenly dispersed folds
  progress = FALSE, # turn off progress bar
  biomod2 = TRUE, # also create folds for biomod2
  raster_colors = terrain.colors(10, rev = TRUE) # options from cv_plot for a better colour contrast
)

DataSplitTable <- scv1$biomod_table

# myBiomodOption 
myBiomodOption <- 
  BIOMOD_ModelingOptions(
    GLM = list(type = 'quadratic', 
               interaction.level = 1,
               myFormula = NULL,
               test = 'BIC',
               family = 'binomial',
               control = glm.control(epsilon = 1e-08,
                                     maxit = 1000,
                                     trace = FALSE)),
    CTA = list( method = 'class',
                parms = 'default',
                cost = NULL,
                control = list(xval = 5,
                               minbucket = 5,
                               minsplit = 5, 
                               cp = 0.001, 
                               maxdepth = 25)
    ),
    ANN = list( NbCV = 5,
                size = NULL,
                decay = NULL,
                rang = 0.1,
                maxit = 200),
    FDA = list( method = 'mars',
                add_args = NULL),
    MARS = list( type = 'simple',
                 interaction.level = 0,
                 myFormula = NULL,
                 nk = NULL,
                 penalty = 2,
                 thresh = 0.001,
                 nprune = NULL,
                 pmethod = 'backward'),
    RF = list(do.classif = TRUE,
              ntree = 500,
              mtry = 'default',
              nodesize = 5,
              maxnodes = NULL),
    GBM = list(distribution = 'bernoulli',
               n.trees = 3000,
               interaction.depth = 7,
               n.minobsinnode = 5,
               shrinkage = 0.001,
               bag.fraction = 0.5,
               train.fraction = 1,
               cv.folds = 3,
               keep.data = FALSE,
               verbose = FALSE,
               perf.method = 'cv'),
    MAXENT = list(path_to_maxent.jar = getwd(),
                           maximumiterations = 10,
                           visible = FALSE,
                           linear = TRUE,
                           quadratic = TRUE,
                           product = FALSE,
                           threshold = FALSE,
                           hinge = FALSE,
                           lq2lqptthreshold = 80,
                           betamultiplier = 1,
                           defaultprevalence = 0.5)
  )

# Step 4. Model fitting
myBiomodModelOut <- BIOMOD_Modeling(myBiomodData,
                                    models = c('RF', 'MAXENT', 'GBM', 'MARS', 'ANN', 'GLM'),
                                    bm.options = myBiomodOption,
                                    data.split.table = DataSplitTable, # blocking folds
                                    var.import = 3,
                                    metric.eval = c('TSS', 'ROC', 'ACCURACY'),
                                    save.output=TRUE,
                                    scale.models=FALSE,
                                    do.full.models=FALSE,
                                    modeling.id=paste(myRespName,"Run_Test",sep="")
                                    )

# Step 5. Ensemble models
BIOMOD_ensemble_models <- BIOMOD_EnsembleModeling(
                                                  bm.mod = myBiomodModelOut,
                                                  models.chosen = 'all',
                                                  em.by = 'all',
                                                  metric.select = 'TSS',
                                                  metric.select.thresh = 0.4,
                                                  metric.eval = c('TSS', 'ROC', 'ACCURACY'),
                                                  var.import = 3,
                                                  em.algo = c('EMmean', 'EMwmean'),
                                                  EMci.alpha = 0.05,
                                                  EMwmean.decay = 'proportional'
                                                  )

# Step 6. Model Projections
BIOMOD_models_proj_current <- BIOMOD_Projection(bm.mod = myBiomodModelOut,
                                                proj.name = 'Current_Test',
                                                new.env = rasters,
                                                models.chosen = 'all',
                                                metric.binary = NULL,
                                                metric.filter = NULL,
                                                build.clamping.mask = FALSE,
                                                do.stack = FALSE,
                                                output.format = '.tif'
                                                )

# Step 7. Ensemble Projection
myBiomodEMProj <- BIOMOD_EnsembleForecasting(bm.em = BIOMOD_ensemble_models,
                                             bm.proj = BIOMOD_models_proj_current,
                                             models.chosen = 'all',
                                             metric.binary = NULL,
                                             metric.filter = NULL,
                                             build.clamping.mask = FALSE,
                                             do.stack = FALSE,
                                             output.format = '.tif')

Thanks & regards, Rahul

rpatin commented 1 year ago

Dear Rahul, Thank you for the update and additional information :pray: I would suggest some tweaks to the beginning of your code:

# import all the environmental rasters
# terra uses rast instead of raster
LULC <- rast("LULC.tif")
Aspect <- rast("Aspect.tif")
DiffNDVI <- rast("diffNDVI.tif")
Dist_Road <- rast("Dist_Road.tif")
Dist_Settle <- rast("Dist_Settle.tif")
Elevation <- rast("elevation.tif")
Avg_Precip <- rast("Avg_Precip.tif")
TWI<- rast("TWI.tif")
Avg_WS<- rast("Avg_WS.tif")
Slope<- rast("Slope.tif")
Dist_Agri <- rast("Dist_Agri.tif")
Avg_RH <- rast("Avg_RH.tif")

# creating stack 
# terra uses 'c' instead of 'stack'
# Note that some of the element were missing (Avg_Temp for instance)
rasters <- c(LULC, DiffNDVI, Dist_Agri, Dist_Settle, Dist_Road,
                 Avg_Precip, Avg_Temp, TWI,
                 Avg_VPD, Avg_WS, Aspect, Elevation, Slope, Avg_RH)

# it will be easier if you can name the layer.
# Note that layer name cannot have underscore character '_'
names(rasters) <- c("LULC", "DiffNDVI", "DistAgri", "DistSettle", "DistRoad",
                    "AvgPrecip", "AvgTemp", "TWI",
                    "AvgVPD", "AvgWS", "Aspect", "Elevation", "Slope", "AvgRH")

# Manage categorical raster 
# Ensure that categories are properly managed. It is not automatic as categories were stored as numbers.
catLULC <- categories(rasters$LULC,
                      value = data.frame(ID = seq(1,8),
                                         value = paste0("landuse", seq(1,8))))
rasters$LULC <- catLULC

If you run that, you should have an easier time running the model and interpreting results. Additionally, during the BIOMOD_FormatingData you should see such kind of message:

> fact.level for LULC :  1:landuse1 2:landuse2  3:landuse3  4:landuse4  5:landuse5  6:landuse6  7:landuse7  8:landuse8
     - according to mask.out levels 6 1 8 5 have already been sampled
     - levels 2 3 4 7 will be sampled in the original raster

This indicates that the sampling of pseudo-absences ensure that all factors levels are sampled. You may still encounter similar troubles if you are unlucky and some factors levels disappear from the calibration dataset through the crossvalidation, but I never encountered the problem on my end. You should also check that unique(myBiomodData@data.env.var$LULC) return values from the 8 levels.

Let me know if that do not solve your issue. Best, Rémi

rahulkgour commented 1 year ago

Dear Rémi,

Thanks for your valuable suggestions.

I will try it and let you know about the run, hope this time it works well. Fingers crossed.

Thanks again for all your prompt support, Rahul

rahulkgour commented 1 year ago

Dear Rémi,

I am writing to express my sincere appreciation for the help you provided us during the issues which we faced. Your analysis and support was truly valuable and helped us achieve our goals in a timely and efficient manner.

We are able to get the predicted rasters as we were expecting.

Once again, thank you for your assistance, your efforts are truly appreciated.

Thanks & regards, Rahul