Closed rylielrobinson closed 1 month ago
Hi @rylielrobinson, could you please provide us with a reprex (minimalistic dataset and code) that shows the issues, and we can then look into it. If you have dummy variables for the categorical variable, I am not clear why predict_raster()
would not work.
HI @dramanica, thanks for the response. I found that I was able to get terra::predict()
to work but still no luck with predict_raster()
. I am glad this approach using terra::predict()
works - I initially was not sure a _simpleensemble obj was compatible.
Here is a reproducible code example of what I have working:
library(tidymodels)
library(tidysdm)
library(raster)
library(terra)
v <- vect(system.file("ex/lux.shp", package="terra"))
r <- rast(system.file("ex/elev.tif", package="terra"))
set.seed(50)
pnts <- spatSample(v, size = 50)
pnts_r <- terra::extract(r, pnts, method = "simple")
pnts_r$pa <- sample(c("presence","absence"), nrow(pnts_r), replace = TRUE)
pnts_r$pa <- factor(pnts_r$pa, levels = c("presence", "absence"))
pnts_r <- pnts_r %>% dplyr::select(-ID)
seasons <- c('winter', 'spring', 'summer', 'fall')
pnts_r$season <- seasons[seq_len(nrow(pnts_r)) %% length(seasons) + 1]
#tidymodels recipe
pa_recipe <-
recipe(pa ~ .,
data = pnts_r) %>%
step_dummy(season) %>%
step_normalize(all_numeric())
pa_recipe
pnts_r %>% check_sdm_presence(pa) # MUST BE TRUE
pa_models <-
workflow_set(
preproc = list(default = pa_recipe),
models = list(
glm = sdm_spec_glm(), #no tuning
rf = sdm_spec_rf(tune = 'all'), #tuning
gbm = sdm_spec_boost_tree(tune = 'all'), #tuning
maxent = sdm_spec_maxent(tune = 'all') #tuning
),
cross = TRUE # make all combinations of preproc and models,
) %>%
option_add(control = control_ensemble_grid())
set.seed(123)
pa_split <- initial_split(pnts_r,
prop = 0.8)
pa_split
pa_cv <- vfold_cv(training(pa_split), v = 5, repeats = 2)
pa_cv # 5-fold cross-validation repeated 2 times using stratification
set.seed(1234567)
pa_models <-
pa_models %>%
workflow_map("tune_grid",
resamples = pa_cv, grid = 9,
metrics = sdm_metric_set(), verbose = TRUE)
autoplot(pa_models)
pa_models %>% collect_metrics()
pa_ensemble <- simple_ensemble() %>%
add_member(pa_models, metric = "roc_auc")
autoplot(pa_ensemble)
test <- terra::predict(r, pa_ensemble, na.rm = T, const=data.frame(season="spring"))
test #this works!
plot(test)
However, for tidysdm
while I still try:
test <- predict_raster(pa_ensemble, r)
There is the following error:
Error in `validate_column_names()`: ! The following required columns are missing: 'season'.
I was able to find a work around by replacing seasons as a numeric value and supplementing it into a SpatRaster stack for predict_raster()
though I am unsure if that is an appropriate work around. I am creating a 'pseudo' empty raster to match the season (ie. winter = 1).
library(raster)
library(terra)
v <- vect(system.file("ex/lux.shp", package="terra"))
r <- rast(system.file("ex/elev.tif", package="terra"))
set.seed(50)
pnts <- spatSample(v, size = 50)
pnts_r <- terra::extract(r, pnts, method = "simple")
#pnts_r$temp <- sample(0:25, 50, replace = TRUE)
pnts_r$pa <- sample(c("presence","absence"), nrow(pnts_r), replace = TRUE)
pnts_r$pa <- factor(pnts_r$pa, levels = c("presence", "absence"))
pnts_r <- pnts_r %>% dplyr::select(-ID)
seasons <- 1:4
pnts_r$season <- seasons[seq_len(nrow(pnts_r)) %% length(seasons) + 1]
pnts_r$season <- factor(pnts_r$season, levels = 1:4)
#tidymodels recipe
pa_recipe <-
recipe(pa ~ .,
data = pnts_r) %>%
step_dummy(season) %>%
step_normalize(all_numeric())
pa_recipe
pnts_r %>% check_sdm_presence(pa)
pa_models <-
workflow_set(
preproc = list(default = pa_recipe),
models = list(
glm = sdm_spec_glm(), #no tuning
#gam = sdm_spec_gam(), #no tuning
rf = sdm_spec_rf(tune = 'all'), #tuning
gbm = sdm_spec_boost_tree(tune = 'all'), #tuning
maxent = sdm_spec_maxent(tune = 'all') #tuning
),
cross = TRUE # make all combinations of preproc and models,
) %>%
option_add(control = control_ensemble_grid())
library(tidysdm)
set.seed(123)
pa_split <- initial_split(pnts_r,
prop = 0.8)
pa_split
pa_cv <- vfold_cv(training(pa_split), v = 5, repeats = 2)
pa_cv # 5-fold cross-validation repeated 2 times using stratification
set.seed(1234567)
pa_models <-
pa_models %>%
workflow_map("tune_grid",
resamples = pa_cv, grid = 10,
metrics = sdm_metric_set(), verbose = TRUE)
beepr::beep(2)
pa_models
autoplot(pa_models)
pa_ensemble <- simple_ensemble() %>%
add_member(pa_models, metric = "boyce_cont")
autoplot(pa_ensemble)
#tidysdm::predict_raster() approach - use 'empty' numeric raster for season num
s <- r
values(s) <- 1
s <- terra::mask(s,r)
stack <- c(r,s)
plot(stack)
names(stack) <- c("elevation","season")
test1 <- predict_raster(pa_ensemble, stack)
plot(test1)
To predict a factor that you transformed into a dummy variable in your recipe, you simply need to have a raster with the same categorical variable. So, if we use the ensemble with a 3 level topography variable as shown in the example at the end of https://evolecolgroup.github.io/tidysdm/articles/a2_tidymodels_additions.html#using-multi-level-factors-as-predictors, I can simply write:
# load the raster
climate_present <- terra::readRDS(system.file("extdata/lacerta_climate_present_10m.rds",
package = "tidysdm"))
# first we add the categorical topography variable to the climate data (using the same rules used in the original data.frame)
# most of the time, you wouldn't need this, as the variable would already be in your input file
climate_present$topography <- climate_present$altitude
climate_present$topography <- classify(climate_present$topography, rcl = c(-Inf, 200, 800, Inf), include.lowest=TRUE, brackets=TRUE)
levels(climate_present$topography) <- data.frame(ID = c(0,1,2), topography = c("plains", "hills", "mountains"))
# now we can predict
predict_factor <- predict_raster(lacerta_ensemble, climate_present)
plot(predict_factor)
You need to make sure that your variable is categorical (a factor) both in the data.frame passed to the recipe with step_dummy()
, and in the raster, and that the name of the variable (and levels) match. Generally, you would have that categorical variable in a raster and then sample from it, so everything should be in sync.
Let me know if you encounter any further problem.
@rylielrobinson Is this issue resolved? If so, we can close it?
@dramanica this approach has worked for the reprex data - thanks
Hello - I am interesting in using the tidysdm package as the basis for my ensemble ML modelling for a pres/abs study. I have had success with the pipeline up until predicting my simple_ensemble across a raster stack.
I consider seasons as a categorical (now dummy) variable in my workflow. All other predictors are numeric, which I have rasters for.
Before going the ensemble route, I created a ranger (random forest) object and predict it across a raster stack using
terra::predict
. When I wanted to predict my ranger model across the spatial extent I was successful using the following code:However, the const argument does not seem available for the
tidysdm::predict_raster
function for a simple_ensemble object. I have tried (unsuccessfully) to convert seasons to numeric values in hopes to supplement the season as a empty raster in the stack to no avail.Trying to predict the simple_ensemble using
terra::predict
no longer works eitherAs a
terra::predict
argument const is a 'data.frame. Can be used to add a constant value as a predictor variable so that you do not need to make a SpatRaster layer for it'.I am interested to know if something like this can be applied for
tidysdm
simple_ensemble objects for spatial/raster predictions?