Closed btupper closed 3 weeks ago
If I remember correctly, this is a hack to deal with the fact that, when running a workflow, datasets that are generated from a spatial_split
end up having a geometry
column even though they are not of class sf
. You could try to implement your change, but I think it would lead to failures if you run a Check (I am not sure whether there is a unit test that deals with that, or whether it would be caught by the full vignette when it maps the workflow).
However, even if I am correct, the current solution will fail on sf
objects that don't have a geometry column (as you point out). For the recipe
methods, the solution would be to implement a hybrid check (check for sf AND for a geometry column). What I am not sure is whether the workflow will then create datasets with a geometry
column, or whether the custom name would be adopted. I guess we would have to try that out.
Oh, I see. Yep - I knew it would melt my head.
May be a simple solution might be a simple heads-up to the user to name the geometry list column "geometry" ala obj = sf::st_set_geometry(obj, "geometry")
? I did bump into issues while trying to use a geometry list column named geom
, and it didn't complain until I tried workflow_set()
.
@btupper Can you have a look at the branch rename_geom
? It should now cope with sf
objects that have alternative names for the geometry
column.
Woooohooo! It works without issue after assigning a random name to the geometry list column. The example below borrows from your tutorial but substitutes in some local data. Thank you!
library(tidysdm)
#> Loading required package: tidymodels
#> Loading required package: spatialsample
library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(here)
#> here() starts at /Users/ben/Library/CloudStorage/Dropbox/code/projects/mola-mola
model_input = sf::read_sf("model_input.gpkg") |>
sf::st_set_geometry("happy_halloween")
model_input
#> Simple feature collection with 1089 features and 5 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -76 ymin: 35 xmax: -59.5 ymax: 46
#> Geodetic CRS: WGS 84
#> # A tibble: 1,089 × 6
#> class sst windspeed u_wind v_wind happy_halloween
#> <chr> <dbl> <dbl> <dbl> <dbl> <POINT [°]>
#> 1 presence 12.5 4.91 -0.270 -0.543 (-66.60798 42.2943)
#> 2 presence 20.2 4.86 1.59 0.756 (-68.00023 42.60569)
#> 3 presence 14.9 8.41 2.05 -0.739 (-72.66639 40.46282)
#> 4 presence 18.1 4.31 1.28 2.79 (-69.75882 42.53246)
#> 5 presence 16.7 8.38 0.864 -1.11 (-75.3774 36.2458)
#> 6 presence 27.8 7.50 -3.31 -1.33 (-75.08756 35.54985)
#> 7 presence 12.6 4.61 -0.155 -0.274 (-67.15288 42.30458)
#> 8 presence 16.6 4.08 0.846 0.505 (-71.46238 41.07934)
#> 9 presence 17.4 4.48 0.906 0.429 (-65.18623 42.52456)
#> 10 presence 17.5 4.15 -0.0541 0.253 (-69.61 41.55)
#> # ℹ 1,079 more rows
rec <- recipe(model_input, formula = class ~ .)
models <-
# create the workflow_set
workflow_set(
preproc = list(default = rec),
models = list(
# the standard glm specs
glm = tidysdm::sdm_spec_glm(),
# rf specs with tuning
rf = tidysdm::sdm_spec_rf(),
# boosted tree model (gbm) specs with tuning
gbm = tidysdm::sdm_spec_boost_tree(),
# maxent specs with tuning
maxent = tidysdm::sdm_spec_maxent()
),
# make all combinations of preproc and models,
cross = TRUE ) |>
# tweak controls to store information needed later to create the ensemble
option_add(control = tidysdm::control_ensemble_grid())
input_cv <- spatial_block_cv(data = model_input, v = 3, n = 5)
set.seed(1234567)
models <- models |>
workflow_map("tune_grid",
resamples = input_cv,
grid = 3,
metrics = tidysdm::sdm_metric_set(),
verbose = TRUE )
#> i No tuning parameters. `fit_resamples()` will be attempted
#> i 1 of 4 resampling: default_glm
#> ✔ 1 of 4 resampling: default_glm (264ms)
#> i 2 of 4 tuning: default_rf
#> i Creating pre-processing data to finalize unknown parameter: mtry
#> ✔ 2 of 4 tuning: default_rf (2.3s)
#> i 3 of 4 tuning: default_gbm
#> i Creating pre-processing data to finalize unknown parameter: mtry
#> ✔ 3 of 4 tuning: default_gbm (6s)
#> i 4 of 4 tuning: default_maxent
#> ✔ 4 of 4 tuning: default_maxent (1.9s)
Created on 2024-10-31 with reprex v2.1.1
@btupper thanks for double-checking. Now in dev
following #63
I'm trying to understand the context of the prep.spatial_recipe function. In particular, the
training
argument is being tested for a "geometry" column. This may lead to problems if the geometry list column is not named "geometry" - there is no requirement that the the geometry list column have any name in particular. In fact, when reading from the ubiquitous geopackage (".gpkg") formatted files it seems that the column is named "geom".The same test occurs in
bake.spatial_recipe
here but with anewdata
argument.Is the point to test that the
training
(ornewdata
) object is ansf
object? If so, wouldinherits(training, 'sf')
suffice? Maybe like this...On the other hand, it may be that
training
data is something I misunderstand. Perhaps it has been modified upstream to explicitly contain a "geometry" column? That would make my head melt, but would explain my confusion, too.