geocompx / geocompr

Geocomputation with R: an open source book
https://r.geocompx.org/

Error when replicating chapter 12 #1110

Closed martinmacias closed 2 months ago

martinmacias commented 2 months ago

Hello.

When replicating, step by step, Section 12.5.2 (Spatial tuning of machine-learning hyperparameters) of Chapter 12, I get the following warning and error.

Warning: The fallback learner 'classif.featureless' and the base learner 'classif.ksvm.tuned' have different predict types: 'response' != 'prob'.
ERROR [15:20:14.723] [mlr3] train: Assertion on 'method' failed: Must be a subset of {'none','try','evaluate','callr'}, not 'NULL'.   

I provide the code I am using below. I have already tried several things, such as changing the encapsulate parameter in the learner and using another package that provides an SVM, and nothing has worked. I am fairly new to the mlr3 ecosystem, so I may be missing a step that is needed before replicating the chapter itself.

R Session Info:

─ Session info ───────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.1 (2024-06-14)
 os       macOS Sonoma 14.4
 system   aarch64, darwin20
 ui       RStudio
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Detroit
 date     2024-09-17
 rstudio  2024.04.2+764 Chocolate Cosmos (desktop)
 pandoc   NA

# 4. SVM
# Load data
# Three datasets: lsl (data frame of landslide occurrence and non-occurrence), study_mask (study area polygon) and ta (terrain attribute raster)
data("lsl", "study_mask", package = "spDataLarge")
ta <- terra::rast(system.file("raster/ta.tif", package = "spDataLarge"))

## Task from GLM
task = mlr3spatiotempcv::as_task_classif_st(x = mlr3::as_data_backend(data = lsl), # Dataset that includes the response and predictor variables
                                            target = "lslpts",  # Name of the response variable  
                                            id = "ecuador_lsl", # This id is a new variable to identify the task, not the id of the dataset
                                            positive = "TRUE", # Value of the response variable that indicates the occurrence of an event
                                            coordinate_names = c("x", "y"), # Coordinates in the dataset
                                            crs = "EPSG:32717", # Coordinate reference system
                                            coords_as_features = FALSE # Indicate whether the coordinates are going to be used as predictors
                                            ) 

## 4.1 Learner
# From the kernlab package
lrn_ksvm = mlr3::lrn("classif.ksvm", 
                     predict_type = "prob", 
                     kernel = "rbfdot",
                     type = "C-svc")

# Fallback learner
lrn_ksvm$fallback = mlr3::lrn("classif.featureless", predict_type = "prob")

## 4.2 Resampling strategy
# Performance estimation level
#resampling = mlr3::rsmp(.key = "repeated_spcv_coords", folds = 5, repeats = 100)
perf_level = mlr3::rsmp("repeated_spcv_coords", folds = 5, repeats = 100)

## 4.3 Hyperparameter tuning
## Five spatially disjoint partitions. 
tune_level = mlr3::rsmp("spcv_coords", folds = 5)

# Search space: define the outer limits of the randomly selected hyperparameters
search_space = paradox::ps(
  C = paradox::p_dbl(lower = -12, upper = 15, trafo = function(x) 2^x),
  sigma = paradox::p_dbl(lower = -15, upper = 6, trafo = function(x) 2^x)
)

# Use 50 randomly selected hyperparameters
terminator = mlr3tuning::trm("evals", 
                             n_evals = 50)

tuner = mlr3tuning::tnr("random_search")

## 4.4 Modify learner with tuning hyperparameters
# Wrap the learner lrn_ksvm with auto_tuner() so that it carries all the settings that define the hyperparameter tuning.

at_ksvm = mlr3tuning::auto_tuner(
  learner = lrn_ksvm,
  resampling = tune_level,
  measure = mlr3::msr("classif.auc"),
  search_space = search_space,
  terminator = terminator,
  tuner = tuner
)

## 4.5 Parallelization
library(future)

## 4.6 Resampling spatially nested CV
# execute the outer loop sequentially and parallelize the inner loop
future::plan(list("sequential", "multisession"), 
             workers = floor(availableCores() / 2))

progressr::with_progress(expr = {
  rr_spcv_svm = mlr3::resample(task = task,
                               learner = at_ksvm, 
                               # outer resampling (performance level)
                               resampling = perf_level,
                               store_models = FALSE,
                               encapsulate = "evaluate")
})

# stop parallelization
future:::ClusterRegistry("stop")

Robinlovelace commented 2 months ago

Any ideas @jannes-m ?

jannes-m commented 2 months ago

Hey @martinmacias, I cannot reproduce the error. Is it possible that you need to update the mlr3 packages? In any case, a reproducible example created with reprex::reprex(session_info = TRUE), which appends the session info, would be helpful.
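
For anyone following along, here is a minimal sketch of how such a reprex could be produced. The three-line snippet is a hypothetical placeholder, not the failing code from this issue, and the file name is only for illustration. It assumes the reprex package is installed and skips rendering otherwise.

```r
# Hypothetical placeholder snippet; replace these lines with the code that fails.
snippet <- c(
  "library(mlr3)",
  "lrn_fallback = lrn(\"classif.featureless\", predict_type = \"prob\")",
  "print(lrn_fallback)"
)

# Write the snippet to a temporary .R file so reprex can read it as input.
input_file <- tempfile(fileext = ".R")
writeLines(snippet, input_file)

# Render only in an interactive session where reprex is available.
# session_info = TRUE appends the session info, including package versions.
if (interactive() && requireNamespace("reprex", quietly = TRUE)) {
  reprex::reprex(input = input_file, session_info = TRUE, venue = "gh")
}
```

With venue = "gh" the rendered output is GitHub-flavored Markdown that can be pasted straight into a comment here.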

jannes-m commented 2 months ago

It seems that for now it is safer to use mlr3 0.20.2 (install.packages("mlr3")) and mlr3extralearners 0.9.0 (remotes::install_github("mlr-org/mlr3extralearners@v0.9.0")). There might still be some problems when using the most recent GitHub versions.
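
A quick way to see how a local installation compares with these versions is sketched below, using base R only. The helper names installed_version() and version_status() are made up for this example, and treating 0.20.2 and 0.9.0 as the reference versions is an assumption taken from this comment, not an official compatibility table.

```r
# Versions suggested in this thread (assumed reference points).
suggested <- c(mlr3 = "0.20.2", mlr3extralearners = "0.9.0")

# Hypothetical helper: installed version as a string, or NA if the package is missing.
installed_version <- function(pkg) {
  tryCatch(
    as.character(utils::packageVersion(pkg)),
    error = function(e) NA_character_
  )
}

# Hypothetical helper: describe how an installed version relates to the suggested one.
version_status <- function(installed, suggested) {
  if (is.na(installed)) {
    return("not installed")
  }
  # compareVersion() returns 1 if the first version is newer, 0 if equal, -1 if older.
  cmp <- utils::compareVersion(installed, suggested)
  if (cmp > 0) {
    "newer than suggested"
  } else if (cmp == 0) {
    "matches suggested"
  } else {
    "older than suggested"
  }
}

installed <- vapply(names(suggested), installed_version, character(1))
status <- mapply(version_status, installed, suggested)

print(data.frame(
  package = names(suggested),
  installed = unname(installed),
  suggested = unname(suggested),
  status = unname(status)
))
```

If a package shows up as "newer than suggested", installing the versions named above is the first thing to try.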