fstermann / mlr-mini

Inducers #6

Open fstermann opened 1 year ago

Description

An "Inducer" is an algorithm that learns a model (or hypothesis) from a training dataset. Inducers are the functions that do the actual model fitting. They have configuration parameters ("hyperparameters") that influence how they behave. An example of a hyperparameter is the nrounds argument of xgboost, which sets the number of boosting rounds.

mlr.mini should provide a collection of inducers that follow the naming scheme InducerXxx. In addition, it is convenient to have a central collection of all inducers available for mlr.mini, so there should also be an environment ind in which every inducer is registered. This way, other packages can extend mlr.mini by adding their own inducers to ind.
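
A minimal sketch of such a registry, assuming a plain environment is enough. InducerDummy and the key "dummy" are hypothetical names used only for illustration; they are not part of mlr.mini.

```r
# Registry environment; emptyenv() as parent so lookups never fall
# through to unrelated objects.
ind <- new.env(parent = emptyenv())

# Placeholder inducer (hypothetical): a real one would fit a model.
InducerDummy <- structure(
  function(.data) stop("not implemented in this sketch"),
  class = c("InducerDummy", "Inducer")
)

# Register the inducer under a short key.
assign("dummy", InducerDummy, envir = ind)

# The exported object and the registry entry are the same function.
identical(InducerDummy, ind$dummy)
#> [1] TRUE

# Another package could extend the registry the same way,
# e.g. with ind$other <- InducerOther.
ls(ind)
#> [1] "dummy"
```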

Inducers are functions with an S3 class. They have a nice printer and implement the hyperparameters, configuration and configuration<- generics. Calling an Inducer with a named argument should return an Inducer in which that configuration parameter is changed. You could use ... here, but it would be nicer if the function had named arguments so that tab-completion works; remember the metaprogramming homework on how to construct functions like this. Calling an Inducer with an unnamed argument, or with an argument named .data, should create a model. The leading . prevents a collision with the name of a hyperparameter.
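
One possible way to construct such a function in base R is sketched below. The helper make_inducer, the toy fitting function toy_fit and the inducer InducerToy are hypothetical and only illustrate the idea: the formals are generated from the hyperparameter names, and match.call() is used to find out which arguments the caller actually supplied.

```r
# Hypothetical constructor: builds an inducer whose formals are
# .data plus one named argument per hyperparameter.
make_inducer <- function(name, hyperparameters, fit, configuration = list()) {
  arg_names <- c(".data", hyperparameters)
  # One formal without default per name, so tab-completion can see them.
  args <- stats::setNames(rep(alist(x = ), length(arg_names)), arg_names)

  inducer <- function() {
    # Names of the arguments that were supplied in this call.
    supplied <- as.character(names(as.list(match.call())[-1]))
    values <- mget(supplied, envir = environment())
    if (".data" %in% supplied) {
      # Data given: fit a model with the stored configuration,
      # updated by any hyperparameters passed in the same call.
      config <- utils::modifyList(configuration,
                                  values[setdiff(supplied, ".data")])
      return(fit(values$.data, config))
    }
    # Only hyperparameters given: return a reconfigured inducer.
    make_inducer(name, hyperparameters, fit,
                 utils::modifyList(configuration, values))
  }
  formals(inducer) <- args
  structure(inducer, class = c(paste0("Inducer", name), "Inducer"))
}

# Toy "fit" that only records what it was given (assumption for the demo).
toy_fit <- function(data, config) {
  structure(list(config = config, n = nrow(data)), class = "Model")
}

InducerToy <- make_inducer("Toy", c("nrounds", "verbose"), toy_fit,
                           configuration = list(verbose = 0))
names(formals(InducerToy))
#> [1] ".data"   "nrounds" "verbose"

tuned <- InducerToy(nrounds = 20)  # named argument: new inducer
class(tuned)
#> [1] "InducerToy" "Inducer"

model <- tuned(cars)               # unnamed argument: fitted model
model$config$nrounds
#> [1] 20
```

In this sketch the configuration lives in the closure environment of the generated function; storing it as an attribute would work just as well.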


InducerXgboost
#> Inducer: XGBoost
#> Configuration: verbose = 0
identical(InducerXgboost, ind$xgboost)
#> [1] TRUE
class(ind$xgboost)
#> [1] "InducerXGBoost" "Inducer"

hyperparameters(ind$xgboost)
#> Hyperparameter Space:
#>                 name type    range
#> 1:               eta  dbl   [0, 1]
#> 2:           nrounds  dbl [1, Inf]
#> 3:         max_depth  dbl [0, Inf]
#> 4:  colsample_bytree  dbl   [0, 1]
#> 5: colsample_bylevel  dbl   [0, 1]
#> 6:            lambda  dbl [0, Inf]
#> 7:             alpha  dbl [0, Inf]
#> 8:         subsample  dbl   [0, 1]
#> 9:           verbose  int   [0, 2]
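
A hyperparameter space like the one printed above could, for instance, be held in a plain data frame with one row per hyperparameter. The object space and the helper check_value below are hypothetical and only cover three of the rows; they sketch how a value might be checked against its type and range.

```r
# Hypothetical representation of (part of) the hyperparameter space.
space <- data.frame(
  name  = c("eta", "nrounds", "verbose"),
  type  = c("dbl", "dbl", "int"),
  lower = c(0, 1, 0),
  upper = c(1, Inf, 2)
)

# TRUE if `value` is admissible for hyperparameter `name`.
check_value <- function(space, name, value) {
  row <- space[space$name == name, ]
  if (nrow(row) != 1) stop("unknown hyperparameter: ", name)
  # Integer-typed parameters must be whole numbers.
  if (row$type == "int" && value != round(value)) return(FALSE)
  value >= row$lower && value <= row$upper
}

check_value(space, "eta", 0.3)
#> [1] TRUE
check_value(space, "verbose", 3)
#> [1] FALSE
```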

xgb <- ind$xgboost(nrounds = 20)
xgb
#> Inducer: XGBoost
#> Configuration: nrounds = 20, verbose = 0

configuration(xgb)
#> $nrounds
#> [1] 20
#> 
#> $verbose
#> [1] 0

configuration(xgb)$nrounds <- 10
xgb
#> Inducer: XGBoost
#> Configuration: nrounds = 10, verbose = 0

model.xgb <- xgb(cars.data)

model.xgb
#> Regression Model: "XGBoost" fitted on "cars" dataset.

class(model.xgb)
#> [1] "ModelXGBoost" "ModelRegression" "Model"
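
The configuration and configuration<- calls used above could be backed by ordinary S3 generics. The sketch below assumes that the configuration and a display label are stored as attributes on the inducer; the object toy and the attribute names are hypothetical, not the required design.

```r
# Getter and replacement generics.
configuration <- function(x, ...) UseMethod("configuration")
`configuration<-` <- function(x, value) UseMethod("configuration<-")

configuration.Inducer <- function(x, ...) attr(x, "configuration")

`configuration<-.Inducer` <- function(x, value) {
  stopifnot(is.list(value))
  attr(x, "configuration") <- value
  x
}

# Printer in the style of the output shown in this issue.
print.Inducer <- function(x, ...) {
  cfg <- configuration(x)
  cat("Inducer: ", attr(x, "label"), "\n", sep = "")
  cat("Configuration: ",
      paste(names(cfg), unlist(cfg), sep = " = ", collapse = ", "),
      "\n", sep = "")
  invisible(x)
}

# Hypothetical inducer that does nothing when called.
toy <- structure(
  function(.data) NULL,
  label = "Toy",
  configuration = list(nrounds = 20, verbose = 0),
  class = c("InducerToy", "Inducer")
)

# R rewrites this as toy <- `configuration<-`(toy, value = ...),
# so the replacement method receives the whole updated list.
configuration(toy)$nrounds <- 10
print(toy)
#> Inducer: Toy
#> Configuration: nrounds = 10, verbose = 0
```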