fstermann / mlr-mini

MIT License

Datasets #3

Open fstermann opened 1 year ago

fstermann commented 1 year ago

Description

Different learning algorithms use datasets in different formats (e.g. matrix or data.frame) and require different ways of specifying the response / target / outcome data (e.g. through a formula, or by having a distinct label or y argument).
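
To illustrate the point, here is a small sketch using only base R (the stats package). It fits the same linear model twice: once through a formula, and once through separate x and y arguments. The variable names fit.formula, fit.matrix and x are made up for this example.

```r
# Formula interface: the target is named on the left-hand side of the formula.
fit.formula <- lm(dist ~ speed, data = cars)

# Matrix interface: features and target are passed as separate x and y arguments.
# lm.fit() does not add an intercept, so the column of ones is added by hand.
x <- cbind(intercept = 1, speed = cars$speed)
fit.matrix <- lm.fit(x = x, y = cars$dist)

# Both calls fit the same model, so the coefficients should agree.
all.equal(unname(coef(fit.formula)), unname(fit.matrix$coefficients))
```

A Dataset object that knows both the data and the target could produce either form on demand.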

The data and the information about which part of it should be predicted are often closely linked. We will therefore collect both in one object, which we call Dataset.

cars.data <- Dataset(data = cars, target = "dist")
print(cars.data)
#> Dataset "cars", predicting "dist" (Regression)
#>      dist speed
#>     <num> <num>
#>  1:     2     4
#>  2:    10     4
#> ---            
#> 49:   120    24
#> 50:    85    25
class(cars.data)
#> [1] "DatasetRegression" "Dataset"

cars.data[c(1, 2, 3, 4), ]
#> Dataset "cars", predicting "dist" (Regression)
#>     dist speed
#>    <num> <num>
#> 1:     2     4
#> 2:    10     4
#> 3:     4     7
#> 4:    22     7

cars.data[c(1, 2), "dist"]
#> Dataset "cars", predicting "dist" (Regression)
#>     dist
#>    <num>
#> 1:     2
#> 2:    10

cars.data[, "speed"]
#> Error: Cannot remove target column "dist"

metainfo(cars.data)
#> $features
#> speed 
#> "num" 
#> 
#> $targets
#>  dist 
#> "num" 
#> 
#> $nrow
#> [1] 50
#> 
#> $type
#> [1] "regression"
#> 
#> $missings
#> [1] FALSE
#> 
#> attr(,"class")
#> [1] "DatasetInfo"
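
One possible way to get the subsetting and metainfo() behaviour shown above is sketched below with base R S3 methods. This is only an illustration, not the reference solution: the helper new_dataset() and the internal list layout (data, target, type, name) are hypothetical, a plain data.frame is used instead of a data.table, and the column types are reported as full class names ("numeric") rather than the short "num" labels printed above.

```r
# Hypothetical low-level helper: wraps a data.frame together with its target.
new_dataset <- function(data, target, name, type = "regression") {
  structure(list(data = data, target = target, type = type, name = name),
            class = c("DatasetRegression", "Dataset"))
}

# Subsetting that keeps the Dataset class and refuses to drop the target.
`[.Dataset` <- function(x, i, j) {
  data <- x$data
  if (!missing(j)) {
    # Translate numeric or logical column indices into column names.
    if (is.numeric(j) || is.logical(j)) j <- colnames(data)[j]
    if (!x$target %in% j) {
      stop(sprintf('Cannot remove target column "%s"', x$target), call. = FALSE)
    }
    data <- data[, j, drop = FALSE]
  }
  if (!missing(i)) data <- data[i, , drop = FALSE]
  new_dataset(data, x$target, x$name, x$type)
}

# Summary information about features, targets, size and missing values.
metainfo <- function(x) {
  col.type <- function(cols) {
    vapply(x$data[cols], function(col) class(col)[[1]], character(1))
  }
  structure(list(
    features = col.type(setdiff(colnames(x$data), x$target)),
    targets = col.type(x$target),
    nrow = nrow(x$data),
    type = x$type,
    missings = any(is.na(x$data))
  ), class = "DatasetInfo")
}

cars.data <- new_dataset(cars, target = "dist", name = "cars")
small <- cars.data[c(1, 2), "dist"]   # rows 1 and 2, target column only
info <- metainfo(cars.data)
```

Checking the target inside the `[` method means the error is raised at the moment a column selection would lose the target, before any data is copied.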

Dataset() should have the arguments data and target, an optional argument type (one of "regression" or "classification"), and an optional argument name, defaulting to as.name(deparse(substitute(data), 20)[[1]]).
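
A minimal sketch of a constructor with this signature could look as follows. Several details are assumptions made for the example and are not part of the task description: the type is guessed from the target column when it is not given, the classification subclass is called "DatasetClassification", the object is a plain list around a data.frame, and the name is stored as a character string rather than as the symbol that as.name() would return.

```r
Dataset <- function(data, target, type = NULL,
                    name = deparse(substitute(data), 20)[[1]]) {
  # Evaluate the default name first, while `data` still refers to the
  # expression the caller passed in.
  force(name)
  stopifnot(is.data.frame(data), is.character(target), length(target) == 1,
            target %in% colnames(data))
  if (is.null(type)) {
    # Assumption: numeric targets mean regression, anything else classification.
    type <- if (is.numeric(data[[target]])) "regression" else "classification"
  }
  type <- match.arg(type, c("regression", "classification"))
  # Put the target column first, as in the printed output above.
  data <- data[, c(target, setdiff(colnames(data), target)), drop = FALSE]
  subclass <- if (type == "regression") {
    "DatasetRegression"
  } else {
    "DatasetClassification"  # hypothetical class name
  }
  structure(list(data = data, target = target, type = type, name = name),
            class = c(subclass, "Dataset"))
}

cars.data <- Dataset(data = cars, target = "dist")
class(cars.data)
```

Forcing name at the top keeps the default robust even though data is reassigned later in the function body.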

m-muecke commented 1 year ago

Following arguments: