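# Assumes `data` (a data frame of predictors) and `outcome` (a numeric vector of
# median home values, in $1000s) already exist.
set.seed(1)  # Make the random split reproducible.
# Reduce to a dataset of 150 observations to speed up model fitting.
train_obs = sample(nrow(data), 150)
# x_train is our training sample.
x_train = data[train_obs, ]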
# Create a holdout set for evaluating model performance.
# Note: cross-validation is even better than a single holdout sample.
x_holdout = data[-train_obs, ]
# Create a binary outcome variable: towns in which median home value is > 22,000.
outcome_bin = as.numeric(outcome > 22)
y_train = outcome_bin[train_obs]
y_holdout = outcome_bin[-train_obs]
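Before fitting anything, it can help to check what the split actually looks like. The sketch below is only an illustration: it assumes the `Boston` housing data from the MASS package stands in for `data` and `outcome` (the post itself does not say where they come from), and it repeats the split so it runs on its own.

```r
# Assumption: MASS::Boston stands in for the post's `data` and `outcome`.
data(Boston, package = "MASS")
outcome = Boston$medv                     # median home value, in $1000s
data = subset(Boston, select = -medv)     # predictors only

set.seed(1)
train_obs = sample(nrow(data), 150)
x_train = data[train_obs, ]
x_holdout = data[-train_obs, ]
outcome_bin = as.numeric(outcome > 22)
y_train = outcome_bin[train_obs]
y_holdout = outcome_bin[-train_obs]

# Row counts of the two splits should add up to the full dataset.
dim(x_train)
dim(x_holdout)

# Class balance in each split, worth a look before fitting a classifier.
table(y_train)
table(y_holdout)
```

If one class is nearly absent from either table, drawing a different or larger training sample is probably a good idea.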
Then all you have to do to run a simple model is:
library(SuperLearner)
sl = SuperLearner(Y = y_train, X = x_train, family = binomial(), method = "method.AUC",
                  SL.library = c("SL.mean", "SL.glmnet", "SL.ranger"))
sl
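The holdout set created earlier can then be used to score the ensemble. Here is a rough sketch of one way to do that, not the only one. Assumptions: `Boston` from MASS stands in for `data` and `outcome`, the library is cut down to `SL.mean` and `SL.glm` so the example needs no extra modeling packages, and `auc_rank` is a hypothetical helper written here in base R rather than anything SuperLearner provides.

```r
library(SuperLearner)

# Assumption: MASS::Boston stands in for the post's `data` and `outcome`.
data(Boston, package = "MASS")
outcome = Boston$medv
data = subset(Boston, select = -medv)

set.seed(1)
train_obs = sample(nrow(data), 150)
x_train = data[train_obs, ]
x_holdout = data[-train_obs, ]
outcome_bin = as.numeric(outcome > 22)
y_train = outcome_bin[train_obs]
y_holdout = outcome_bin[-train_obs]

# Assumption: a smaller library than the post's, to avoid extra dependencies.
sl = SuperLearner(Y = y_train, X = x_train, family = binomial(),
                  method = "method.AUC",
                  SL.library = c("SL.mean", "SL.glm"))

# Predict on the holdout set; onlySL = TRUE only computes predictions for
# learners that received a non-zero weight in the ensemble.
pred = predict(sl, x_holdout, onlySL = TRUE)

# Hypothetical helper: rank-based (Mann-Whitney) AUC in base R.
auc_rank = function(score, label) {
  r = rank(score)                 # average ranks handle tied scores
  n_pos = sum(label == 1)
  n_neg = sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

holdout_auc = auc_rank(pred$pred[, 1], y_holdout)
holdout_auc
```

An AUC near 0.5 would mean the ensemble is doing no better than guessing on unseen towns; values closer to 1 are better. With the post's full library you would swap the `SL.library` vector back in and leave the rest unchanged.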
Highly recommend this great new package:
https://cran.r-project.org/web/packages/SuperLearner/vignettes/Guide-to-SuperLearner.html
You can run all your models in one line of code, assuming you have a training dataset like the one set up above.