ecpolley / SuperLearner

Current version of the SuperLearner R package

Package `xgboost` update deprecates `reg:linear` #132

Closed: bdwilliamson closed this 3 years ago

bdwilliamson commented 4 years ago

The new version of xgboost (1.1.1.1) uses objective = 'reg:squarederror' in place of objective = 'reg:linear' for regression minimizing mean squared error. This should be an easy fix in SL.xgboost (I'm happy to submit a PR if you don't have the bandwidth), and the deprecation has downstream repercussions for CRAN checks. Reprex below:

## generate the data
set.seed(4747)
p <- 2
n <- 10000
x <- data.frame(replicate(p, stats::runif(n, -5, 5)))
## apply the function to the x's
y <- (x[,1]/5)^2*(x[,1]+7)/5 + (x[,2]/3)^2 + rnorm(n, 0, 1)
## fitting with SL.xgboost triggers the reg:linear deprecation warning
sl_mod <- SuperLearner::SuperLearner(Y = y, X = x, cvControl = list(V = 2), SL.library = "SL.xgboost")
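
For reference, calling xgboost directly with the old objective shows the deprecation warning (a sketch reusing x and y from above; the exact warning text depends on the installed version):

## direct call that triggers the warning on xgboost >= 1.1.1.1
dtrain <- xgboost::xgb.DMatrix(data = as.matrix(x), label = y)
fit <- xgboost::xgboost(data = dtrain, objective = "reg:linear",
                        nrounds = 10, verbose = 0)
## expect a warning along the lines of:
## "reg:linear is now deprecated in favor of reg:squarederror"
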
ecpolley commented 4 years ago

Thanks for the note. It looks like reg:linear still works in 1.1.1.1 but generates a warning message that it is deprecated. I can add a patch, likely an additional if statement that checks the xgboost version, to keep backward compatibility for now.
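
A minimal sketch of that check (a hypothetical helper, not the shipped patch) could select the objective string based on the installed xgboost version:

## hypothetical helper: pick the gaussian objective by xgboost version
xgb_gaussian_objective <- function() {
  if (utils::packageVersion("xgboost") >= "1.1.1.1") {
    "reg:squarederror"  # new name in xgboost >= 1.1.1.1
  } else {
    "reg:linear"        # old name, kept for older installs
  }
}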

## proposed patch: identical to SL.xgboost except that the gaussian
## objective is "reg:squarederror" instead of the deprecated "reg:linear"
SL.xgboost_new <- function (Y, X, newX, family, obsWeights, id, ntrees = 1000, 
    max_depth = 4, shrinkage = 0.1, minobspernode = 10, params = list(), 
    nthread = 1, verbose = 0, save_period = NULL, ...) 
{
    if (!is.matrix(X)) {
        X = model.matrix(~. - 1, X)
    }
    xgmat = xgboost::xgb.DMatrix(data = X, label = Y, weight = obsWeights)
    if (family$family == "gaussian") {
        model = xgboost::xgboost(data = xgmat, objective = "reg:squarederror", ## if xgboost version >=1.1.1.1, changed from reg:linear to reg:squarederror
            nrounds = ntrees, max_depth = max_depth, min_child_weight = minobspernode, 
            eta = shrinkage, verbose = verbose, nthread = nthread, 
            params = params, save_period = save_period)
    }
    if (family$family == "binomial") {
        model = xgboost::xgboost(data = xgmat, objective = "binary:logistic", 
            nrounds = ntrees, max_depth = max_depth, min_child_weight = minobspernode, 
            eta = shrinkage, verbose = verbose, nthread = nthread, 
            params = params, save_period = save_period)
    }
    if (family$family == "multinomial") {
        model = xgboost::xgboost(data = xgmat, objective = "multi:softmax", 
            nrounds = ntrees, max_depth = max_depth, min_child_weight = minobspernode, 
            eta = shrinkage, verbose = verbose, num_class = length(unique(Y)), 
            nthread = nthread, params = params, save_period = save_period)
    }
    if (!is.matrix(newX)) {
        newX = model.matrix(~. - 1, newX)
    }
    pred = predict(model, newdata = newX)
    fit = list(object = model)
    class(fit) = c("SL.xgboost")
    out = list(pred = pred, fit = fit)
    return(out)
}

## generate the data
set.seed(4747)
p <- 2
n <- 10000
x <- data.frame(replicate(p, stats::runif(n, -5, 5)))
## apply the function to the x's
y <- (x[,1]/5)^2*(x[,1]+7)/5 + (x[,2]/3)^2 + rnorm(n, 0, 1)
sl_mod <- SuperLearner::SuperLearner(Y = y, X = x, cvControl = list(V = 2), SL.library = c("SL.xgboost", "SL.xgboost_new"))
sl_mod # same results from both learners; only the objective name differs

ecpolley commented 3 years ago

This has been fixed in the current version.
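
For anyone verifying their installed copy, one quick check (a sketch, assuming the fix kept the same function structure) is to look for the new objective string in the body of SL.xgboost:

## sketch: TRUE once the fix is in the installed SuperLearner
any(grepl("reg:squarederror",
          deparse(body(SuperLearner::SL.xgboost))))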