I tried to switch from gbm to gbm3 for quantile regression, but gbm3 returns incorrect quantile predictions. See the example below, which uses the attached toy data (X.zip).
library(data.table)
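# Extract the first tree as a data frame, for either a gbm or a gbm3 fit
get_tree <- function(g) {
  # inherits() is safer than class(g) == "gbm" when class() has length > 1
  t <- if (inherits(g, "gbm")) gbm::pretty.gbm.tree(g)
       else gbm3::pretty_gbm_tree(g)
  t$Node <- as.integer(rownames(t))
  # Drop the rows holding the missing-value nodes (node ids are 0-based)
  mis <- t$MissingNode + 1
  t <- t[-mis, ]
  t$MissingNode <- NULL
  # Leaf predictions are stored relative to the initial fit
  t$RealPrediction <- t$Prediction + g$initF
  t
}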
X <- readRDS('X.rds')
str(X)
params <- list(y ~ ., data=X, distribution = list(name="quantile", alpha=0.9), n.trees = 1,
interaction.depth = 2, n.minobsinnode = 250, shrinkage = 1, bag.fraction = 1)
g0 <- do.call(gbm::gbm, params)
g3 <- do.call(gbm3::gbm, params)
get_tree(g0)
get_tree(g3)
# overall 90% quantile:
quantile(X$y, 0.9, type = 2)
# true 90% quantiles inside the splits:
X[, quantile(y, 0.9, type = 2), .(s1 = a<14.5, s2 = a>=14.5 & b < 0.133)]
# Check the empirical CDFs in the 4th node for g0 and g3:
X[a>=14.5 & b >= 0.133, ecdf(y)(c(1.716003, 1.529294))]
The true 90% quantiles within the splits match the leaves from g0 exactly, but the node 4 leaf in g3 is wrong: it corresponds to the 57% empirical quantile.
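The check above can be generalized to every leaf at once. Below is a minimal sketch in base R on synthetic data (not the attached X): `leaf_quantile_level` is a hypothetical helper, not part of gbm or gbm3, and `leaf_id` and `leaf_value` are assumed inputs that you would build from the tree splits and the leaf predictions. For a correct quantile fit with alpha = 0.9, each leaf should come back at about 0.9.

```r
# Hypothetical helper (not a gbm/gbm3 function): for each leaf, return the
# empirical CDF level of the fitted leaf value among the responses in that leaf.
leaf_quantile_level <- function(y, leaf_id, leaf_value) {
  # leaf_id:    vector assigning each observation to a leaf label
  # leaf_value: named numeric vector of fitted leaf predictions
  vapply(names(leaf_value), function(k) {
    ecdf(y[leaf_id == k])(leaf_value[[k]])
  }, numeric(1))
}

# Synthetic stand-in for the toy data: two leaves of 500 observations each
set.seed(1)
y <- rnorm(1000)
leaf_id <- rep(c("a", "b"), each = 500)

# "Good" leaves: the type-2 empirical 90% quantile within each leaf
good <- sapply(split(y, leaf_id), quantile, probs = 0.9, type = 2, names = FALSE)

# "Bad" leaves: leaf b is deliberately set to its 57% quantile instead
bad <- good
bad[["b"]] <- quantile(y[leaf_id == "b"], probs = 0.57, type = 2, names = FALSE)

levels_good <- leaf_quantile_level(y, leaf_id, good)
levels_bad  <- leaf_quantile_level(y, leaf_id, bad)
print(levels_good)
print(levels_bad)
```

Any leaf whose level is far from the requested alpha, as with node 4 in g3 here, points to a leaf value that was not computed as the within-leaf quantile.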