dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

Convergence issue under pseudo-huber loss for simple experiment data #9378

Open houlk8503 opened 1 year ago

houlk8503 commented 1 year ago

Hi, I have recently been running some numerical experiments for a new tree-boosting trick. I designed a simple example for robust regression involving three data points: (1.2, 1.5, 2.3, 100), (1.2, 1.5, 2.3, 300), and (1.2, 1.5, 2.3, 1000). The first three values are features, and the last value is the regression target. I applied XGBoost with the pseudo-huber error to fit this data, with the huber slope (delta) set to 1. Since the pseudo-huber error is a smooth variant of the absolute error, the fitted value is expected to be around 300, the median of the targets. However, no matter how hard I tried (e.g. by tuning the regularization weight lambda as well as the learning rate), the fitted value was consistently stuck at a large number, even with only a few boosting iterations and large regularization weights. Since the gradient of the pseudo-huber error is bounded in absolute value by the huber slope, this behaviour was quite unexpected. Does anyone know how to resolve this issue?
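
For reference, here is a minimal sketch of the loss I have in mind, using the standard pseudo-huber definition with slope delta and residual r = prediction - label (my own notation, not taken from the XGBoost source):

# Standard pseudo-huber loss and its first two derivatives in the residual r.
pseudo_huber      <- function(r, delta) delta^2 * (sqrt(1 + (r / delta)^2) - 1)
pseudo_huber_grad <- function(r, delta) r / sqrt(1 + (r / delta)^2)   # bounded: |grad| < delta for all r
pseudo_huber_hess <- function(r, delta) (1 + (r / delta)^2)^(-1.5)    # shrinks toward 0 when |r| >> delta

The gradient saturates at delta, which is why the stuck-at-a-large-value behaviour surprised me.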

houlk8503 commented 1 year ago

For your information, this is the corresponding R script:

library(xgboost)

data <- data.frame(X1 = rep(1.4, 3),
                   X2 = rep(2.7, 3),
                   X3 = rep(3.5, 3),
                   Y  = c(100, 300, 2000))
train.input <- xgb.DMatrix(data = as.matrix(data[, 1:3]), label = data$Y)
test.data <- as.matrix(data.frame(X1 = 1.4, X2 = 2.7, X3 = 3.5))

num.expr <- 1000   # number of random hyper-parameter draws
bst <- 1e10        # best (lowest) loss value seen so far
best.res <- 0      # fitted value corresponding to the best loss
set.seed(2023)
slope <- 1.0

# Pseudo-huber loss summed over all observations (slp is the huber slope/delta).
pseudo.huber.loss <- function(x, y, slp) {
  d <- x - y
  cst <- sqrt(slp * slp + d * d) - slp
  return(sum(cst))
}

for (i in seq_len(num.expr)) {
  randm <- runif(2)  # random lambda and learning rate in (0, 1)
  model <- xgb.train(params = list(huber_slope = slope,
                                   lambda = randm[1],
                                   #tree_method = 'exact',
                                   learning_rate = randm[2],
                                   max_depth = 1L),
                     data = train.input,
                     eval_metric = 'mae',
                     objective = "reg:pseudohubererror",
                     nrounds = 20)
  res <- predict(model, test.data)
  cost <- pseudo.huber.loss(rep(res, 3), data$Y, slope)
  if (cost < bst) {
    bst <- cost
    best.res <- res
  }
}
print(best.res)  # best fitted value found (was print(res), which only reports the last draw)
print(bst)       # pseudo-huber loss of that best fit
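
As a sanity check on where the fit should end up, the best constant prediction under the same loss can be found with a one-dimensional search (optimize() is base R; this snippet is only an illustration added for reference, not part of the original experiment):

# Best constant fit under the pseudo-huber loss defined above; with slope = 1
# it lands very close to the median of Y, i.e. roughly 300.
opt <- optimize(function(c) pseudo.huber.loss(rep(c, 3), data$Y, slope),
                interval = range(data$Y))
print(opt$minimum)

# Optional: inspect the leaf values of the last fitted model, round by round.
# xgb.dump() is part of the xgboost R package; the text format may vary by version.
cat(head(xgb.dump(model, with_stats = TRUE), 20), sep = "\n")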
trivialfis commented 1 year ago

Apologies for the slow reply; I will look into it next week.