Open houlk8503 opened 1 year ago
library(xgboost)

# Three identical feature rows with different targets: a pseudo-huber fit
# should land near the median target, 300.
data <- data.frame(X1 = rep(1.4, 3),
                   X2 = rep(2.7, 3),
                   X3 = rep(3.5, 3),
                   Y  = c(100, 300, 2000))
train.input <- xgb.DMatrix(data = as.matrix(data[, 1:3]), label = data$Y)
test.data <- as.matrix(data.frame(X1 = 1.4, X2 = 2.7, X3 = 3.5))

num.expr <- 1000
bst <- 1e10    # best (lowest) loss value seen so far
best.res <- 0  # fitted value attaining the best loss
set.seed(2023)
slope <- 1.0

# Pseudo-huber cost with delta = slp (matches the usual form when slp = 1)
pseudo.huber.loss <- function(x, y, slp) {
  d <- x - y
  cst <- sqrt(slp * slp + d * d) - slp
  sum(cst)
}

for (i in seq_len(num.expr)) {
  randm <- runif(2)
  # Note: xgboost parameter names use underscores (huber_slope, learning_rate,
  # max_depth), not dots.
  model <- xgb.train(params = list(huber_slope = slope,
                                   lambda = randm[1],
                                   # tree_method = 'exact',
                                   learning_rate = randm[2],
                                   max_depth = 1L,
                                   eval_metric = 'mae',
                                   objective = "reg:pseudohubererror"),
                     data = train.input,
                     nrounds = 20)
  res <- predict(model, test.data)
  cost <- pseudo.huber.loss(rep(res, 3), data$Y, slope)
  if (cost < bst) {
    bst <- cost
    best.res <- res
  }
}
print(best.res)
print(bst)
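One way to probe the runaway behavior is to look at the derivatives a second-order booster works with. The sketch below is my own illustration, not part of the original script: the helper names `phuber.grad` and `phuber.hess` are made up, the starting prediction of 0.5 assumes XGBoost's historical default `base_score`, and the leaf-update formula -sum(g)/(sum(h) + lambda) is XGBoost's standard second-order update. The point is that the pseudo-huber gradient is bounded by delta, but the Hessian decays like 1/|d|^3 for large residuals, so the Newton-style leaf value can be enormous when lambda is small.

```r
# First and second derivatives of the pseudo-huber loss
# L(d) = delta^2 * (sqrt(1 + (d/delta)^2) - 1), with d = prediction - target.
phuber.grad <- function(d, delta = 1) d / sqrt(1 + (d / delta)^2)
phuber.hess <- function(d, delta = 1) (1 + (d / delta)^2)^(-3 / 2)

# Residuals at an assumed starting prediction of 0.5 (XGBoost's
# historical default base_score) against the three targets.
d0 <- 0.5 - c(100, 300, 2000)
g <- phuber.grad(d0)  # each component is close to -1: gradient bounded by delta
h <- phuber.hess(d0)  # each component is tiny, roughly 1 / |d|^3

# A second-order leaf update has the form -sum(g) / (sum(h) + lambda);
# with sum(h) ~ 1e-6 and a small lambda, the very first step overshoots
# the targets by orders of magnitude.
step <- -sum(g) / (sum(h) + 1e-6)
print(step)
```

This is only a plausible mechanism for the observation, not a confirmed diagnosis: the bounded gradient keeps each g near ±1, while the near-zero curvature makes the Newton step ill-conditioned unless lambda is large.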
Apologies for the slow reply; I will look into it next week.
Hi, I have recently been running some numerical experiments for a new tree-boosting trick. I designed a simple example for robust regression involving three data points: (1.4, 2.7, 3.5, 100), (1.4, 2.7, 3.5, 300), and (1.4, 2.7, 3.5, 2000). The first three values are features, and the last value is the regression target. I applied XGBoost with the pseudo-huber error to fit this data, with the huber delta set to 1. Since the pseudo-huber error is a smoothed variant of the absolute error, the fitted value is expected to be around the median, 300. However, no matter how hard I tried (e.g. by tuning the regularization weight lambda as well as the learning rate), the fitted value consistently got stuck at a large number, even with just a few boosting iterations and large regularization weights. Since the gradient of the pseudo-huber error is bounded in absolute value by the huber delta, this phenomenon was quite unexpected. Does anyone know how to resolve this issue?
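For reference, the expectation that the optimal constant prediction sits near the median can be checked directly. The short sketch below (the grid resolution and helper name are my own choices, with targets taken from the example script) minimizes the pseudo-huber loss with delta = 1 over a grid of candidate constant predictions:

```r
# Grid search for the constant prediction minimizing the pseudo-huber
# loss (delta = 1) over the three targets from the example.
y <- c(100, 300, 2000)
phuber.total <- function(p, y, delta = 1) {
  sum(delta^2 * (sqrt(1 + ((p - y) / delta)^2) - 1))
}
grid <- seq(0, 2500, by = 1)
losses <- sapply(grid, phuber.total, y = y)
best <- grid[which.min(losses)]
print(best)  # 300, the median target
```

Each term of the loss is convex in p, so the sum is convex and the grid minimum at 300 is the global one (up to the 1-unit grid resolution), which confirms that a fit far above 300 is not a property of the objective itself.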