Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
I've been trying to use Dask to chunk data and train a regressor, but I keep receiving this error when running some sample code:
```
xgboost.core.XGBoostError: [15:26:58] /Users/runner/work/xgboost/xgboost/src/objective/regression_obj.cu:528: Check failed: info.labels.Size() != 0U (0 vs. 0) : label set cannot be empty
```
After some experimentation I realised that the error would go away when I changed the chunks of my Dask array to a size smaller than the full array length.
I've included a snippet which should help recreate the issue:
```python
import distributed
import dask.array as da
from xgboost import dask as dxgb

if __name__ == "__main__":
    client = distributed.Client()

    samples = 1000
    cols = 10
    X_train = da.random.random(size=(samples, cols), chunks=1000)  # set chunks < samples if you want the training to succeed
    y_train = X_train.sum(axis=1)
    X_val = da.random.random(size=(samples, cols), chunks=1000)
    y_val = X_val.sum(axis=1)

    model = dxgb.DaskXGBRegressor(
        learning_rate=0.1,
        max_depth=3,
        early_stopping_rounds=30,
        objective='reg:tweedie',
    )
    model.fit(
        X_train,
        y_train,
        eval_set=[(X_val, y_val)],
    )
    print(model.feature_importances_)
```
There are a few ways I can get this to succeed, e.g. changing my objective to `reg:squarederror`, which seems happy when chunk size = data size. I don't understand why the tweedie objective doesn't work, though.
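For reference, the workaround I mentioned above just changes how the array is partitioned, not the data. A minimal sketch of the rechunking (the explanation in the comment is my assumption about why a single chunk fails, not something I've confirmed in the XGBoost source):

```python
import dask.array as da

samples, cols = 1000, 10

# chunks=1000 would put all rows in one block; my guess is that with
# several Dask workers, the workers that receive no block see an empty
# label set. Splitting the rows into smaller chunks gives every
# partition some data.
X_train = da.random.random(size=(samples, cols), chunks=(250, cols))
y_train = X_train.sum(axis=1)

print(X_train.numblocks)  # blocks along each axis -> (4, 1)
print(y_train.numblocks)  # -> (4,)
```

With `chunks=(250, cols)` the 1000 rows are split into four blocks, and training with `reg:tweedie` succeeds for me.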
Environment details:
- OS: macOS 14.5
- Chip: M1 Max
- Python Version: 3.11
- Packages: