catboost / catboost

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
https://catboost.ai
Apache License 2.0

What should label= be for a regression model? #1623

Closed Roffild closed 3 months ago

Roffild commented 3 years ago
mylbl = numpy.array([[0.], [0.], [1.], [0.], [1.]], dtype=numpy.float32)
mylbl.ndim == 2

xgboost.DMatrix(label=)

catboost.Pool(label=)

lightgbm.Dataset(label=)

Can you agree on a single standard?

Parameter description is incorrect.

Roffild commented 3 years ago

https://github.com/dmlc/xgboost/issues/6786 https://github.com/catboost/catboost/issues/1623 https://github.com/microsoft/LightGBM/issues/4115

Roffild commented 3 years ago

Used training.

Roffild commented 3 years ago

I am iterating over the parameters. It's easier for me to set label. An algorithm-level error is preferable to a global error.

The description says a 1D array, but for regression it must be N-dimensional.

andrey-khropov commented 3 months ago

Parameter description is incorrect.

What is incorrect about it (for CatBoost)?

label : list or numpy.ndarrays or pandas.DataFrame or pandas.Series, optional (default=None)
    Label of the training data. If not None, giving 1 or 2 dimensional array like data with floats.

One- and two-dimensional arrays are both accepted, as specified in the description.

label=None
File "catboost\core.py", line 976, in _build_train_pool
    raise CatBoostError("Label in X has not been initialized.")

This error only happens if you try to use the dataset in Pool for training where label data is necessary. If you use it for training with pairs (where label data is optional) or for prediction, label=None is perfectly valid and no error occurs.
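
For anyone passing the same label array to several libraries, here is a minimal sketch of normalizing the shape first. It uses only NumPy and the array from the original question. The helper name to_1d_label is hypothetical and not part of any of these libraries. It assumes a single-target label stored as an (n, 1) column. Whether a given library accepts the 2D form directly should be checked against its own documentation.

```python
import numpy as np

# The label array from the original question: shape (5, 1), ndim == 2.
mylbl = np.array([[0.], [0.], [1.], [0.], [1.]], dtype=np.float32)


def to_1d_label(label):
    """Hypothetical helper: flatten an (n, 1) column label to shape (n,).

    Arrays that are already 1D, or 2D with more than one column
    (for example multi-target labels), are returned unchanged.
    """
    arr = np.asarray(label)
    if arr.ndim == 2 and arr.shape[1] == 1:
        return arr.ravel()
    return arr


flat = to_1d_label(mylbl)
print(mylbl.shape, flat.shape)
```

The flattened array can then be passed as label= wherever a 1D array is expected, while the original 2D array stays available for APIs that accept it.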