CDonnerer / xgboost-distribution

Probabilistic prediction with XGBoost.
MIT License

Parameters go unused when passed through Sklearn Pipeline #101

Closed: ae-powell closed this issue 5 months ago

ae-powell commented 5 months ago

I've found that parameters can go unused when XGBDistribution is used in a sklearn Pipeline. For example, the snippet below triggers a warning that `sample_weight` is not used. Hopefully there is a relatively simple explanation or fix for this.

```python
from xgboost_distribution import XGBDistribution
from scipy.stats import nbinom
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# ... split dataset into X_train, X_test, y_train, y_test, and sample_wgt

model = XGBDistribution(
    distribution="negative-binomial"
)

pipe = Pipeline(
    [
        ('scaler', StandardScaler()),
        ('model', model),
    ]
)

pipe.fit(X_train, y_train, model__sample_weight=sample_wgt)
```

This emits the following warning:

```
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b1e95427-ed1d-47a7-8c9d-a3b62891426a/lib/python3.9/site-packages/xgboost/core.py:160: UserWarning: [13:29:49] WARNING: /workspace/src/learner.cc:742: 
Parameters: { "sample_weight" } are not used.
```

CDonnerer commented 5 months ago

Hi, I'm not able to reproduce this. Could you share the versions of xgboost and xgboost-distribution you're using, and/or upgrade them?

Sample weights are fully supported from version 0.2.6 onwards. Note that they should only be passed in the `fit` call:

```python
from xgboost_distribution import XGBDistribution

model = XGBDistribution()
model.fit(X, y, sample_weight=sample_weight)
```

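For reference: with metadata routing disabled (scikit-learn's default), `Pipeline.fit` forwards a keyword named `<step>__<param>` to the `fit` method of the step called `<step>`, under the name `<param>`. So `model__sample_weight=sample_wgt` should reach `XGBDistribution.fit` as `sample_weight=sample_wgt`, the same call shown above. The sketch below is a simplified, hypothetical re-implementation of that splitting rule (`split_fit_params` is our own name, not a scikit-learn function), useful for checking where a given keyword would end up.

```python
def split_fit_params(step_names, **fit_params):
    """Group '<step>__<param>' keywords by pipeline step (simplified sketch)."""
    routed = {name: {} for name in step_names}
    for key, value in fit_params.items():
        if "__" not in key:
            # The step prefix is what tells the pipeline where a parameter belongs.
            raise ValueError(f"{key!r} must be of the form '<step>__<param>'")
        # Split only on the first '__', so nested names stay intact.
        step, param = key.split("__", 1)
        if step not in routed:
            # Hypothetical check; this sketch rejects unknown step names.
            raise ValueError(f"unknown pipeline step {step!r}")
        routed[step][param] = value
    return routed


routed = split_fit_params(["scaler", "model"], model__sample_weight=[1.0, 2.0, 0.5])
print(routed)  # {'scaler': {}, 'model': {'sample_weight': [1.0, 2.0, 0.5]}}
```

Only the `model` step receives `sample_weight`; the scaler gets no extra fit arguments.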
ae-powell commented 5 months ago

Thank you. It looks like it was a compatibility issue with scikit-learn. After upgrading to 1.4.2 and calling `.set_fit_request(sample_weight=True)`, passing parameters no longer caused the same issue.
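The fix above relies on scikit-learn's metadata routing: each consumer declares, via `set_fit_request`, which metadata its `fit` should receive, and the meta-estimator forwards only what was requested. In scikit-learn this has to be switched on first with `sklearn.set_config(enable_metadata_routing=True)`, after which the metadata is passed under its plain name (e.g. `pipe.fit(X, y, sample_weight=w)`). The sketch below is only a toy model of the routing idea in plain Python; `Step` and `route` are hypothetical names, not scikit-learn's implementation.

```python
class Step:
    """A toy pipeline step that records which fit metadata it requests."""

    def __init__(self, name):
        self.name = name
        self.fit_requests = {}

    def set_fit_request(self, **requests):
        # e.g. set_fit_request(sample_weight=True) opts this step in.
        self.fit_requests.update(requests)
        return self


def route(steps, **metadata):
    """Forward each metadata item only to the steps that requested it."""
    routed = {}
    for step in steps:
        routed[step.name] = {
            key: value
            for key, value in metadata.items()
            if step.fit_requests.get(key) is True
        }
    return routed


scaler = Step("scaler")  # never opts in, so it receives nothing
model = Step("model").set_fit_request(sample_weight=True)
print(route([scaler, model], sample_weight=[1.0, 2.0, 0.5]))
# {'scaler': {}, 'model': {'sample_weight': [1.0, 2.0, 0.5]}}
```

scikit-learn itself is stricter than this toy: if a step's `fit` accepts a metadata argument but no request was set for it, passing that metadata can raise an error instead of silently skipping the step.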