CDonnerer / xgboost-distribution

Probabilistic prediction with XGBoost.
MIT License

"PicklingError" while dumping model with joblib #82

Closed CyperStone closed 1 year ago

CyperStone commented 1 year ago

I got the following error while trying to dump an instance of a class that holds two different XGBDistribution models:

PicklingError: Can't pickle <class 'xgboost_distribution.distributions.base.Predictions'>: it's not the same object as xgboost_distribution.distributions.base.Predictions

Minimal reproducible example:

import joblib
from sklearn.datasets import make_regression
from xgboost_distribution import XGBDistribution

class XGBDistributionWrapper:

    def __init__(self, params_1, params_2):
        self.predictor_1 = XGBDistribution(**params_1)
        self.predictor_2 = XGBDistribution(**params_2)

    def fit(self, X, y1, y2):
        self.predictor_1.fit(X, y1)
        self.predictor_2.fit(X, y2)

        return self

    def predict(self, X):
        preds_1 = self.predictor_1.predict(X)
        preds_2 = self.predictor_2.predict(X)

        return preds_1.loc, preds_2.loc

X, Y = make_regression(n_samples=1000, n_features=50, n_targets=2, random_state=42)
y1 = Y[:, 0]
y2 = Y[:, 1]

params = {'distribution': 'normal'}
xgb = XGBDistributionWrapper(params, params)
xgb.fit(X, y1, y2)

joblib.dump(xgb, 'xgboost.jpkl')

As a workaround, I changed the values returned by the distribution from a namedtuple to a plain tuple. After that change, dumping and loading a fitted model worked fine.
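
For context, pickle raises "it's not the same object as ..." when the class it finds by looking up the module and name differs from the class of the object being dumped. The sketch below reproduces that situation with the standard library only. It is not code from xgboost-distribution: `make_predictions_class`, the field names `loc` and `scale`, and `try_pickle` are hypothetical, and whether the library reaches this state the same way is an assumption. It also shows why converting to a plain tuple avoids the problem.

```python
import pickle
from collections import namedtuple


def make_predictions_class():
    # Hypothetical helper: every call builds a brand-new class,
    # even though the class name is identical each time.
    return namedtuple("Predictions", ["loc", "scale"])


# The first class is bound to the module-level name `Predictions`.
Predictions = make_predictions_class()
first = Predictions(loc=0.0, scale=1.0)

# Rebinding the name means a lookup of `Predictions` no longer returns
# the class that `first` was created from.
Predictions = make_predictions_class()


def try_pickle(obj):
    """Return "ok" if obj pickles, otherwise the PicklingError text."""
    try:
        pickle.dumps(obj)
        return "ok"
    except pickle.PicklingError as exc:
        return f"PicklingError: {exc}"


# `first` belongs to the old class, so pickle cannot resolve it by name.
print(try_pickle(first))

# A plain tuple carries no class lookup, so it round-trips without issue.
restored = pickle.loads(pickle.dumps(tuple(first)))
print(restored)
```

The plain tuple works because pickle stores only its values, whereas a namedtuple instance is stored as a reference to its class, which must be importable and identical at dump time.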