CDonnerer / xgboost-distribution

Probabilistic prediction with XGBoost.
MIT License

Pickling Models #57

Closed by thomasaarholt 2 years ago

thomasaarholt commented 2 years ago

We just went down a rabbit hole: when we pickle a trained model, upload the pickle to the cloud, pull it down elsewhere and try to run model.predict(), we get:

  File "/usr/local/lib/python3.6/site-packages/xgboost_distribution/model.py", line 211, in predict
    return self._distribution.predict(params)
  File "/usr/local/lib/python3.6/site-packages/xgboost_distribution/distributions/normal.py", line 93, in predict
    return self.Predictions(loc=loc, scale=scale)
TypeError: __new__() got an unexpected keyword argument 'loc'

But if we do it all locally (train, pickle, load, predict), it works fine. It turns out that pickling namedtuples is tricky, and you have probably run into this yourself, since it is called out explicitly in the changelog.
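For context on why namedtuples can be tricky here: pickle stores a class by reference (module path plus name), not by value, so the class is looked up again at load time. The sketch below illustrates that general behaviour with a made-up module; `fake_lib` and the `mean`/`var` field names are assumptions for illustration only, and this is not a confirmed diagnosis of the traceback above.

```python
import pickle
import sys
import types
from collections import namedtuple

# Hypothetical stand-in module; "fake_lib" is not a real package.
fake_lib = types.ModuleType("fake_lib")
sys.modules["fake_lib"] = fake_lib

# "Saving" environment: the container has fields (loc, scale).
fake_lib.Predictions = namedtuple(
    "Predictions", ("loc", "scale"), module="fake_lib"
)

# Pickling a class stores only its module and name, not its fields.
payload = pickle.dumps(fake_lib.Predictions)
fields_in_payload = b"loc" in payload

# "Loading" environment: same name, different (made-up) field names.
fake_lib.Predictions = namedtuple(
    "Predictions", ("mean", "var"), module="fake_lib"
)
restored = pickle.loads(payload)
restored_fields = restored._fields

# Calling the restored class with the original keywords now fails
# with a TypeError about an unexpected keyword argument.
try:
    restored(loc=0.0, scale=1.0)
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

print(fields_in_payload, restored_fields, error_message)
```

If the definition found at load time differs from the one used at save time, the unpickled object silently picks up the new definition, which is one way a keyword mismatch like this can arise.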

Would you mind providing an example of how we should save? 😅 😆

CDonnerer commented 2 years ago

Hi, could you check which Python versions you used for save/load? If they are not identical, save/load typically won't work well with pickle. The above might have been fixed in the latest release of the package, as described in the changelog.
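One way to make a version mismatch visible is to write the Python version into the file ahead of the pickled object and check it before loading. This is only a sketch using the standard library; the helper names `dump_with_env` and `load_with_env` are hypothetical and not part of xgboost-distribution.

```python
import os
import pickle
import platform
import tempfile


def dump_with_env(obj, path):
    """Write a small metadata record first, then the pickled object."""
    meta = {"python": platform.python_version()}
    with open(path, "wb") as fh:
        pickle.dump(meta, fh)  # plain dict, safe to read anywhere
        pickle.dump(obj, fh)


def load_with_env(path):
    """Check the recorded Python version before unpickling the object."""
    with open(path, "rb") as fh:
        meta = pickle.load(fh)
        current = platform.python_version()
        if meta["python"] != current:
            raise RuntimeError(
                f"pickle written with Python {meta['python']}, "
                f"loading with Python {current}"
            )
        return pickle.load(fh)


# Usage with a placeholder object instead of a trained model.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
dump_with_env({"weights": [1, 2, 3]}, path)
roundtrip = load_with_env(path)
```

Because the metadata is a separate first record, the check runs before the main object is unpickled, so a mismatch is reported clearly rather than as an obscure error later on. The same record could also carry package versions.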

In general, I'd highly recommend handling model IO via xgboost's own methods (see the official docs), e.g.:

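from xgboost_distribution import XGBDistribution

# Train and save using xgboost's native serialization
model = XGBDistribution()
model.fit(X, y)
model.save_model("xgb.json")

# Load into a fresh estimator, possibly in a different environment
saved_model = XGBDistribution()
saved_model.load_model("xgb.json")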

which should be much more robust across different environments.

thomasaarholt commented 2 years ago

Thanks for the clarification! We really could not figure it out. We tried pushing the pickle up and pulling it back down into the exact same environment, and still got the same error. We'll use the JSON approach :)

thomasaarholt commented 2 years ago

model.save_model worked without issue. Thanks!

(We were discussing the pickle error over beers on Friday. No one has a clue what caused it, but I don't think it's worth losing sleep over :) )