kmedved opened this issue 3 years ago
Hi, thanks for raising this! My initial thinking was that having `.predict` return point estimates would give the impression that the estimator is just a normal regressor, which I wanted to avoid. However, I do see the appeal of fitting into the scikit-learn ecosystem. The "correct" way of tuning hyperparameters should probably use the negative log-likelihood, but maybe there's an argument for being able to use something like RMSE. I'll have a look into this!
In the meantime, you could get the API you're describing by doing something like:
```python
from xgboost_distribution import XGBDistribution


class XGBDistributionMean(XGBDistribution):
    def predict(self, *args, **kwargs):
        # Return only the mean (`loc`) as a plain array of point predictions
        preds = super().predict(*args, **kwargs)
        return preds.loc

    def predict_distribution(self, *args, **kwargs):
        # Keep the full set of predicted distribution parameters available
        return super().predict(*args, **kwargs)
```
which should be fully compatible with scikit-learn.
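To sketch how that could then plug into standard scikit-learn tooling (just an illustration, not part of the package: `nll_scorer` is a hypothetical helper, and it assumes `distribution="normal"` so the predicted parameters are `loc` and `scale`):

```python
import numpy as np
from scipy.stats import norm
from sklearn.model_selection import GridSearchCV


def nll_scorer(estimator, X, y):
    # Mean predictive log-likelihood; scikit-learn maximizes scores,
    # so this is the negative of the NLL objective.
    preds = estimator.predict_distribution(X)
    return norm.logpdf(y, loc=preds.loc, scale=preds.scale).mean()


X, y = np.random.rand(500, 5), np.random.rand(500)

search = GridSearchCV(
    XGBDistributionMean(distribution="normal"),
    param_grid={"max_depth": [2, 4], "n_estimators": [50, 100]},
    scoring=nll_scorer,  # or e.g. "neg_root_mean_squared_error" on the point predictions
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```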
Hello - thanks again for the wonderful package.
I wanted to ask whether it would make sense to adjust the current `.predict` API to mimic NGBoost's behavior of returning point predictions, while relying on `.pred_dist()` to return information about the distribution.

The advantage of this is mostly to increase compatibility with the rest of the scikit-learn ecosystem for the purposes of hyperparameter tuning and other testing. Right now, it's difficult to integrate xgboost-distribution with those tools, because the `.predict()` call returns the point predictions and the distribution information at the same time (e.g. both `loc` and `scale` for a normal distribution).
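For concreteness, a toy sketch of the friction (assuming the default normal distribution):

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from xgboost_distribution import XGBDistribution

X, y = np.random.rand(100, 5), np.random.rand(100)
model = XGBDistribution(distribution="normal", n_estimators=10)
model.fit(X, y)

preds = model.predict(X)
# preds is a namedtuple of distribution parameters (preds.loc, preds.scale),
# so generic tools that call estimator.predict(X) and expect an array stumble:
mean_squared_error(y, preds.loc)  # works, but needs distribution-specific knowledge
# mean_squared_error(y, preds)   # what e.g. cross_val_score would effectively do
```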
This seems like a simple change, but I wanted to get your thoughts. Thanks.