dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

Feature request - Conformal Prediction #9996

Open valeman opened 7 months ago

valeman commented 7 months ago

Problem: XGBoost is a great library, but it currently lacks reliable, modern uncertainty quantification, which is relatively easy to add using conformal prediction. https://github.com/valeman/awesome-conformal-prediction

Feature request: add conformal prediction for regression and classification, similar to what libraries like MAPIE have implemented.

Regression: inductive (split) conformal prediction and Conformalized Quantile Regression (CQR) https://mapie.readthedocs.io/en/stable/examples_regression/4-tutorials/plot_cqr_tutorial.html
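For reference, a minimal sketch of the split (inductive) variant on top of XGBoost's sklearn API, in the spirit of what MAPIE does. The helper name and hyperparameters are illustrative, not an existing or proposed XGBoost API:

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

def split_conformal_interval(X_train, y_train, X_test, alpha=0.1):
    # Hold out a calibration set the model never trains on.
    X_fit, X_cal, y_fit, y_cal = train_test_split(
        X_train, y_train, test_size=0.25, random_state=0
    )
    model = xgb.XGBRegressor(n_estimators=200)  # arbitrary settings
    model.fit(X_fit, y_fit)

    # Conformity scores: absolute residuals on the calibration set.
    scores = np.abs(y_cal - model.predict(X_cal))

    # Finite-sample-corrected (1 - alpha) quantile of the scores.
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(scores, level, method="higher")

    # Intervals with marginal coverage >= 1 - alpha under exchangeability.
    pred = model.predict(X_test)
    return pred - q, pred + q
```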

Classification: Venn-Abers predictors

https://proceedings.neurips.cc/paper/2015/file/a9a1d5317a33ae8cef33961c34144f84-Paper.pdf

There is also a talk by Vladimir Vovk: https://m.youtube.com/watch?v=ksrUJdb2tA8&pp=ygUNVm92ayB2bGFkaW1pcg%3D%3D

Tutorial https://cml.rhul.ac.uk/copa2017/presentations/VennTutorialCOPA2017.pdf

Implementation by Ivan Petej: https://github.com/ip200/venn-abers

Older implementation by Paolo Toccaceli:

https://github.com/ptocca/VennABERS

Venn-Abers demo: https://github.com/ptocca/VennABERS-demo
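To show the mechanics behind the links above, here is a deliberately naive Venn-Abers sketch built on scikit-learn's isotonic regression. The repositories by Ivan Petej and Paolo Toccaceli implement the efficient algorithm; this version refits the calibrator for every test point and is only meant to illustrate the idea:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def venn_abers(cal_scores, cal_labels, test_score):
    """Return the multiprobability pair (p0, p1) for P(y = 1 | score)."""
    probs = []
    for hypothetical_label in (0, 1):
        # Augment the calibration set with the test point labelled 0, then 1.
        s = np.append(cal_scores, test_score)
        y = np.append(cal_labels, hypothetical_label)
        iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
        iso.fit(s, y)
        probs.append(float(iso.predict([test_score])[0]))
    p0, p1 = probs
    return p0, p1  # a single merged estimate is p1 / (1 - p0 + p1)
```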

trivialfis commented 7 months ago

Thank you for raising the issue. Yes, I have been looking into it and can try to build one on top of quantile regression. Another possible direction is prediction of distribution parameters.
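A rough sketch of what the quantile-regression route could look like (Conformalized Quantile Regression, Romano et al. 2019), assuming XGBoost's quantile loss (objective "reg:quantileerror", available since 2.0); the function name and settings are illustrative:

```python
import numpy as np
import xgboost as xgb

def cqr_interval(X_fit, y_fit, X_cal, y_cal, X_test, alpha=0.1):
    # One quantile model per tail of the target interval.
    lo = xgb.XGBRegressor(objective="reg:quantileerror", quantile_alpha=alpha / 2)
    hi = xgb.XGBRegressor(objective="reg:quantileerror", quantile_alpha=1 - alpha / 2)
    lo.fit(X_fit, y_fit)
    hi.fit(X_fit, y_fit)

    # CQR conformity score: how far y falls outside the predicted band.
    scores = np.maximum(lo.predict(X_cal) - y_cal, y_cal - hi.predict(X_cal))

    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(scores, level, method="higher")

    # Widen (or shrink, if q < 0) the band to restore 1 - alpha coverage.
    return lo.predict(X_test) - q, hi.predict(X_test) + q
```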

trivialfis commented 7 months ago

The question is: since it's a post-hoc method, do we need to build it inside XGBoost? Or is it better to build an independent library that works across all types of models, including the ones from XGBoost?
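To make the post-hoc point concrete: because split conformal prediction only needs held-out predictions, an external library can conformalize any regressor exposing a predict method, whether it comes from XGBoost or elsewhere. A toy sketch, with hypothetical names:

```python
import numpy as np

class SplitConformalWrapper:
    """Conformalize any already-fitted regressor with a .predict() method."""

    def __init__(self, fitted_model, alpha=0.1):
        self.model = fitted_model
        self.alpha = alpha
        self.q_ = None

    def calibrate(self, X_cal, y_cal):
        # Absolute residuals on held-out data the model never saw.
        scores = np.abs(y_cal - self.model.predict(X_cal))
        n = len(scores)
        level = min(np.ceil((n + 1) * (1 - self.alpha)) / n, 1.0)
        self.q_ = np.quantile(scores, level, method="higher")
        return self

    def predict_interval(self, X):
        pred = self.model.predict(X)
        return pred - self.q_, pred + self.q_
```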

valeman commented 7 months ago

@trivialfis there is strong demand for having this inside XGBoost.

hcho3 commented 7 months ago

@valeman

there is strong demand for having this inside XGBoost

What would be the main benefit of having the feature inside XGBoost? What benefits do you see in having conformal prediction built into XGBoost, compared to the alternative of implementing it as an external library?

valeman commented 7 months ago

Hi @hcho3,

1) XGBoost would benefit from having a reliable uncertainty quantification framework that provides calibration guarantees.

2) Conformal prediction is easy to implement.

3) Having conformal prediction inside XGBoost would make it easy for users to produce conformalized models, including in production pipelines.

4) Faster performance than going through an external library.

Happy to discuss this in more detail. https://www.linkedin.com/in/valeriy-manokhin-phd-mba-cqf-704731236/

wavescholar commented 6 months ago

We implemented a version of this ourselves at Red Hat. It would be a good addition to the library.

trivialfis commented 6 months ago

Considering that we need a calibrator for conformal prediction, this can become nontrivial once we start working on distributed and GPU versions, so I think it's best to host it in a different project. Also, I have seen the same feature request opened on other projects; implementing it independently of XGBoost could be more useful than having it exclusive to XGBoost and then doing it all over again for other projects.

trivialfis commented 6 months ago

cc @jameslamb for awareness.