dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

Implement BART. #4398

Closed trivialfis closed 5 years ago

trivialfis commented 5 years ago

Hi all,

Introduction

Recently I have been reading about Bayesian Additive Regression Trees (BART) and have started implementing it on top of XGBoost. It is a tree ensemble model that uses MCMC (a modified Gibbs sampler) for structure learning. In recent years a number of variants have been proposed, mostly tuning the sampling method for higher performance or lower computational cost. Currently I'm still trying to get a grasp of the original version.
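
For context, BART models the response as a sum of m regression trees, y = sum_j g(x; T_j, M_j) + eps with eps ~ N(0, sigma^2), and the sampler updates one tree at a time against the partial residuals of the others. The snippet below is only a toy sketch of that backfitting loop to make the structure concrete, not the code being proposed here: trees are reduced to single random stumps, the Metropolis-Hastings grow/prune/change proposals are replaced by simply redrawing the stump, and the noise variance is held fixed. All names are illustrative.

```python
# Toy sketch of BART-style backfitting MCMC (illustrative only, not XGBoost code).
import numpy as np

rng = np.random.default_rng(0)

def sample_stump(x, r):
    """Draw a single-split stump fitted to residual r (toy conjugate draws)."""
    split = rng.choice(x)                       # random split location
    left = x <= split
    def leaf(mask):
        # Posterior draw for the leaf mean under an N(0, 1) prior and unit noise.
        n = mask.sum()
        return rng.normal(r[mask].sum() / (n + 1), 1.0 / np.sqrt(n + 1)) if n else 0.0
    mu_l, mu_r = leaf(left), leaf(~left)
    return lambda q: np.where(q <= split, mu_l, mu_r)

def backfit(x, y, n_trees=20, n_iter=200):
    preds = np.zeros((n_trees, x.size))         # current fit of each tree
    trees = [None] * n_trees
    for _ in range(n_iter):
        for j in range(n_trees):
            r = y - (preds.sum(axis=0) - preds[j])   # partial residual of tree j
            trees[j] = sample_stump(x, r)            # Gibbs step for tree j
            preds[j] = trees[j](x)
    return lambda q: sum(t(q) for t in trees)

x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, x.size)
f = backfit(x, y)
print("in-sample MSE:", float(np.mean((f(x) - y) ** 2)))
```

A real implementation also needs the tree-structure prior, the accept/reject step for tree proposals, and the Gibbs update for sigma^2 from the original paper.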

Although XGBoost itself is a gradient boosting framework, exceptions such as random forests are also implemented in the library (see the sketch after this paragraph). I would like to gauge the interest in this subject. If the algorithm is accepted as part of XGBoost, I will try to implement it as a plugin for initial support.
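
As a point of reference for how a non-boosting model already fits into XGBoost, here is a minimal sketch of random forest training through the standard parameter interface. The parameters (`num_parallel_tree`, `subsample`, `colsample_bynode`) are documented XGBoost parameters; the data is a toy placeholder.

```python
# Random forest mode in XGBoost: one boosting round that grows
# num_parallel_tree trees with row/column subsampling and no shrinkage.
import numpy as np
import xgboost as xgb

X = np.random.rand(100, 5)
y = np.random.rand(100)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "learning_rate": 1.0,        # no shrinkage, as in a plain random forest
    "num_parallel_tree": 50,     # grow 50 trees in a single round
    "subsample": 0.8,
    "colsample_bynode": 0.8,
    "objective": "reg:squarederror",
}
bst = xgb.train(params, dtrain, num_boost_round=1)
```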

Existing implementation

The algorithm has been implemented several times as R packages:

There is also one WIP Python project:

I don't quite understand the R packages' source code, so I'm only familiar with bartpy.

My plan

Even though I'm still at the "trying to do it" stage, I already have over 1000 lines of code added, so the change is not trivial. If the model is accepted, I will first submit a naive implementation that can be reviewed.

@tqchen @hcho3 @RAMitchell @CodingCat

hcho3 commented 5 years ago

@trivialfis This is interesting. I will need to give myself some time to review the paper.

I think you should either 1) write an RFC to explain your (software) design, in a form similar to dmlc/tvm#2889, or 2) write a draft pull request that can be easily reviewed.