dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0
26.3k stars, 8.73k forks

Fitting Linear Functions inside Tree leaves (Feature Request) #5725

Open Fish-Soup opened 4 years ago

Fish-Soup commented 4 years ago

I was wondering if it were possible to develop a new booster that, instead of taking the mean of the values inside a leaf, fitted a linear function. When the number of features is low, a piecewise linear model may perform better than a purely tree-based one, requiring fewer leaves and trees to model smoothly changing functions. In certain cases this could produce more accurate predictions. An additional benefit is that it would allow extrapolation, which may be important in certain use cases.
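To make the idea concrete, here is a minimal pure-Python sketch. It is not taken from XGBoost or from either implementation linked in this thread; the function names, the fixed split point, and the toy target are all assumptions chosen for illustration. It fits one split with either a constant (mean) leaf or a least-squares line in each leaf.

```python
from statistics import mean

def fit_constant_leaf(xs, ys):
    """Classic tree leaf: predict the mean target of the samples in the leaf."""
    leaf_mean = mean(ys)
    return lambda x: leaf_mean

def fit_linear_leaf(xs, ys):
    """Linear leaf: ordinary least squares y = a*x + b on the leaf's samples."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0.0:  # all x identical: fall back to a constant leaf
        return lambda x: y_bar
    a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b = y_bar - a * x_bar
    return lambda x: a * x + b

def fit_stump(xs, ys, split, fit_leaf):
    """One split at `split` (fixed here for simplicity), one model per leaf."""
    left = [(x, y) for x, y in zip(xs, ys) if x < split]
    right = [(x, y) for x, y in zip(xs, ys) if x >= split]
    left_model = fit_leaf(*zip(*left))
    right_model = fit_leaf(*zip(*right))
    return lambda x: left_model(x) if x < split else right_model(x)

def target(x):
    """Toy target (an assumption): piecewise linear with a kink at x = 5."""
    return abs(x - 5.0)

xs = [i * 0.5 for i in range(21)]  # 0.0, 0.5, ..., 10.0
ys = [target(x) for x in xs]

constant_stump = fit_stump(xs, ys, 5.0, fit_constant_leaf)
linear_stump = fit_stump(xs, ys, 5.0, fit_linear_leaf)

def mse(model):
    """Mean squared error on the training points."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

print("train MSE, constant leaves:", mse(constant_stump))
print("train MSE, linear leaves:  ", mse(linear_stump))
# x = 12 lies outside the training range, so this is extrapolation.
print("prediction at x=12:", constant_stump(12.0), "vs", linear_stump(12.0))
```

On this toy target the linear-leaf stump reproduces the function with a single split and keeps following the slope outside the training range, while the constant-leaf stump predicts the right leaf's mean everywhere beyond the split. A real booster would fit the leaves to gradients and Hessians with regularisation, which this sketch leaves out.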

I have found two implementations of this:

LinXGBoost: written purely in Python, it describes itself as an extension to XGBoost. However, the paper mentions that it hasn't been written with performance in mind.

https://github.com/ldv1/LinXGBoost https://arxiv.org/pdf/1710.03634.pdf

GBDT-PL: has a Python API; I think the back end is in C. It performs very well compared to other gradient boosted decision tree libraries (at least on the tests and hyperparameters they chose). The paper details many optimisations that make the code run quickly.

https://github.com/GBDT-PL/GBDT-PL https://arxiv.org/pdf/1802.05640.pdf

An additional optimization I had thought of is that you could specify only a subset of the features to use in the linear fit.
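As a rough illustration of that option, the sketch below fits a single leaf by least squares using only a chosen subset of feature indices plus an intercept. The names `fit_leaf` and `linear_features` are hypothetical, not an existing XGBoost parameter, and the plain normal-equations solve (no regularisation) is a simplifying assumption.

```python
def solve(a, b):
    """Solve the small dense system a @ w = b by Gaussian elimination
    with partial pivoting (assumes the system is non-singular)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - tail) / m[r][r]
    return w

def fit_leaf(rows, ys, linear_features):
    """Least-squares leaf model using only the columns in `linear_features`
    (hypothetical option) plus an intercept. An empty list gives the usual
    constant leaf, i.e. the mean of ys."""
    design = [[1.0] + [row[j] for j in linear_features] for row in rows]
    k = len(design[0])
    gram = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    rhs = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    w = solve(gram, rhs)

    def predict(row):
        return w[0] + sum(wi * row[j] for wi, j in zip(w[1:], linear_features))

    return predict

# Toy leaf (an assumption): feature 0 drives the target linearly (y = 3*x0 + 2),
# feature 1 is an arbitrary code that should not enter the linear fit.
rows = [[0.0, 7.0], [1.0, 3.0], [2.0, 9.0], [3.0, 1.0]]
ys = [2.0, 5.0, 8.0, 11.0]

linear_leaf = fit_leaf(rows, ys, linear_features=[0])
constant_leaf = fit_leaf(rows, ys, linear_features=[])

print(linear_leaf([10.0, 123.0]))    # uses feature 0 only
print(constant_leaf([10.0, 123.0]))  # mean of ys
```

Restricting the linear part this way keeps features such as category codes available for splitting without letting them enter the leaf regression.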

Many thanks. EDIT: I fixed the broken links.

xuyxu commented 4 years ago

A feature request for supporting multi-output regression has been raised before (#2087, #3439). It would bring substantial maintenance costs, as what is essentially needed is another base learner.

To simulate "linear functions in leaf nodes", XGBoost:Regression+MultiOutputRegressor in sklearn works reasonably well, despite many paper claims that this solution ignores correlations between different target variables.

trivialfis commented 4 years ago

It's something I have wanted for a long time. I also have a proof-of-concept implementation in #5460. I just need to allocate time to focus on it.

Murgio commented 4 years ago

@AaronX121 Do you have any citations for the papers ("... many paper claims ...")?

trivialfis commented 4 years ago

Actually correlation doesn't help much in my experiments. Your result is likely to be worse due to model capacity. That's one of the reasons that I'm not rushing the implementation. It's mostly for faster inference time.

xuyxu commented 4 years ago

@Murgio Hi, here is one work that directly tackles the multi-output regression problem for GBDT: https://arxiv.org/pdf/1909.04373.pdf, you may find it helpful :). Its code is available on GitHub. For sparse multi-output, here is another work: http://proceedings.mlr.press/v70/si17a.html.

Also, many variants of CART are equipped with linear models in internal or leaf nodes, such as piece-wise linear tree (https://arxiv.org/pdf/1802.05640.pdf), soft decision tree (https://arxiv.org/pdf/1711.09784.pdf), and many more. They can be easily combined with one gradient boosting wrapper for multi-output regression / multi-label classification.
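A minimal sketch of such a wrapper, for a single target and squared error only, might look like the following. Everything here is an assumption for illustration: `boost` and `fit_linear_stump` are hypothetical names, the base learner is a one-split tree with a least-squares line in each leaf, and none of it reflects how XGBoost or GBDT-PL is actually implemented.

```python
def fit_line(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, intercept)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0.0:
        return 0.0, y_bar
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return slope, y_bar - slope * x_bar

def sse(line, xs, ys):
    """Sum of squared errors of a (slope, intercept) line on the given points."""
    slope, intercept = line
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))

def fit_linear_stump(xs, ys, min_leaf=3):
    """Base learner: best single split on x, with a separate line in each leaf."""
    pairs = sorted(zip(xs, ys))
    sx = [p[0] for p in pairs]
    sy = [p[1] for p in pairs]
    best = None
    for cut in range(min_leaf, len(sx) - min_leaf + 1):
        if sx[cut] == sx[cut - 1]:
            continue  # cannot split between identical x values
        left = fit_line(sx[:cut], sy[:cut])
        right = fit_line(sx[cut:], sy[cut:])
        err = sse(left, sx[:cut], sy[:cut]) + sse(right, sx[cut:], sy[cut:])
        if best is None or err < best[0]:
            best = (err, (sx[cut - 1] + sx[cut]) / 2, left, right)
    _, threshold, left, right = best

    def predict(x):
        slope, intercept = left if x < threshold else right
        return slope * x + intercept

    return predict

def boost(xs, ys, fit_base, n_rounds=20, learning_rate=0.5):
    """Gradient boosting for squared error: each round fits the base learner
    to the current residuals, i.e. the negative gradient of 0.5*(y - f)^2."""
    base = sum(ys) / len(ys)
    learners = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        learner = fit_base(xs, residuals)
        learners.append(learner)
        preds = [p + learning_rate * learner(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(l(x) for l in learners)

xs = [i * 0.2 - 3.0 for i in range(31)]  # -3.0 ... 3.0
ys = [x * x for x in xs]                 # smooth toy target (an assumption)

model = boost(xs, ys, fit_linear_stump)
baseline = sum(ys) / len(ys)

mse_model = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
mse_baseline = sum((baseline - y) ** 2 for y in ys) / len(xs)
print("train MSE, boosted linear stumps:", mse_model)
print("train MSE, constant baseline:    ", mse_baseline)
```

Because `boost` only needs a function that fits residuals and returns a predictor, swapping in a different base learner does not change the loop.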

I also ran experiments on some benchmark datasets (http://mulan.sourceforge.net/datasets-mtr.html). It is hard to say that these methods are superior to XGBoost+MultiOutputRegressor.

Fish-Soup commented 4 years ago

@AaronX121 I may have explained my request poorly or misunderstood your response, but I don't see how what I asked for is equivalent to XGBoost plus MultiOutputRegressor. What I was requesting is described in the paper we both linked, https://arxiv.org/pdf/1802.05640.pdf, on piecewise linear trees, which uses piecewise linear regression trees (PL Trees) instead of piecewise constant regression trees. I will have a look at the soft decision tree paper.

P.S. I fixed the links in my initial request.

LudvicLaberge commented 3 years ago

I second @Fish-Soup's last question: how is XGBoost plus MultiOutputRegressor similar or equivalent to the initial feature request? Are there papers or examples contrasting both?

I went and read the MultiOutputRegressor docs, and it doesn't seem like the right solution. I have only one target, but I'd like the learners to be piecewise linear instead of step functions.

carloamodeo commented 7 months ago

Hello all, is this feature going to be implemented?

mlqmlq commented 1 day ago

Do we have any updates on this?