
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
https://catboost.ai
Apache License 2.0

(Linear) Model Tree in catboost #1340

Open ludovicgauvin opened 4 years ago

ludovicgauvin commented 4 years ago

Hi,

I am a new user of CatBoost and I was wondering if it is possible to implement model trees in CatBoost (or in gradient boosted regression trees in general). My feeling is that by using linear regression in the nodes, one would need fewer iterations (fewer trees) to achieve similar prediction performance.

Best,

P.S.: Example comparing LMT and GBT https://medium.com/convoy-tech/the-best-of-both-worlds-linear-model-trees-7c9ce139767d
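
For illustration, here is a minimal sketch of the idea outside CatBoost, using scikit-learn: a shallow decision tree partitions the feature space, and an ordinary linear regression is fitted in each leaf. The `LinearModelTree` class is purely hypothetical, not a CatBoost (or scikit-learn) API.

```python
# Hypothetical sketch of a (non-boosted) linear model tree:
# a shallow tree does the partitioning, a linear regression
# replaces the constant prediction in each leaf.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

class LinearModelTree:
    def __init__(self, max_depth=3, min_samples_leaf=30):
        # min_samples_leaf keeps enough observations per leaf
        # to fit a stable linear regression
        self.tree = DecisionTreeRegressor(
            max_depth=max_depth, min_samples_leaf=min_samples_leaf)
        self.leaf_models = {}

    def fit(self, X, y):
        self.tree.fit(X, y)
        leaves = self.tree.apply(X)          # leaf index per sample
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            self.leaf_models[leaf] = LinearRegression().fit(X[mask], y[mask])
        return self

    def predict(self, X):
        leaves = self.tree.apply(X)
        preds = np.empty(len(X))
        for leaf, model in self.leaf_models.items():
            mask = leaves == leaf
            if mask.any():
                preds[mask] = model.predict(X[mask])
        return preds
```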

kizill commented 4 years ago

Hello! Thank you for a good question :) We definitely plan to experiment with more complex tree models, and we'll report back here when we have news or concrete ideas about possible contributions from our users.

david-waterworth commented 4 years ago

This would be useful. There are some papers on model predictive control that use LMTs (under a different name), since for optimization you need to take derivatives of the target with respect to the inputs.

ludovicgauvin commented 4 years ago

Many thanks for your answers. I do not know exactly how CatBoost works, but I guess that in the leaves a simple mean is estimated. The user could define another function (at first, a simple regression on some features) and this function would be estimated in place of the mean. RMSE, MAE, or any other loss function would then be calculated based on that simple regression, of course.

Of course, unlike the mean, a simple regression requires a minimum number of observations to estimate.
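
A tiny numeric sketch (purely illustrative, using scikit-learn rather than CatBoost) of why a per-leaf regression can beat a per-leaf mean when the target is locally linear in a feature:

```python
# Within one leaf, the constant that minimises squared error is the
# mean; a per-leaf regression can do strictly better when the target
# varies linearly with a feature inside the leaf.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=(200, 1))         # one feature inside a leaf
y = 3.0 * x[:, 0] + rng.normal(0, 0.1, 200)  # target is linear in x

mean_pred = np.full_like(y, y.mean())               # constant-leaf prediction
lin_pred = LinearRegression().fit(x, y).predict(x)  # linear-leaf prediction

rmse = lambda p: np.sqrt(np.mean((y - p) ** 2))
print(f"leaf mean   RMSE: {rmse(mean_pred):.3f}")  # ~0.87 (std of y)
print(f"leaf linreg RMSE: {rmse(lin_pred):.3f}")   # ~0.10 (noise level)
```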

david-waterworth commented 4 years ago

At the moment it doesn't actually directly calculate the mean of the leaf values. Instead, it's approximated using the first and second derivatives of the loss at that leaf, i.e. $w_k = -g_k / (h_k + \lambda)$, where $w_k$ is the prediction which minimises the loss for leaf $k$ (i.e. the mean if the loss is squared error), $g_k$ and $h_k$ are the averages of the gradients and hessians at leaf $k$, and $\lambda$ is the L2 regularisation term.

I agree being able to replace this weight function with another would be nice. There is already a parameter to ensure that each leaf has a minimum number of observations.
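
For concreteness, a small sketch (not CatBoost code; the `leaf_weight` helper is hypothetical) of this weight computation for squared-error loss, where it reduces to the regularised mean of the residuals in the leaf:

```python
# Second-order leaf weight w = -g / (h + lam), with g and h
# averaged over the leaf, specialised to squared-error loss.
import numpy as np

def leaf_weight(y, pred, lam=1.0):
    g = pred - y            # gradient of 0.5 * (y - pred)^2
    h = np.ones_like(y)     # hessian of squared error is 1
    return -g.mean() / (h.mean() + lam)

y = np.array([2.0, 3.0, 4.0])
pred = np.zeros_like(y)     # current model output at this leaf
print(leaf_weight(y, pred, lam=0.0))  # 3.0 -> exactly the leaf mean
print(leaf_weight(y, pred, lam=1.0))  # 1.5 -> shrunk towards zero
```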

ludovicgauvin commented 4 years ago

@david-waterworth thanks again. Do you have a link to the papers you mentioned in your previous answer?

david-waterworth commented 4 years ago

https://www.researchgate.net/publication/319667922_Data_Predictive_Control_using_Regression_Trees_and_Ensemble_Learning

https://www.sciencedirect.com/science/article/abs/pii/S030626191630246X

Also, anything regarding MARS is probably relevant, as it fits hinge functions (ReLUs) in the leaf nodes, as far as I understand.
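
For reference, a MARS-style hinge basis function is just `max(0, x - c)` together with its mirror `max(0, c - x)`; the knot `c` below is chosen arbitrarily for illustration:

```python
# Hinge (ReLU-style) basis pair of the kind MARS fits at a knot c.
import numpy as np

def hinge_pair(x, c):
    return np.maximum(0.0, x - c), np.maximum(0.0, c - x)

x = np.linspace(-1, 1, 5)
pos, neg = hinge_pair(x, 0.0)
print(pos)  # [0.  0.  0.  0.5 1. ]
print(neg)  # [1.  0.5 0.  0.  0. ]
```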

rivershah commented 4 years ago

Any further thoughts or progress on this please?

Evgueni-Petrov-aka-espetrov commented 4 years ago

Support for linear model trees is in the current CatBoost development plan.

rivershah commented 4 years ago

Any insight on timeline please? Thanks again for looking at this feature!

Evgueni-Petrov-aka-espetrov commented 4 years ago

The work is in progress and will likely take more than a month, possibly significantly longer.

rivershah commented 3 years ago

Hi guys, hope you are having a great new year. Any update on this issue please? Thanks

AleksandrovichK commented 3 years ago

Any update, folks?

Evgueni-Petrov-aka-espetrov commented 2 years ago

This is paused for now because our experiments showed quality improvements only on synthetic datasets.