Open ludovicgauvin opened 4 years ago
Hello! Thank you for a good question :) We definitely have plans on experimenting with more complex tree models, so we'll return here when we have news or some practical ideas about possible contributions from our users.
This would be useful; there are some papers on model predictive control that use LMTs (under a different name), since for optimization you need to take derivatives of the target with respect to the inputs.
Many thanks for your answers. I do not know exactly how CatBoost works, but I guess that in the leaves a simple mean is estimated. The user could define another function (at first, a simple regression on some features), and this function would be estimated in place of the mean. RMSE, MAE, or any other loss function should then be calculated based on the simple regression, of course.
Of course, unlike the mean, a minimum number of observations is needed in order to estimate a simple regression.
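To make the idea concrete, here is a minimal sketch of such a leaf model (the `fit_leaf` helper and `min_obs` threshold are hypothetical, not part of CatBoost): fit an ordinary least-squares regression in the leaf, and fall back to the constant mean when there are too few observations.

```python
import numpy as np

def fit_leaf(X_leaf, y_leaf, min_obs=10):
    """Fit a linear model in a leaf, falling back to the mean
    when there are too few observations (hypothetical helper)."""
    if len(y_leaf) < min_obs:
        # Not enough data for a stable regression: predict the constant
        # mean, exactly as a plain regression tree leaf would.
        return lambda X: np.full(len(X), y_leaf.mean())
    # Ordinary least squares with an intercept column.
    A = np.column_stack([np.ones(len(X_leaf)), X_leaf])
    coef, *_ = np.linalg.lstsq(A, y_leaf, rcond=None)
    return lambda X: np.column_stack([np.ones(len(X)), X]) @ coef

# On data that is linear in the features, the leaf model fits exactly,
# where a constant leaf could not.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0
predict = fit_leaf(X, y)
print(np.allclose(predict(X), y))  # → True
```

Any per-leaf loss (RMSE, MAE, ...) would then be computed against this regression's predictions instead of the leaf mean.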
At the moment it doesn't actually calculate the mean of the leaf values directly. Instead, the leaf value is approximated using the first and second derivatives of the loss at that leaf, i.e. $w_k = -g_k / (h_k + \lambda)$, where $w_k$ is the prediction that minimises the loss for leaf $k$ (i.e. the mean, if the loss is squared error), $g_k$ and $h_k$ are the averages of the gradients and Hessians at leaf $k$, and $\lambda$ is the L2 regularization.
I agree being able to replace this weight function with another would be nice. There is already a parameter to ensure that each leaf has a minimum number of observations.
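A quick numeric check of the formula above (a sketch of the Newton-step leaf value, not CatBoost's actual implementation): for squared-error loss the gradient is `pred - y` and the Hessian is 1, so with $\lambda = 0$ the leaf weight reduces to the leaf mean.

```python
import numpy as np

def leaf_weight(grad, hess, lam=0.0):
    """Newton-step leaf value: w = -sum(g) / (sum(h) + lambda)."""
    return -grad.sum() / (hess.sum() + lam)

# Squared-error loss L = (pred - y)^2 / 2 at a current prediction of 0:
# gradient g = pred - y = -y, Hessian h = 1 for every observation.
y = np.array([1.0, 2.0, 3.0, 6.0])
g = -y
h = np.ones_like(y)
print(leaf_weight(g, h))          # → 3.0, i.e. y.mean()
print(leaf_weight(g, h, lam=4.0)) # regularization shrinks the leaf toward 0
```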
@david-waterworth thanks again. Do you have a link to the papers you mentioned in your previous answer?
https://www.sciencedirect.com/science/article/abs/pii/S030626191630246X
Also, anything regarding MARS is probably relevant, as it fits hinge functions (ReLUs) in the leaf nodes, as far as I understand.
Any further thoughts or progress on this please?
Support for linear model trees is in the current CatBoost development plan.
Any insight on timeline please? Thanks again for looking at this feature!
The work is in progress and will likely take more than a month, possibly considerably longer.
Hi guys, hope you are having a great new year. Any update on this issue please? Thanks
Any update, folks?
This is paused for now because our experiments showed quality improvements only on synthetic datasets.
Hi,
I am a new user of CatBoost and I was wondering if it is possible to implement model trees in CatBoost (or in gradient boosted regression trees in general). My feeling is that by using linear regression in the leaves one would need fewer iterations (fewer trees) to achieve similar prediction performance.
Best,
P.S.: Example comparing LMT and GBT https://medium.com/convoy-tech/the-best-of-both-worlds-linear-model-trees-7c9ce139767d