Closed jckkvs closed 2 years ago
Hi,
Generally (for both linear and standard decision trees), setting max_depth to a specific value (let's assume 10) doesn't force the model to grow to depth 10! If there is no benefit in splitting, the algorithm stops before reaching that depth.
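To illustrate the same behavior with a standard decision tree (a minimal sketch using scikit-learn, not lineartree): on a constant target no split can reduce the loss, so the tree stops at the root even though max_depth allows ten levels.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.arange(100).reshape(-1, 1)
y = np.zeros(100)  # constant target: no split can reduce the loss

tree = DecisionTreeRegressor(max_depth=10).fit(X, y)
print(tree.get_depth())  # 0 -- the tree stops immediately, well below max_depth
```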
Making a practical example... if we have to predict a perfect line with a LinearTreeRegressor and max_depth=20:
>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> from lineartree import LinearTreeRegressor
>>> X = np.arange(100).reshape(-1,1)
>>> y = np.arange(100)
>>> lt = LinearTreeRegressor(LinearRegression(), max_depth=20).fit(X,y)
>>> lt.summary()
{0: {'loss': 0.0, 'models': LinearRegression(), 'samples': 100}}
Only one LinearRegression is fitted.
If you support the project don't forget to leave a star ;-)
Thanks. I misunderstood the max_depth specification because I had verified the max_depth of LinearTreeRegressor on synthetic data with inflection points.
Hi.
I tried the code you suggested, but the result was different:
import numpy as np
from sklearn.linear_model import LinearRegression
from lineartree import LinearTreeRegressor
X = np.arange(100).reshape(-1,1)
y = np.arange(100)
lt = LinearTreeRegressor(LinearRegression(), max_depth=20).fit(X,y)
lt.summary()
{0: {'col': 0, 'th': 37.5, 'loss': 0.0, 'samples': 100, 'children': (1, 2), 'models': (LinearRegression(), LinearRegression())},
1: {'loss': 0.0, 'samples': 38, 'models': LinearRegression()},
2: {'loss': 0.0, 'samples': 62, 'models': LinearRegression()}}
Python 3.7.11, sklearn 0.24.2, numpy 1.20.3, lineartree 0.3.3
I've tried a few other examples. If the number of samples is 11 or less, only one LinearRegression is fitted.
Here is the running notebook for reproducibility.
EDIT: This may also be due to the numeric precision of your environment, where a loss of (for example) 5.429976129669105e-29 is not equal to 0.0, so the tree continues to grow. This is automatically limited (by setting a fixed rounding precision) in lineartree>=0.3.4.
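To see the kind of residue involved, here is a small sketch (plain scikit-learn, not lineartree internals): an ordinary least-squares fit on a perfectly linear target can leave a mean squared error that is tiny but not necessarily exactly 0.0, depending on the BLAS/platform, while rounding to a fixed precision collapses it to zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.arange(100).reshape(-1, 1)
y = np.arange(100, dtype=float)

lr = LinearRegression().fit(X, y)
mse = np.mean((lr.predict(X) - y) ** 2)

print(mse)                       # tiny, but not guaranteed to be exactly 0.0
print(np.round(mse, 5) == 0.0)   # True: fixed rounding treats it as zero
```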
Thanks for the good library.
When using LinearTreeRegressor, max_depth is often optimized by cross-validation.
This library allows max_depth in the range 1-20. However, depending on the dataset, simple linear regression may be the best fit. Even for such a dataset, max_depth is forced to be at least 1, so simple linear regression cannot be expressed properly with LinearTreeRegressor.
My suggestion is to change the program so that, when max_depth=0, the base_estimator alone performs the regression. With this change, LinearTreeRegressor could flexibly handle both segmented regression and simple regression just by changing a hyperparameter.
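In the meantime, the same behavior can be sketched outside the library. The helper below (fit_linear_or_tree is hypothetical, not part of lineartree) treats max_depth=0 as "fit the base estimator directly" and otherwise delegates to LinearTreeRegressor:

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression

def fit_linear_or_tree(base_estimator, X, y, max_depth):
    """Hypothetical wrapper: max_depth=0 means plain regression, no splits."""
    if max_depth == 0:
        return clone(base_estimator).fit(X, y)
    from lineartree import LinearTreeRegressor  # only needed for depth >= 1
    return LinearTreeRegressor(clone(base_estimator), max_depth=max_depth).fit(X, y)

X = np.arange(100).reshape(-1, 1)
y = 3.0 * np.arange(100) + 2.0

# Depth 0: just one LinearRegression over all the data.
model = fit_linear_or_tree(LinearRegression(), X, y, max_depth=0)
```

This keeps the hyperparameter search space contiguous: a cross-validation grid over max_depth in 0-20 can then pick plain linear regression whenever it wins.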