Yimsun97 closed this issue 6 months ago.
Hi, did you try using LinearForestRegressor inside a pipeline with a StandardScaler as the first step, like this:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from lineartree import LinearForestRegressor
model = make_pipeline(StandardScaler(), LinearForestRegressor(...))
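To make the pipeline suggestion concrete without depending on the package's internals, here is a minimal sketch that puts a StandardScaler in front of a hypothetical stand-in estimator, ResidualLinearForest. It assumes that a linear forest works by first fitting a linear model and then fitting a random forest on that model's residuals; the class name and its parameters are illustrative only, not the linear-tree API. Because the scaler runs first, the linear coefficients come out in standardized units, so they have to be divided by the scaler's scale_ before being compared with a LinearRegression fitted on the raw features.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class ResidualLinearForest(BaseEstimator, RegressorMixin):
    """Hypothetical stand-in for LinearForestRegressor: a linear model fitted first,
    then a random forest fitted on the linear model's residuals."""

    def __init__(self, base_estimator=None, n_estimators=100, random_state=None):
        self.base_estimator = base_estimator
        self.n_estimators = n_estimators
        self.random_state = random_state

    def fit(self, X, y):
        base = LinearRegression() if self.base_estimator is None else self.base_estimator
        self.base_estimator_ = clone(base).fit(X, y)
        residuals = y - self.base_estimator_.predict(X)
        self.forest_estimator_ = RandomForestRegressor(
            n_estimators=self.n_estimators, random_state=self.random_state
        ).fit(X, residuals)
        return self

    def predict(self, X):
        # Linear trend plus the forest's correction of what the linear part missed.
        return self.base_estimator_.predict(X) + self.forest_estimator_.predict(X)


rng = np.random.default_rng(0)
# Hypothetical data: two features on very different scales.
X = np.column_stack([rng.uniform(0, 1, 500), rng.uniform(0, 1e6, 500)])
y = 3.0 * X[:, 0] + 2e-6 * X[:, 1] + rng.normal(0, 0.1, 500)

model = make_pipeline(StandardScaler(), ResidualLinearForest(random_state=0)).fit(X, y)

# The linear coefficients are in standardized units; dividing by the scaler's
# scale_ expresses them per unit of the original features.
scaler, linear_forest = model[0], model[-1]
coef_original_units = linear_forest.base_estimator_.coef_ / scaler.scale_
print(coef_original_units)  # close to [3.0, 2e-6]
```

Scaling first keeps features of very different magnitudes on a comparable footing for the linear part, which is the point of putting the StandardScaler at the top of the pipeline.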
Thank you for your reply! I have tried the pipeline and it worked!
I saw in the last example on the repository homepage that a linear forest can be used to address the extrapolation issues of a random forest. After switching to the pipeline, I found that the linear forest's R-squared on the test set (~0.65) is lower than the random forest's (0.70). Is this common in regression problems? How can I improve the linear forest's fit, or does this mean there is a trade-off between goodness of fit and extrapolation ability?
Thank you!
Finding the trade-off between predictive accuracy and extrapolation ability is one of the hardest tasks in the ML ecosystem. Some models are good at maximizing accuracy, others at extracting explanatory insights. There is no silver bullet for this kind of problem; you should make the appropriate choices according to your data and needs. All the best
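One way to see both sides of this trade-off is to score test points inside the training range separately from points beyond it, since a single test-set R-squared mixes the two. The sketch below uses hypothetical synthetic data and approximates the linear-forest idea with a LinearRegression followed by a RandomForestRegressor on its residuals (an assumption, not the package's implementation). A plain random forest cannot predict beyond the range of targets it saw during training, so its out-of-range predictions flatten out; whether the linear part helps or hurts in-range depends on the data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)


def make_data(low, high, n):
    # Hypothetical target: a linear trend plus a nonlinear wiggle and noise.
    X = rng.uniform(low, high, size=(n, 1))
    y = 2.0 * X[:, 0] + np.sin(X[:, 0]) + rng.normal(0, 0.3, n)
    return X, y


X_train, y_train = make_data(0, 10, 400)
X_in, y_in = make_data(0, 10, 200)     # test points inside the training range
X_out, y_out = make_data(10, 15, 200)  # test points beyond it (extrapolation)

rf = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# Linear model first, then a forest on its residuals (the linear-forest idea).
lin = LinearRegression().fit(X_train, y_train)
resid_forest = RandomForestRegressor(random_state=0).fit(
    X_train, y_train - lin.predict(X_train)
)


def linear_forest_predict(X):
    return lin.predict(X) + resid_forest.predict(X)


scores = {}
for name, predict in [("random forest", rf.predict),
                      ("linear + residual forest", linear_forest_predict)]:
    scores[name] = (r2_score(y_in, predict(X_in)), r2_score(y_out, predict(X_out)))
    print(f"{name}: in-range R2 = {scores[name][0]:.3f}, "
          f"out-of-range R2 = {scores[name][1]:.3f}")
```

Reporting the two scores side by side makes it clearer whether a lower overall R-squared is the price paid for better extrapolation on a given dataset.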
Hi there!
I am very interested in the linear-tree package and have found it inspiring for my research. However, when I used LinearForestRegressor in my study, I found that its base estimator gave biased coefficients (with absolute values that were too small), so the prediction was essentially fitted by the forest estimator. As a result, the fitted linear forest behaves much like a plain random forest regressor. I suspect this may be due to round-off error in the source code's call to self._validate_data, where the dtype "float32" is used. I generated a synthetic dataset to compare scikit-learn's LinearRegression model with LinearForestRegressor.
By the way, how should we deal with data whose features span multiple orders of magnitude? Will the base_estimator parameter support a scikit-learn Pipeline, so that preprocessing such as StandardScaler can be applied, in a future release?
Thank you for your excellent work!
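The float32 hypothesis can be checked in isolation. The sketch below assumes only that inputs get cast to float32 at some point (as the post suggests; not verified against the library source) and uses a hypothetical feature made of a large offset of 1e8 plus a small informative signal. Near 1e8, consecutive float32 values are 8 apart, so the raw cast collapses the signal onto two values and the least-squares slope shrinks toward zero; standardizing in float64 before the cast preserves the signal. This is one plausible mechanism for shrunken coefficients, and it also illustrates why scaling helps when features span several orders of magnitude. The helper ols_slope is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature: a large offset (1e8) plus a small informative signal in [0, 10).
signal = rng.uniform(0.0, 10.0, size=1000)
x64 = 1e8 + signal
y = 3.0 * signal  # noise-free target with a true slope of 3 w.r.t. the feature


def ols_slope(x, target):
    """Least-squares slope of target on x, computed in float64."""
    x = np.asarray(x, dtype=np.float64)
    return np.cov(x, target)[0, 1] / np.var(x, ddof=1)


# 1) Cast the raw feature to float32: near 1e8 the float32 spacing is 8,
#    so every sample is rounded onto one of just two values.
x32 = x64.astype(np.float32)
print("float32 spacing near 1e8:", np.spacing(np.float32(1e8)))
print("distinct float32 values:", np.unique(x32).size)
slope_raw = ols_slope(x32, y)

# 2) Standardize in float64 first, then cast: the signal survives the cast.
mu, sd = x64.mean(), x64.std()
z32 = ((x64 - mu) / sd).astype(np.float32)
slope_scaled = ols_slope(z32, y) / sd  # convert back to the original feature's units

print("slope after raw float32 cast:", round(slope_raw, 3))
print("slope after scaling, then float32:", round(slope_scaled, 3))
```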