abess-team / abess

Fast Best-Subset Selection Library
https://abess.readthedocs.io/

Incorporation of `fit_intercept` to `LinearRegression`? #507

Closed MattWenham closed 1 year ago

MattWenham commented 1 year ago

Would it be possible to add a fit_intercept parameter to LinearRegression, similar to the one in the sklearn linear regression implementation?

Mamba413 commented 1 year ago

Thank you for your question. I’m not sure why sklearn.linear_model.LinearRegression includes a fit_intercept parameter. For instance, if someone sets fit_intercept=False but forgets to center the predictors and the response, the coefficient estimates will be inaccurate:

import numpy as np
from sklearn.linear_model import LinearRegression
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
# y = 1 * x_0 + 2 * x_1 + 3
y = np.dot(X, np.array([1, 2])) + 3
# Default: the intercept is fitted, so the true coefficients are recovered.
reg = LinearRegression().fit(X, y)
reg.coef_
# array([1., 2.])
# No intercept on uncentered data: the coefficients absorb the offset.
reg = LinearRegression(fit_intercept=False).fit(X, y)
reg.coef_
# array([2.09090909, 2.54545455])

In my personal opinion, the fit_intercept parameter is unnecessary. Perhaps I’m missing something. Could you provide some reasoning for including this parameter?
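The effect of centering can be checked with plain NumPy. The following is a minimal sketch, assuming ordinary least squares via np.linalg.lstsq as a stand-in for a no-intercept fit; the variable names are illustrative and it does not use the sklearn or abess estimators themselves. It fits the same data without an intercept, once on the raw data and once after centering both X and y.

```python
import numpy as np

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=float)
# y = 1 * x_0 + 2 * x_1 + 3
y = X @ np.array([1.0, 2.0]) + 3.0

# No intercept, raw data: the offset of 3 leaks into the coefficients.
coef_raw, *_ = np.linalg.lstsq(X, y, rcond=None)

# No intercept, but with X and y centered first: the slopes are recovered.
X_mean, y_mean = X.mean(axis=0), y.mean()
coef_centered, *_ = np.linalg.lstsq(X - X_mean, y - y_mean, rcond=None)

# The intercept can then be reconstructed from the means.
intercept = y_mean - X_mean @ coef_centered

print(coef_raw)       # approximately [2.0909, 2.5455]
print(coef_centered)  # approximately [1., 2.]
print(intercept)      # approximately 3.0
```

Without centering, a no-intercept fit answers a different question (a model forced through the origin), which is only a problem when the data actually contain an offset.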

MattWenham commented 1 year ago

An obvious example is the curve-fitting of basis functions to observations.

[image]

Here, we fit a set of observations S to two sets of basis functions: the μ are the foreground basis functions from which we need to choose, hence using abess. The B are the background basis functions, which are always_selected. There is no intercept term in this least-squares fit.
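As a rough illustration of this kind of model, here is a minimal sketch of a least-squares fit of the form S = μ a + B b with no intercept column. Everything in it is an assumption made for the example: the Gaussian foreground shapes, the polynomial background, the coefficient values, and the names mu, B, a_hat and b_hat. It uses plain NumPy and fits all columns, so it does not perform the subset selection over the foreground columns that abess would.

```python
import numpy as np

# Hypothetical sampling grid for the observations.
x = np.linspace(0.0, 1.0, 200)

def gaussian(center, width=0.05):
    """Illustrative foreground basis function: a Gaussian peak."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

# Foreground basis functions (candidates for selection) and
# background basis functions (always kept in the model).
mu = np.column_stack([gaussian(c) for c in (0.2, 0.5, 0.8)])
B = np.column_stack([x, x ** 2])

# Synthetic, noise-free observations with no constant offset.
true_a = np.array([1.5, 0.0, 0.7])
true_b = np.array([0.3, -0.2])
S = mu @ true_a + B @ true_b

# The design matrix holds only basis functions: no column of ones.
design = np.hstack([mu, B])
coef, *_ = np.linalg.lstsq(design, S, rcond=None)
a_hat, b_hat = coef[:3], coef[3:]
```

Adding an intercept here would introduce a constant term that the physical model does not contain, which is why an option to switch it off is useful.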

Mamba413 commented 1 year ago

Thanks. I notice that generalized linear models, like LogisticRegression (https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) and PoissonRegressor (https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.PoissonRegressor.html), also support fit_intercept. It seems this parameter is helpful in practice, so we will implement it soon.

MattWenham commented 1 year ago

Thank you for your responsiveness!

Mamba413 commented 1 year ago

Hi @MattWenham, we have just implemented this feature. In addition, fit_intercept is also supported by LogisticRegression, PoissonRegression and other generalized linear models. You can install the latest version of abess (see https://abess.readthedocs.io/en/latest/Installation.html#python-1) to use this new feature.

MattWenham commented 1 year ago

We can't currently get this to build from scratch due to the following error:

abess/python/src/pywrap.cpp:1:10: fatal error: pybind11/eigen.h: No such file or directory
    1 | #include <pybind11/eigen.h>

We have tried two different versions of pybind11, but get the same error.

Mamba413 commented 1 year ago

In our environment, pip list reports the following pybind11 versions:

pybind11                      2.9.1
pybind11-global               2.9.1

Maybe the issue appears because pybind11-global hasn't been installed? You can install this via: pip install "pybind11[global]".

Hope that helps!

MattWenham commented 1 year ago

We have downgraded to pybind11 2.9.1:

pybind11                  2.9.1                    pypi_0    pypi
pybind11-global           2.9.1                    pypi_0    pypi

...and continue to get the same error. We are attempting to install under WSL / Ubuntu 22.04.1 LTS.

oooo26 commented 1 year ago

Hi @MattWenham, we have tried pybind11 2.9.1 and 2.10.4 (the latest release), and abess builds successfully with both.

Could you check whether there are multiple Python environments on the machine? If so, pybind11 may have been installed into a different environment from the one used for the build. Alternatively, could you try a new conda environment (Python version >= 3.6)?

If the error persists, please feel free to paste more of the error log here so that we can help resolve it.
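One possible way to run the environment check is the small script below. It is only a sketch: the helper name pybind11_report is made up for this example, and it assumes the script is run with the same interpreter that performs the build. It reports which Python executable is active, whether that interpreter can import pybind11, and whether pybind11/eigen.h exists under the include directory returned by pybind11.get_include().

```python
import importlib.util
import os
import sys

def pybind11_report():
    """Describe where the current interpreter finds pybind11 (hypothetical helper)."""
    report = {
        "python": sys.executable,
        "pybind11_found": False,
        "version": None,
        "include_dir": None,
        "eigen_header_found": False,
    }
    # pybind11 is not importable from this environment at all.
    if importlib.util.find_spec("pybind11") is None:
        return report

    import pybind11

    include_dir = pybind11.get_include()
    report["pybind11_found"] = True
    report["version"] = pybind11.__version__
    report["include_dir"] = include_dir
    # The header named in the compiler error.
    header = os.path.join(include_dir, "pybind11", "eigen.h")
    report["eigen_header_found"] = os.path.isfile(header)
    return report

report = pybind11_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

If the reported interpreter differs from the one used by the build, or pybind11 is not found, the package was probably installed into a different environment.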