Open lorentzenchr opened 1 year ago
Thanks a lot! For future reference, these are the failing tests:
I can get the 12 failing L-BFGS related tests to pass by not standardizing the design matrix here.
64 failing tests to go.
All the failing tests seem to be for unpenalized regression with a singular design matrix (either the wide problem with p=12, n=4, or the stacked problem where we duplicate all columns). Is that correct? Maybe this is a dumb question, but what is the expected result in this case? I'm not surprised to see the tests failing in this case for glum, but in case we want to support this, the tests are great!
It is often said that singular design matrices don't allow for a solution, but this is wrong: there are just infinitely many solutions. For OLS, there is a particularly nice one called the minimum-norm solution, i.e. the coefficients having minimal L2 norm among all solutions. It may be that this is of little practical value, but in light of the discovered interpolation regime, it is at least interesting.
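To make the minimum-norm idea concrete, here is a minimal NumPy sketch for the wide case (n=4, p=12). It is only an illustration under simplifying assumptions: plain OLS without an intercept, random data, and variable names chosen for this example. It is not how the scikit-learn tests or glum compute anything.

```python
import numpy as np

# Wide problem: more features (p=12) than samples (n=4), so X is singular
# in the sense that X.T @ X is not invertible.
rng = np.random.default_rng(0)
n, p = 4, 12
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# For an underdetermined system, np.linalg.lstsq returns the solution
# with minimal L2 norm among all least-squares solutions.
coef_min_norm, *_ = np.linalg.lstsq(X, y, rcond=None)

# The last rows of Vt from the full SVD span the null space of X
# (here X has rank 4, so rows 4..11 are null-space directions).
_, _, Vt = np.linalg.svd(X)
null_vec = Vt[-1]

# Adding any null-space vector gives another exact solution ...
coef_other = coef_min_norm + null_vec

# ... with the same fit but a strictly larger norm.
norm_min = np.linalg.norm(coef_min_norm)
norm_other = np.linalg.norm(coef_other)
print(np.allclose(X @ coef_min_norm, y), np.allclose(X @ coef_other, y))
print(norm_min < norm_other)
```

Both coefficient vectors interpolate `y` exactly, which is why a test for the singular case has to pin down which of the infinitely many solutions is expected.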
I have at least one PR for the line search in mind that could help with a few of those test failures.
Scikit-learn has some very strict tests for GLMs in https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/linear_model/_glm/tests/test_glm.py. I modified the file to test `glum.GeneralizedLinearRegressor` instead, see https://gist.github.com/lorentzenchr/2e319bcfd4aadfbea64c6330e5b33521. Running `pytest test_glm.py` results in 76 failed, 212 passed, 104 warnings. It might be interesting to include those tests in glum.