civisanalytics / python-glmnet

A python port of the glmnet package for fitting generalized linear models via penalized maximum likelihood.
Other
262 stars 59 forks source link

Exposing the lower and upper limits V.2 #41

Closed eric-valente closed 6 years ago

eric-valente commented 6 years ago

Trying to get this PR merged. I forked and fixed the remaining issues.

https://github.com/civisanalytics/python-glmnet/pull/19

"either input an array or a single value of lower and upper limits check that lower_limits <= 0 and upper_limits >= 0"

eric-valente commented 6 years ago

Think I covered all your points except this: There should also be a ValueError raised if there's an array input with the wrong number of elements.

Can you give me a little guidance? Assuming if the array > number of features throw ValueError?

stephen-hoover commented 6 years ago

The limits must be either a scalar, or else a list or array with length equal to the number of features. In the if not np.isscalar(self.lower_limits) blocks, you can put a check like if len(...) != X.shape[1]: raise ValueError. You should move the X, y = check_X_y(X, y, accept_sparse='csr', ensure_min_samples=2) line to the top of the function so that you can be sure that X is an array at that point.

eric-valente commented 6 years ago

Okay - made the request changed, will check if it passes ci.

eric-valente commented 6 years ago

AssertionError on the logistic tests, not linear. Thoughts?

Traceback (most recent call last): File "/src/python-glmnet/glmnet/tests/test_logistic.py", line 101, in test_coeflimits assert(np.all(m.coef >= 0)) AssertionError

stephen-hoover commented 6 years ago

I'm bothered by the fact that the linear tests don't fail. In each test, you've set lower limits of -1 and upper limits of 0. The tests are testing that the coefficients are between 0 and 1. The tests should be changed to check that coefficients are between -1 and 0. If the linear tests are passing, then it must be that all coefficients are 0.

Try setting alpha = 0 in both the ElasticNet and LogitNet constructors in the new tests. It defaults to 1, which is lasso. That will push coefficients to equal 0. If you make that change in the linear tests before switching the limits, then that test should start failing.

eric-valente commented 6 years ago

No luck w/ alpha=0

====================================================================== FAIL: test_coef_limits (test_linear.TestElasticNet)

Traceback (most recent call last): File "/src/python-glmnet/glmnet/tests/test_linear.py", line 111, in test_coeflimits assert(np.all(m.coef >= 0)) AssertionError

====================================================================== FAIL: test_coef_limits (test_logistic.TestLogitNet)

Traceback (most recent call last): File "/src/python-glmnet/glmnet/tests/test_logistic.py", line 101, in test_coeflimits assert(np.all(m.coef >= 0)) AssertionError

eric-valente commented 6 years ago

I think the code is okay but we are setting lower_limits to -1, it makes sense that this test fails. We set lowerlimits to -1 so it will trigger this: assert(np.all(m.coef >= 0))

Would this make more sense?

       assert(np.all(m.coef_ >= -1))
        assert(np.all(m.coef_ <= 0))     
stephen-hoover commented 6 years ago

Yes, I'm happy to see that both tests are now failing when you set alpha=0. That's good. If you switch the asserts like you suggest, then the tests should pass.

eric-valente commented 6 years ago

Changed the bounds of the test, but does the same thing. It passed. Anything else?

        x, y = self.inputs[0]
        lower_limits = 0
        upper_limits = np.repeat(1, x.shape[1])
        m = ElasticNet(lower_limits=lower_limits, upper_limits=upper_limits, random_state=5934, alpha=0)
        m = m.fit(x, y)
        assert(np.all(m.coef_ >= 0))
        assert(np.all(m.coef_ <= 1))     
stephen-hoover commented 6 years ago

I'd previously requested that you change the limits to -1 to 0 because the original tests, with limits from 0 to 1, failed to expose a bug in the bounds checking. I'd prefer that these tests use a lower limit of -1 and an upper limit of 0.

eric-valente commented 6 years ago

Okay put it back and changed this:

        assert(np.all(m.coef_ >= -1))
        assert(np.all(m.coef_ <= 0))    
eric-valente commented 6 years ago

Okay looks good! Let me know.