Closed titipata closed 4 years ago
how are you testing? can you add the `check_estimator` line somewhere in the diff?
@jasmainak I did add a test script where you can check by running `py.test --cov=pyglmnet tests/`. Note that this will fail a bunch of original tests since I commented out `fit_predict` from `GLM`.
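For context, a test like `test_glm_estimator` typically reduces to a single `check_estimator` call. A minimal sketch is below; it is run here against sklearn's own `LinearRegression` so that it is self-contained, but in the PR the argument would be a `GLM()` instance instead:

```python
# Sketch: how a check_estimator-based test is usually structured.
# check_estimator runs the full battery of sklearn API-compliance checks
# (fit/predict contracts, input validation, get_params/set_params
# round-trips, ...) and raises on the first failing check.
from sklearn.linear_model import LinearRegression
from sklearn.utils.estimator_checks import check_estimator

# In the PR this would be check_estimator(GLM()) instead.
check_estimator(LinearRegression())
```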
Cool, FYI, I use this command:

```bash
$ pytest ./tests/test_pyglmnet.py::test_glm_estimator --pdb
```

that way you drop right into the console when there is an error :)
For whatever reason `X` is an array of complex numbers; that's why it is failing.
@jasmainak Thanks for the handy test script. Oh yeah, I see that: `z` is an `ndarray` of size (10,) with a complex number in there...
I don't think it's easy to make all the checks happy without adding an sklearn dependency. Basically you have to follow what is in: https://github.com/scikit-learn-contrib/project-template/blob/master/skltemplate/_template.py. I'd like to believe it can be done though :)
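A minimal sketch of the estimator skeleton that template prescribes (the class and parameter names here are illustrative, not pyglmnet's actual code):

```python
# Sketch of the sklearn project-template estimator pattern.
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted

class TemplateEstimator(BaseEstimator):
    def __init__(self, demo_param="demo"):
        # __init__ only stores hyperparameters, with no validation,
        # so that get_params/set_params round-trip cleanly.
        self.demo_param = demo_param

    def fit(self, X, y):
        # Validate inputs; fitted attributes get a trailing underscore.
        X, y = check_X_y(X, y)
        self.is_fitted_ = True
        return self  # fit must return self

    def predict(self, X):
        check_is_fitted(self)  # raise NotFittedError if called before fit
        X = check_array(X)
        return np.zeros(X.shape[0])
```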
@jasmainak alright, I tried adding the sklearn dependency so that it follows their template style. Not sure if we also want `BaseEstimator` to be from sklearn as well?
Let's try to first make it work. Use sklearn or whatever is necessary. Then we can make it nice and see if the dependency can be dropped easily :)
@jasmainak Now the error is in `predict_proba`; I'm not sure how to fix it.
@jasmainak alright the current commits should work with `check_estimator` now! The part that I cannot keep track of is `_allow_refit`. Do the scikit-learn utilities keep track of this fitting/re-fitting?
You can remove the refit logic for now. Do all tests pass if you do that? I’ll see what to do with the refit when I get a chance in a couple of hours
@jasmainak There are 2 failed tests (`test_random_state_consistency` and `test_api_input`), 56 passed, and 32 warnings for the current commits. The error is `Failed: DID NOT RAISE <class 'ValueError'>`.
The tests are now not raising errors because of the additional `check_X_y`: it returns an `ndarray` even if you pass a list in. So now the function `test_api_input` in `test_pyglmnet.py` below won't raise an error:

```python
with pytest.raises(ValueError):
    glm.fit(X, list(y))
```
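The behavior described here is easy to reproduce in isolation; a small sketch showing that `check_X_y` hands back ndarrays even when given a plain Python list:

```python
# check_X_y silently converts array-likes (including plain lists) to
# ndarrays, which is why passing list(y) no longer raises ValueError.
import numpy as np
from sklearn.utils.validation import check_X_y

X = np.random.randn(10, 2)
y = list(np.zeros(10))                # a Python list, not an ndarray
X_checked, y_checked = check_X_y(X, y)
print(type(y_checked))                # <class 'numpy.ndarray'>
```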
And the failure in `test_random_state_consistency` comes from commenting out `self._allow_refit = False` after the `fit` method. Specifically, it comes from:

```python
match = "This glm object has already been fit"
with pytest.raises(ValueError, match=match):
    ypred_c = glm_b.fit_predict(Xtrain, ytrain)
```
Alright @jasmainak, I think this push should cover using the sklearn dependency. If you can help a bit with the `_allow_refit` logic, that should be it! I will leave it to you for now.
Merging #364 into master will decrease coverage by 0.88%. The diff coverage is 90.9%.
```
@@            Coverage Diff             @@
##           master     #364      +/-   ##
==========================================
- Coverage   75.66%   74.78%   -0.89%
==========================================
  Files           4        4
  Lines         678      686       +8
  Branches      149      148       -1
==========================================
  Hits          513      513
- Misses        128      133       +5
- Partials       37       40       +3
```

| Impacted Files | Coverage Δ |
|---|---|
| `pyglmnet/pyglmnet.py` | 79.71% <90.9%> (-1.54%) :arrow_down: |
| `pyglmnet/base.py` | 48.38% <0%> (+3.22%) :arrow_up: |

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 1a4b9c7...44e0658. Read the comment docs.
sorry I am slow with responding. It has been crazy busy here.
This is impressive progress! Can we try to do away with the sklearn dependency now that all the tests pass? Basically copy the file that is needed into an externals folder and try to make it work? I think if we copy this file it might just work. And we already have `BaseEstimator` in `base.py`.
I'm fine with ditching the allow-refit logic if it helps make our software more sklearn compatible.
@jasmainak I'll follow up by tomorrow!
I think using `BaseEstimator` from scikit-learn gives you some functionality, e.g. `get_params` and `set_params`, which is needed for use in a `Pipeline` and which the current `base.py` cannot do.
I'm not quite sure about copying the file that you mentioned. Can you explain a bit more?
Basically instead of doing:

```python
from sklearn.base import BaseEstimator
from sklearn.utils import check_random_state
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted
```

you would do:

```python
from .externals.sklearn.base import BaseEstimator
from .externals.sklearn.utils import check_random_state
from .externals.sklearn.utils.validation import check_X_y, check_array, check_is_fitted
```
hopefully it doesn't require you to copy too many files. As long as it's less than 5 files, I think we're fine.
Also you can add the `get_params` and `set_params` methods from sklearn to our current `base.py`. I think I had copied what we have from MNE, which had a whittled-down version of what sklearn provides.
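A whittled-down `get_params`/`set_params`, in the spirit of what MNE and sklearn provide, might look like the sketch below. This is an illustration, not pyglmnet's actual `base.py`; sklearn's real `BaseEstimator` additionally handles nested `step__param` names and deep traversal of sub-estimators.

```python
# Sketch: minimal get_params/set_params built on __init__ introspection,
# the same convention sklearn's BaseEstimator uses.
import inspect

class BaseEstimator(object):
    @classmethod
    def _get_param_names(cls):
        # Hyperparameters are exactly the explicit __init__ arguments.
        sig = inspect.signature(cls.__init__)
        return sorted(
            p.name for p in sig.parameters.values()
            if p.name != "self" and p.kind != p.VAR_KEYWORD
        )

    def get_params(self, deep=True):
        return {name: getattr(self, name) for name in self._get_param_names()}

    def set_params(self, **params):
        valid = self._get_param_names()
        for key, value in params.items():
            if key not in valid:
                raise ValueError(
                    "Invalid parameter %s for estimator %s" % (key, self))
            setattr(self, key, value)
        return self
```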
@jasmainak, I did add the sklearn externals and also switched the base back to the one from `base.py`. Not sure if this is what you're expecting. Can you take over from here?
This is great!! Awesome work @titipata :-) Happy to take over from here. Can you update `whats_new.rst` so we can acknowledge this in the next release?
@titipata you are on fire. I don't think I need to do a pass really. If you just address my last couple of comments (remove the commented test), move the line in `whats_new.rst`, and update the title from WIP to MRG, I am happy to merge this PR once both CIs become green. Thank you so much for the efforts!
Awesome!! That sounds great to me! We might have to write some more test scripts later on :)
Yes I am looking forward to more contributions from you in the future!
@jasmainak, so I added the workaround for the `predict_proba` error when using `check_estimator`. This is similar to the logic in sklearn's `LogisticRegression`.
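As an illustration of that kind of workaround (a sketch, not pyglmnet's actual code): for a binomial model, scipy's `expit` maps the linear predictor to `P(y = 1)`, and the two class probabilities are stacked into the `(n_samples, 2)` array that `check_estimator` expects:

```python
# Sketch of a LogisticRegression-style predict_proba for a binomial GLM.
# beta0 and beta are hypothetical fitted intercept/coefficients.
import numpy as np
from scipy.special import expit

def predict_proba(X, beta0, beta):
    z = beta0 + np.dot(X, beta)       # linear predictor
    p1 = expit(z)                     # numerically stable sigmoid, P(y=1|X)
    # Column 0 is P(y=0), column 1 is P(y=1), as sklearn expects.
    return np.column_stack([1.0 - p1, p1])
```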
Good to go from your end @titipata ? Everything resolved and works? Please set the PR title to MRG if so!
@jasmainak yes, I guess you can help polish a bit and that should be good to go!
I'll merge @titipata when the CIs are green. We're good to go!
Perfect 👍 @jasmainak. We can also poke the JOSS editor right after. This PR should take care of their concerns!
yep that's the plan! :)
All good, thanks a ton @titipata !
So far, there are 3 main parts which can be fixed in the `GLM` class to make it compatible with scikit-learn's `check_estimator`: 1) initial attributes, 2) the `fit_predict` function, and 3) scipy's `expit`.

closes https://github.com/glm-tools/pyglmnet/issues/363