pavanramkumar closed this pull request 5 years ago
I think we should just support >=0.18 for scikit-learn. So, you can change everything to use the `model_selection` module.

Don't merge this yet. I will test it tomorrow. I think it's a little less trivial than this ...
Merging #263 into master will not change coverage. The diff coverage is n/a.

```diff
@@           Coverage Diff           @@
##           master     #263   +/-   ##
=======================================
  Coverage   72.97%   72.97%
=======================================
  Files           4        4
  Lines         618      618
  Branches      125      125
=======================================
  Hits          451      451
- Misses        127      128     +1
+ Partials       40       39     -1
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| pyglmnet/pyglmnet.py | `78.68% <0%> (ø)` | :arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update a7f3f2c...b03b979. Read the comment docs.
@jasmainak sure!
right now, we have a `check_version` function in our `base.py` and decide imports based on that. we can significantly simplify this if we decide to support >=0.18. i'll wait for you to sort out those issues before merging.
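For context, the kind of helper being discussed could look roughly like this. This is a hypothetical sketch of a version guard; the actual `check_version` in `base.py` may differ in its signature and details:

```python
def check_version(library, min_version):
    """Return True if `library` imports and is at least `min_version`.

    Hypothetical sketch of a version guard like the one in base.py;
    the real helper may differ.
    """
    def parse(v):
        # naive 'X.Y.Z' parser, good enough for plain release strings
        return tuple(int(p) for p in v.split('.') if p.isdigit())

    try:
        mod = __import__(library)
    except ImportError:
        return False
    return parse(mod.__version__) >= parse(min_version)


# With a hard minimum of scikit-learn 0.18, the scattered conditional
# imports collapse to a single check at import time, e.g.:
# if not check_version('sklearn', '0.18'):
#     raise ImportError('pyglmnet requires scikit-learn >= 0.18')
```

The point is that once 0.18 is a hard requirement, `sklearn.model_selection` can be imported unconditionally and the per-import branching disappears.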
@pavanramkumar so I had a look. You also need to update the examples so that they use the modern sklearn API.
@jasmainak pushed example changes. so no change needed to `check_version` in `base.py` and its usage in `pyglmnet.py`?
@pavanramkumar the CircleCI build was not triggered for some reason ...
> so no change needed to `check_version` in `base.py` and its usage in `pyglmnet.py`?

I think we can leave it as such
@pavanramkumar I get the following error when I build the example:
```
/home/mainak/Desktop/projects/github_repos/pyglmnet/pyglmnet/pyglmnet.py:575: RuntimeWarning: invalid value encountered in greater
  (np.abs(beta) > thresh)
WARNING: /home/mainak/Desktop/projects/github_repos/pyglmnet/examples/plot_community_crime.py failed to execute correctly:
Traceback (most recent call last):
  File "/home/mainak/anaconda2/lib/python2.7/site-packages/sphinx_gallery/gen_rst.py", line 450, in execute_code_block
    exec(code_block, example_globals)
  File "<string>", line 14, in <module>
  File "/home/mainak/anaconda2/lib/python2.7/site-packages/sklearn/model_selection/_search.py", line 639, in fit
    cv.split(X, y, groups)))
  File "/home/mainak/anaconda2/lib/python2.7/site-packages/sklearn/model_selection/_split.py", line 332, in split
    for train, test in super(_BaseKFold, self).split(X, y, groups):
  File "/home/mainak/anaconda2/lib/python2.7/site-packages/sklearn/model_selection/_split.py", line 95, in split
    for test_index in self._iter_test_masks(X, y, groups):
  File "/home/mainak/anaconda2/lib/python2.7/site-packages/sklearn/model_selection/_split.py", line 634, in _iter_test_masks
    test_folds = self._make_test_folds(X, y)
  File "/home/mainak/anaconda2/lib/python2.7/site-packages/sklearn/model_selection/_split.py", line 589, in _make_test_folds
    allowed_target_types, type_of_target_y))
ValueError: Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.
```
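For what it's worth, this `ValueError` comes from `StratifiedKFold`, which `model_selection` picks when it decides the estimator is a classifier; stratification is undefined for a continuous target. A hedged sketch of the usual fix (the array names here are illustrative, not from the failing example):

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

X = np.zeros((20, 1))
y_continuous = np.random.RandomState(0).randn(20)

# StratifiedKFold refuses a continuous target outright:
stratified_fails = False
try:
    next(StratifiedKFold(n_splits=3).split(X, y_continuous))
except ValueError:
    stratified_fails = True

# For a regression-style target, pass an unstratified splitter explicitly,
# e.g. GridSearchCV(estimator, param_grid, cv=KFold(n_splits=3)).
n_folds = sum(1 for _ in KFold(n_splits=3).split(X, y_continuous))
```

So if the community crime example does a grid search over a continuous `y`, passing `cv=KFold(...)` (or any unstratified splitter) should sidestep this error.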
@jasmainak where did you build this example? locally or on circleCI?
@pavanramkumar locally because as you can see CircleCI was not triggered for this pull request ...
closes #267 and some CircleCI issues. expect CircleCI to break here because of the community crime dataset issue
Appendix 0 here gives some insights: https://web.stanford.edu/~hastie/Papers/Glmnet_Vignette.pdf
TLDR is: set convergence threshold on change in predicted target variable
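As a rough illustration of that idea (a sketch, not pyglmnet's actual stopping rule): instead of thresholding the change in coefficients, threshold the largest change those coefficients induce in the predicted target, i.e. in the linear predictor:

```python
import numpy as np


def converged(X, beta_old, beta_new, tol=1e-4):
    """Declare convergence when the largest change in the linear
    predictor X @ beta falls below tol.

    Sketch of the idea from Appendix 0 of the Glmnet vignette;
    pyglmnet's real criterion may differ in weighting and scaling.
    """
    delta_eta = X.dot(beta_new - beta_old)
    return bool(np.max(np.abs(delta_eta)) < tol)


rng = np.random.RandomState(0)
X = rng.randn(50, 3)
beta = rng.randn(3)

big_update = converged(X, beta, beta + 0.1)     # predictions move a lot
small_update = converged(X, beta, beta + 1e-8)  # predictions barely move
```

The appeal is that the threshold then lives on the scale of the predictions rather than on the (penalty-dependent) scale of the coefficients.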
@pavanramkumar I can't test things on this branch because examples are broken. Which example do you suggest I try? It seems the community crime data has a broken URL. Should we upload the data on dropbox or something? I don't know about license ...
@pavanramkumar I ran the group lasso example with a slight modification:
```python
from pyglmnet import GLM
from pyglmnet.datasets import fetch_group_lasso_datasets
import matplotlib.pyplot as plt

df, group_idxs = fetch_group_lasso_datasets()
print(df.head())

from sklearn.model_selection import train_test_split  # noqa

X = df[df.columns.difference(["Label"])].values
y = df.loc[:, "Label"].values
Xtrain, Xtest, ytrain, ytest = \
    train_test_split(X, y, test_size=0.2, random_state=42)

from pyglmnet import _loss
import numpy as np

loss_trace = list()
distr = 'binomial'
alpha = 1.0
group = group_idxs
reg_lambda = 1e-3


def callback(beta):
    Tau = None
    eta = 2.0
    loss_trace.append(
        _loss(distr, alpha, Tau, reg_lambda,
              Xtrain, ytrain, eta, np.array(group), beta))


# set up the group lasso GLM model
gl_glm = GLM(distr=distr, tol=1e-7, max_iter=1000,
             group=group_idxs, score_metric="pseudo_R2",
             alpha=alpha, reg_lambda=reg_lambda,
             callback=callback)
gl_glm.fit(Xtrain, ytrain)

plt.plot(loss_trace)
plt.show()
```
I got this plot:
Looking at this, I would say it should not stop before 600 iterations. But when I run it with the default `tol=1e-3` it stops at 40 iterations (!) That is too early ...
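One way to see why a coefficient-based tolerance can fire too early: the per-iteration step in beta can dip below `tol` while the loss is still clearly decreasing. A toy sketch of guarding against that by requiring both signals (illustrative only, not pyglmnet's internals):

```python
import numpy as np


def should_stop(beta_old, beta_new, loss_old, loss_new,
                tol_beta=1e-3, tol_loss=1e-6):
    """Illustrative stopping rule: require BOTH a small coefficient
    update and a small relative loss decrease before declaring
    convergence (not pyglmnet's actual criterion)."""
    small_step = np.linalg.norm(beta_new - beta_old) < tol_beta
    small_gain = abs(loss_old - loss_new) < tol_loss * max(abs(loss_old), 1.0)
    return bool(small_step and small_gain)


# A tiny step that still buys a big loss improvement should NOT stop:
stop = should_stop(np.array([1.0, 2.0]), np.array([1.0, 2.0005]),
                   loss_old=10.0, loss_new=9.0)
```

With the loss trace above, a rule like this would keep iterating well past the point where the step-size check alone gives up.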
@jasmainak let's break up this PR into 3 PRs:
> Appendix 0 here gives some insights: https://web.stanford.edu/~hastie/Papers/Glmnet_Vignette.pdf
This looks like a great resource! And the documentation is pretty neat
@pavanramkumar that sounds like a plan! Do you want to remove that last commit by interactive rebase? (Make sure to make a backup of the branch so that you can cherry-pick the commit later) Or if you need my help for git kung-fu let me know :)
Merging #263 into master will decrease coverage by 0.76%. The diff coverage is n/a.

```diff
@@            Coverage Diff             @@
##           master     #263      +/-   ##
==========================================
- Coverage   73.74%   72.97%   -0.77%
==========================================
  Files           4        4
  Lines         617      618       +1
  Branches      124      125       +1
==========================================
- Hits          455      451       -4
- Misses        123      128       +5
  Partials       39       39
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| pyglmnet/pyglmnet.py | `78.68% <0%> (-0.99%)` | :arrow_down: |

Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 5c97535...ca45b24. Read the comment docs.
Merging #263 into master will decrease coverage by 0.76%. The diff coverage is n/a.

```diff
@@            Coverage Diff             @@
##           master     #263      +/-   ##
==========================================
- Coverage   73.74%   72.97%   -0.77%
==========================================
  Files           4        4
  Lines         617      618       +1
  Branches      124      125       +1
==========================================
- Hits          455      451       -4
- Misses        123      128       +5
  Partials       39       39
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| pyglmnet/pyglmnet.py | `78.68% <0%> (-0.99%)` | :arrow_down: |

Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 5c97535...fce952e. Read the comment docs.
@pavanramkumar rebasing done. Let's wait for the CIs to be happy.
I pushed a backup of your branch here: https://github.com/jasmainak/pyglmnet/tree/deprecations_backup
okay PR is good to go from my end @pavanramkumar . Merging so we can get some dopamine in our brains and then work on the rest in other PRs :)
We only test for scikit-learn 0.18, scipy 0.17, numpy 1.11 (see `.travis.yml`). I think for the scope of our package, that is ok, given it's been quite a long time since those versions came out. We shouldn't strive for backward compatibility when it makes support and testing painful.
Closes #69 and #239