python Vs R results

kputta commented 4 years ago

Hi,

For some reason the results from R have a different cvsd compared to the results from this python package. Do I need to do something else to match lambda.1se?

In the plot below, py_cvm matches cvm (coming from R), but that is not the case for cvlo and cvup because of the different cvsd, and that affects the lambda.1se choice.

I checked a couple of times on my end and I don't see anything. Am I missing something obvious?

bbalasub1 commented 4 years ago

That does not seem right.

How many rows of data are you using? My guess is there is a sqrt(n) missing in one of the two.
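For context on where a sqrt(n) could go missing, here is a minimal sketch of how cvm, cvsd, cvup, cvlo and lambda.1se relate, written from my understanding of the grouped computation in R's cv.glmnet (n is the number of folds when grouped, the number of rows otherwise). The function names `cv_summary` and `lambda_1se` and the toy numbers are made up for illustration and are not part of either package.

```python
import numpy as np

def cv_summary(cvraw, fold_weights):
    """Summarize per-fold losses.

    cvraw:        (nfolds, nlambda) array, mean loss of each fold at each lambda
    fold_weights: (nfolds,) array, total observation weight in each fold
    """
    nfolds = cvraw.shape[0]
    # Weighted mean over folds: the CV curve.
    cvm = np.average(cvraw, axis=0, weights=fold_weights)
    # Weighted variance of the fold losses around the CV curve.
    cvvar = np.average((cvraw - cvm) ** 2, axis=0, weights=fold_weights)
    # Standard error of the mean; dropping this division changes cvsd
    # (and therefore cvup/cvlo) but leaves cvm untouched.
    cvsd = np.sqrt(cvvar / (nfolds - 1))
    return cvm, cvsd, cvm + cvsd, cvm - cvsd

def lambda_1se(lambdas, cvm, cvsd):
    """Largest lambda whose CV error is within one cvsd of the minimum."""
    idx_min = np.argmin(cvm)
    threshold = cvm[idx_min] + cvsd[idx_min]
    return np.max(lambdas[cvm <= threshold])

# Toy data: 10 folds, 3 lambda values, equal fold weights (all hypothetical).
rng = np.random.RandomState(0)
cvraw = 1.0 + 0.1 * rng.rand(10, 3)
fold_weights = np.ones(10)
cvm, cvsd, cvup, cvlo = cv_summary(cvraw, fold_weights)
```

Because lambda.1se is picked from cvm + cvsd at the minimum, two runs can agree on cvm and lambda.min and still disagree on lambda.1se if cvsd differs.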

[image: plot from the original post] https://user-images.githubusercontent.com/9323468/70184670-4ecae480-16b6-11ea-8342-a1c04e7b7661.png

kputta commented 4 years ago

I am using 4286397 rows X 38 columns

and versions:

python: 3.7.2 scipy: 1.2.1 numpy: 1.16.3 pandas: 0.24.2

bbalasub1 commented 4 years ago

Thanks. What nfolds are you using?

kputta commented 4 years ago

10

Best, Kaushik

bbalasub1 commented 4 years ago

Which model are you using, and are you using grouped=True? I have not looked at it completely yet -- but they all seemed to be coded with the right spirit (with sqrt(n)). The possible classes are: elnet, lognet, multnet, mrelnet, fishnet.

kputta commented 4 years ago

I am using mostly default args: grouped = True

In fact, just this line: cvfit = cvglmnet(x = X_train.values, y = y_train.values, ptype = 'mse', nfolds=10)

bbalasub1 commented 4 years ago

There was an overwrite of the weight values passed into the Fortran routines by glmnet.py that was causing this. Instead of passing by reference, I am now copying the weights before calling the Fortran solver. This seems to fix it, and I am getting a match between R and python versions.
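To illustrate the kind of bug described above, here is a small sketch of what happens when a routine that modifies its argument in place receives the caller's array directly, versus a copy. `fortran_like_solver` is a hypothetical stand-in written in numpy; it is not the actual Fortran routine or the actual change made in glmnet.py.

```python
import numpy as np

def fortran_like_solver(weights):
    """Hypothetical stand-in for a compiled routine that normalizes
    its weight argument in place (as pass-by-reference code may do)."""
    weights /= weights.sum()
    return weights.sum()

# Passing the array directly: the caller's weights are overwritten.
weights = np.array([1.0, 2.0, 3.0])
fortran_like_solver(weights)
overwritten = weights.copy()   # now [1/6, 2/6, 3/6]

# Passing a copy: the solver works on private data, the caller's array survives.
weights = np.array([1.0, 2.0, 3.0])
fortran_like_solver(weights.copy())
preserved = weights            # still [1.0, 2.0, 3.0]
```

In a cross-validation loop the same weights are reused after the fit to aggregate fold errors, so an in-place overwrite like this could plausibly change cvsd while leaving other quantities looking fine.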

Please do a fresh install (or just copy the new glmnet.py to your local installation), verify(*) and let me know if you find any further problems.

(*) For verification, I matched the foldid between the R and python versions prior to the cross-validation step and compared cvup, cvm, and cvlo. Since R uses a random permutation every time and foldid changes, you would need to hard code the same foldid into both codes. Also, python uses 0-based indexing while R uses 1-based indexing, so make sure the foldid you hard code in python is the R foldid minus 1!
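A minimal sketch of one way to fix the fold assignment on the python side and derive the matching R vector. `make_foldid` is a hypothetical helper, not part of glmnet_python; I am assuming the result is then passed through the `foldid` argument of cvglmnet in python and of cv.glmnet in R.

```python
import numpy as np

def make_foldid(nobs, nfolds, seed=0):
    """Return a reproducible, balanced fold assignment with labels 0..nfolds-1."""
    rng = np.random.RandomState(seed)
    foldid = np.arange(nobs) % nfolds   # each fold gets nobs/nfolds rows (+/- 1)
    rng.shuffle(foldid)                 # in-place random permutation
    return foldid

# Toy sizes for illustration only.
foldid_py = make_foldid(nobs=25, nfolds=10)   # 0-based labels for python
foldid_r = foldid_py + 1                      # 1-based labels for R

# Assumed usage (not executed here):
#   python: cvglmnet(x=..., y=..., ptype='mse', foldid=foldid_py)
#   R:      cv.glmnet(x, y, type.measure = "mse", foldid = foldid_r)
```

Saving `foldid_r` to a text file and reading it in R keeps both runs on identical folds, so any remaining difference in cvup, cvm, or cvlo comes from the code rather than from the random split.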

kputta commented 4 years ago

Thanks

bbalasub1 / glmnet_python

python Vs R results, #45