glm-tools / pyglmnet

Python implementation of elastic-net regularized generalized linear models
http://glm-tools.github.io/pyglmnet/
MIT License
279 stars 82 forks source link

Poisson Regression Same Results in R as in Python? #376

Open gaussianZeroOne opened 4 years ago

gaussianZeroOne commented 4 years ago

I am trying to implement the same code from R using glmnet and in Python. The issue is that I am using Poisson regression.

In R, I want to run the following regression. A vector y on 3 variables. It is a simple example but I want to keep it simple to easily compare results in Python:

library(glmnet)
y= c(2.631007e-13, 2.953114e-09, 1.537151e-03)

covar = diag(3)

mod <- glmnet(x=covar, y=y, family="poisson")

The above produces the following coefficients: [1] 0.000000 0.000000 7.557667. Only the 3rd variable has any weight from the penalization.

In Python, using this package:

from pyglmnet import GLM

import numpy as np

glm = GLM(alpha=1, distr='poisson', tol=1e-07)

print(glm.fit(np.eye(3),np.array([2.631007e-13, 2.953114e-09, 1.537151e-03])).beta_)

This gives me [0,0,0] for the 3 coefficients.

I assume I may need to use K-Fold validation, and also search across various lambda (penalization) parameters. However, I am unsure how exactly to do this and if this is actually a feature of pyglmnet?

Has anyone been able to figure out how to run glmnet with poisson in Python to produce exactly the same results?

jasmainak commented 4 years ago

yes it is! You should use GLMCV to search across various lambda. Can you make sure that these are matched between R and Python?

gaussianZeroOne commented 4 years ago

Thank you @jasmainak.

I've tried the following in pyglmnet:

from pyglmnet import GLMCV

glm = GLMCV(alpha=1, distr='poisson', tol=1e-07, cv=2)

print(glm.fit(np.eye(3),np.array([2.631007e-13, 2.953114e-09, 1.537151e-03])).beta_)

However it produces [0, 0, 0]. I am unable to tweak the pyglmnet to produce the exact same code as in R's glmnet.

Could it be that the lambda sequences are different? I noticed in R's glmneet it produces the sequence of coefficients through each iteration. Is there a way to do this as well in pyglmnet to see if the differences are due to different starting points?

jasmainak commented 4 years ago

@gaussianZeroOne could you also provide the R code that you have tried? I am not that familiar with R but I think @titipata might be able to help.

titipata commented 4 years ago

@jasmainak, I edited the code above which should have the R code to reproduce.

titipata commented 4 years ago

And yes, it seems like there is something going on with the optimizer. I tried the same example and couldn't get the same result as in R. Maybe @pavanramkumar knows more in detail why it does not give the same result in this case.

jasmainak commented 4 years ago

cool thanks, I'll take a look tomorrow!