glm-tools / pyglmnet

Python implementation of elastic-net regularized generalized linear models
http://glm-tools.github.io/pyglmnet/
MIT License

Pass distribution objects into `GLM` and `GLMCV` class #350

Open pavanramkumar opened 4 years ago

pavanramkumar commented 4 years ago

Motivation: as we expand the scope of models supported, it may be easier for contributors to add new distributions if we refactor the class to accept distribution-specific params and corresponding likelihood / loss / gradients / hessians as callables.

One strategy, inspired by statsmodels and proposed by @jasmainak in #274, is as follows:

class MyDistribution(BaseDistribution):
    def loglikelihood(self):
        pass

    def loglik_gradient(self):
        pass

    def loglik_hessian(self):
        pass

distr = NegativeBinomial(n_failure)
GLM(distr=distr)
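A minimal runnable sketch of what this pattern might look like (the class and method names follow the snippet above, but the signatures, the `abc`-based enforcement, and the Poisson formulas are assumptions, not pyglmnet's actual implementation):

```python
from abc import ABC, abstractmethod

import numpy as np


class BaseDistribution(ABC):
    """Hypothetical base class: every distribution exposes the
    log-likelihood and its derivatives as overridable methods."""

    @abstractmethod
    def loglikelihood(self, y, y_hat):
        ...

    @abstractmethod
    def loglik_gradient(self, X, y, beta):
        ...

    @abstractmethod
    def loglik_hessian(self, X, y, beta):
        ...


class Poisson(BaseDistribution):
    """Poisson distribution with the canonical log link."""

    def loglikelihood(self, y, y_hat):
        # log p(y | mu) up to an additive constant: y * log(mu) - mu
        return np.sum(y * np.log(y_hat) - y_hat)

    def loglik_gradient(self, X, y, beta):
        mu = np.exp(X @ beta)
        return X.T @ (y - mu)

    def loglik_hessian(self, X, y, beta):
        mu = np.exp(X @ beta)
        return -(X.T * mu) @ X
```

With this in place, `GLM(distr=Poisson())` could call the three methods without knowing which distribution it holds, and a contributor adds a distribution by subclassing `BaseDistribution` rather than editing the solver.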

There are also alternatives, such as the scikit-learn approach, which is less work for the low-touch user who only needs to call fit and predict on a model object, without having to construct a distr object:

class PoissonGLM(GLM):
    def loglikelihood(self):
        pass

    def loglik_gradient(self):
        pass

    def loglik_hessian(self):
        pass

model = PoissonGLM(base_param1, base_param2, poisson_param1, poisson_param2)
jasmainak commented 4 years ago

I wouldn't change the class signature too much anymore. We made one big backwards-incompatible change with this release (GLM and GLMCV). Now we should try to keep the API a little stable. I'm not sure I understand the second approach fully. In the first approach, you would not change anything in the GLM object except also allow objects that subclass from BaseDistribution. Users don't have to create them, as you can just provide them in a module:

from pyglmnet.distributions import NegativeBinomial

distr = NegativeBinomial(n_failure)

then inside the GLM class you'd say:

if distr == 'negative_binomial':
    self.distr = NegativeBinomial()
elif distr == 'poisson':
    self.distr = Poisson()

etc. The subclassing simply ensures that all the distributions follow the same pattern with the same methods that need to be overridden.
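The if/elif chain and the object-passing path can be combined in one resolver. The sketch below is a hypothetical version of that dispatch (the registry dict, `resolve_distr`, and the stub distribution classes are assumptions for illustration): legacy string arguments keep working, while `BaseDistribution` instances pass straight through.

```python
class BaseDistribution:
    """Stub base class standing in for the proposed distribution API."""


class Poisson(BaseDistribution):
    pass


class NegativeBinomial(BaseDistribution):
    def __init__(self, n_failure=1):
        self.n_failure = n_failure


# Map the legacy string names onto distribution classes.
_DISTR_REGISTRY = {
    'poisson': Poisson,
    'negative_binomial': NegativeBinomial,
}


def resolve_distr(distr):
    """Accept either a legacy string or a ready-made distribution object."""
    if isinstance(distr, BaseDistribution):
        return distr  # user passed an object: use it as-is
    try:
        return _DISTR_REGISTRY[distr]()  # legacy string: look up and build
    except KeyError:
        raise ValueError(f"unknown distribution: {distr!r}")
```

Inside `GLM.__init__`, `self.distr = resolve_distr(distr)` would then replace the if/elif chain, and distributions that need parameters (like `NegativeBinomial(n_failure)`) are constructed by the user rather than threaded through the GLM signature.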