there are four different variants of Poisson regression -- all for count distributions:

- Quasi-Poisson (which allows the variance to be proportional rather than equal to the mean)
- Zero-inflated Poisson (to allow for more zeros than accounted for by a Poisson distribution)
- Zero-truncated and hurdle models (another way to deal with more zero observations than expected)
- Negative binomial

All methods are implemented in R, but in different packages. This paper (http://www.jstatsoft.org/v27/i08/paper) and the corresponding code examples (http://data.princeton.edu/wws509/r/overdispersion.html) give you a sense of the applications.
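For a rough Python analogue of those R examples, here is a minimal sketch (assuming statsmodels, which is separate from pyglmnet and not mentioned above) fitting a plain Poisson and a fixed-shape negative binomial GLM to simulated overdispersed counts:

```python
# Minimal sketch (assumes statsmodels is installed): plain Poisson vs. a
# fixed-shape negative binomial GLM on simulated overdispersed counts.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(500, 2)))   # design matrix with intercept
mu = np.exp(X @ np.array([0.5, 1.0, -0.5]))      # true mean via log link
# numpy's negative_binomial(n, p) has mean n*(1-p)/p, so p = theta/(theta+mu)
y = rng.negative_binomial(n=2.0, p=2.0 / (2.0 + mu))

poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
# statsmodels parameterizes the NB family by alpha = 1/theta, so theta=2 -> alpha=0.5
negbin_fit = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=0.5)).fit()
print(poisson_fit.params, negbin_fit.params)
```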
Thanks! I'll start taking a look.
Hi there! I'm reviving this issue because I'd like to know whether this feature is planned for a future release (or, better, whether somebody is currently developing it). I am currently working on a project that requires it, and it would be really nice to know that it will be available in your framework :)
@geektoni which of the link functions are you most interested in? Unfortunately, I'm crazy busy these days. However, I am happy to guide and review a pull request if someone has the motivation to implement it :)
@jasmainak I was interested in the negative binomial. If nobody is working on it at the moment, I think I'll give it a try and add it as a new noise model. This is not quite my field, but let's see what happens. Apart from the paper and code examples linked above, are there any other suggested references I should know about before I start designing/implementing it? :)
not me, @pavanramkumar is the expert on GLMs. Maybe he can suggest something :)
What I would suggest is to first write down the equations in the cheatsheet. Once you have the log likelihood and the gradients, it should be straightforward to plug them into pyglmnet. You can first update the tests for the gradients and then write code to make the tests pass. Do not hesitate to let us know if you get stuck.
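For reference, here is one way those equations could look -- a sketch assuming the standard NB2 parameterization with fixed shape \(\theta\) and a log link \(\mu_i = \exp(x_i^\top \beta)\) (pyglmnet's default inverse link may differ), to be checked against the cheatsheet's conventions:

```latex
% NB2 log likelihood, fixed shape \theta, log link: \mu_i = \exp(x_i^\top \beta)
\ell(\beta) = \sum_i \left[ \log\Gamma(y_i + \theta) - \log\Gamma(\theta)
  - \log\Gamma(y_i + 1)
  + \theta \log\frac{\theta}{\theta + \mu_i}
  + y_i \log\frac{\mu_i}{\theta + \mu_i} \right]

% the chain rule through the log link collapses the gradient to
\frac{\partial \ell}{\partial \beta}
  = \sum_i \frac{\theta \, (y_i - \mu_i)}{\theta + \mu_i} \, x_i
```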
@geektoni thanks for taking a stab at it! what @jasmainak suggests is probably the right approach: deriving the log likelihood and writing the gradient tests would be a start.
the paper linked above is a starting point but i'll add any other references that may be appropriate.
to clarify: i hope you're looking at neg-binomial with a fixed shape parameter -- this is a special case of a GLM, and can likely be implemented in the same GLM() class.
however, if you want to estimate both mean and shape parameters, you would have to take an iterative approach: estimating mean for a fixed shape, then shape for a fixed mean, and so on until convergence. for this more general case, i would recommend a different class.
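A rough sketch of that alternating scheme, assuming a log link; `glm_fit` is a hypothetical stand-in for a pyglmnet-style solver that estimates beta for a fixed shape, and `nb_logl` is likewise an illustrative name, not pyglmnet API:

```python
# Sketch of the alternating (mean <-> shape) scheme for the general case.
import numpy as np
from scipy.special import gammaln
from scipy.optimize import minimize_scalar

def nb_logl(y, mu, theta):
    """Negative binomial (NB2) log likelihood with mean mu and shape theta."""
    return np.sum(gammaln(y + theta) - gammaln(theta) - gammaln(y + 1)
                  + theta * np.log(theta / (theta + mu))
                  + y * np.log(mu / (theta + mu)))

def fit_alternating(X, y, glm_fit, n_iter=20, theta=1.0):
    for _ in range(n_iter):
        beta = glm_fit(X, y, theta=theta)        # mean step: theta held fixed
        mu = np.exp(X @ beta)                    # log link assumed
        theta = minimize_scalar(                 # shape step: beta held fixed
            lambda t: -nb_logl(y, mu, t),
            bounds=(1e-3, 1e3), method="bounded").x
    return beta, theta
```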
@pavanramkumar Yes, I am just interested in the fixed-shape negative binomial (the more complex case could come later). Regarding the implementation, I have a few questions:
@geektoni I will answer point 2 -- maybe @pavanramkumar can shed some light on point 1.
You need to add your log-likelihood (without the regularization term -- i.e., no L2); the loss will be computed from it. Then you have to update the corresponding functions for the gradient and the Hessian, respectively.
Feel free to make a WIP (work in progress) PR so we can help early on rather than wait to have perfect code ...
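To make that concrete, here is a minimal sketch of the gradient and Hessian for the fixed-shape case (log link assumed; the function names are illustrative, not pyglmnet's internal API), together with the kind of numerical check the gradient tests could use:

```python
# Gradient and Hessian of the NB2 *negative* log likelihood w.r.t. beta,
# with fixed shape theta and a log link. Names are illustrative only.
import numpy as np
from scipy.optimize import approx_fprime
from scipy.special import gammaln

def nb_neg_logl(beta, X, y, theta):
    mu = np.exp(X @ beta)
    return -np.sum(gammaln(y + theta) - gammaln(theta) - gammaln(y + 1)
                   + theta * np.log(theta / (theta + mu))
                   + y * np.log(mu / (theta + mu)))

def nb_grad(beta, X, y, theta):
    mu = np.exp(X @ beta)
    return -X.T @ (theta * (y - mu) / (theta + mu))

def nb_hess(beta, X, y, theta):
    mu = np.exp(X @ beta)
    w = theta * mu * (theta + y) / (theta + mu) ** 2   # positive weights
    return X.T @ (w[:, None] * X)

# Numerical check of the analytical gradient, as a gradient test might do.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta = rng.normal(size=3)
y = rng.poisson(np.exp(X @ beta))
num = approx_fprime(beta, nb_neg_logl, 1e-6, X, y, 2.0)
assert np.allclose(num, nb_grad(beta, X, y, 2.0), rtol=1e-3)
```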
Ok, that makes sense. I'll try to push something in the next few days so you can have a look at it.
Just putting this up as a potential enhancement.