glm-tools / pyglmnet

Python implementation of elastic-net regularized generalized linear models
http://glm-tools.github.io/pyglmnet/
MIT License

[MRG] Refactor log likelihood. #217

Closed jasmainak closed 7 years ago

jasmainak commented 7 years ago

This is built on top of https://github.com/glm-tools/pyglmnet/pull/216.

@pavanramkumar there are some differences between the two functions that I don't understand. Perhaps you could explain them to me over chat.

codecov-io commented 7 years ago

Codecov Report

Merging #217 into master will decrease coverage by 0.03%. The diff coverage is 98.24%.


@@            Coverage Diff             @@
##           master     #217      +/-   ##
==========================================
- Coverage   77.91%   77.87%   -0.04%     
==========================================
  Files           4        4              
  Lines         498      470      -28     
  Branches       92       88       -4     
==========================================
- Hits          388      366      -22     
+ Misses         77       72       -5     
+ Partials       33       32       -1
| Impacted Files | Coverage Δ |
|---|---|
| pyglmnet/utils.py | 32.55% <ø> (-12.06%) :arrow_down: |
| pyglmnet/metrics.py | 100% <100%> (ø) :arrow_up: |
| pyglmnet/pyglmnet.py | 81.34% <98.03%> (-0.52%) :arrow_down: |
| doc/_themes/sphinx_rtd_theme/__init__.py | 100% <0%> (ø) |

Continue to review full report at Codecov.

Last update e7ff601...c3fa627.

pavanramkumar commented 7 years ago

@jasmainak the _logL() within pyglmnet.py is called by _loss(), which is only used to track the loss and determine convergence. The _logL() within utils.py is used for scoring the model.

In the former case, it makes sense to think of the log likelihood as a function of the parameters (beta0, beta), since that is what you are optimizing with respect to. In the latter case, we want to score the prediction against the test data. Here, you want to separate the scoring from the estimation as much as possible. There are also numerical issues, such as adding an eps parameter to make sure we don't get -Inf or NaN.

Hence, the difference in API of the two functions. Let me know your suggestions to DRY them.

jasmainak commented 7 years ago

hmm ... I was thinking of abstracting away the likelihood as something that is a property of a distribution. The loss function that we have of course takes the arguments beta as that is how we parametrize it.

I'm also thinking in terms of making it trivial for future contributors to add a new distribution. It should be as simple as updating the log likelihood and its derivative. The link function and the rest should all be taken care of automatically because of our API design.
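The abstraction I have in mind could be sketched roughly like this (class and method names are hypothetical, not pyglmnet's actual API):

```python
import numpy as np

class Poisson:
    """Hypothetical sketch of a per-distribution abstraction: a new
    distribution would only supply its log likelihood and derivative,
    and the shared solver would handle the link function and chain
    rule generically.
    """
    def log_likelihood(self, y, mu):
        # Poisson log likelihood (dropping the constant log(y!) term)
        return np.sum(y * np.log(mu) - mu)

    def grad_log_likelihood(self, y, mu):
        # derivative with respect to mu; the chain rule through the
        # link function would be applied by the shared solver code
        return y / mu - 1.0
```

Adding, say, a Gamma distribution would then mean writing one such class rather than touching the solver internals.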

Worst case, we can add a parameter (mode='scoring' for example) that chooses the code path depending on whether we are scoring or computing the loss.

jasmainak commented 7 years ago

ready for merge from my end!

jasmainak commented 7 years ago

ping @pavanramkumar . Did you get a chance to look at the diff?

pavanramkumar commented 7 years ago

@jasmainak i really like what you did with the chain rule. barring a few small issues i pointed out, it LGTM! thanks for looking into this carefully!

jasmainak commented 7 years ago

@pavanramkumar let me know about log likelihood. The other comment is addressed

jasmainak commented 7 years ago

@pavanramkumar I removed log_likelihood for now. If users request it later, it's cheap to add it back. The other comment is addressed

pavanramkumar commented 7 years ago

I'm ok with log_likelihood, but check out my comment regarding the underflow prevention for the binomial log likelihood (binomial cross entropy). It's a special case of the well-known log-sum-exp trick, see e.g.: https://hips.seas.harvard.edu/blog/2013/01/09/computing-log-sum-exp/ (hint: interview question!)
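For the record, the trick applied to the binomial case looks like this minimal sketch (function names are illustrative, not pyglmnet's actual code; z stands for the linear predictor beta0 + X @ beta):

```python
import numpy as np

def log1pexp(z):
    """Stable log(1 + exp(z)): factor out m = max(z, 0) so that
    neither exponential can overflow. For large z this returns
    roughly z; for very negative z, roughly 0."""
    m = np.maximum(z, 0.0)
    return m + np.log(np.exp(-m) + np.exp(z - m))

def binomial_logL_stable(y, z):
    """Binomial log likelihood in terms of the linear predictor z.
    The naive np.log(1 + np.exp(z)) overflows once z exceeds ~709."""
    return np.sum(y * z - log1pexp(z))
```

NumPy's np.logaddexp(0, z) implements the same two-term log-sum-exp stably, so the hand-rolled log1pexp above is mainly for exposition.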

pavanramkumar commented 7 years ago

I will merge this PR now!