CamDavidsonPilon / lifelines

Survival analysis in Python
lifelines.readthedocs.org
MIT License

Log likelihood needs to be maximized or minimized? #1545

Closed: Genarito closed this issue 11 months ago

Genarito commented 11 months ago

Hi, first of all, thank you very much for this library! :100:

I'm reading the Log Likelihood documentation and it says:

The in-sample log-likelihood is available under log_likelihood_ of any regression model. For out-of-sample data, the score() method (available on all regression models) can be used. This returns the average evaluation of the out-of-sample log-likelihood. We want to maximize this.

But in the example below, the comment # better model is attached to the model with the lowest (not highest) value:

from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

rossi = load_rossi().sample(frac=1.0)
train_rossi = rossi.iloc[:400]
test_rossi = rossi.iloc[400:]

cph_l2 = CoxPHFitter(penalizer=0.1, l1_ratio=0.).fit(train_rossi, 'week', 'arrest')
cph_l1 = CoxPHFitter(penalizer=0.1, l1_ratio=1.).fit(train_rossi, 'week', 'arrest')

res_l2 = cph_l2.score(test_rossi)
res_l1 = cph_l1.score(test_rossi)

print(res_l2)
print(res_l1) # better model

print(res_l2 < res_l1)  # Prints False: the model with l1_ratio=1 scores lower, so it's not the maximum

Is this an error in the code example or should the log-likelihood be minimized instead of maximized?

I really appreciate any help you can provide

CamDavidsonPilon commented 11 months ago

We definitely maximize the log-likelihood, so the code snippet is likely wrong. I think I was assuming "score" should be minimized, but that's not true.

Genarito commented 11 months ago

Hi @CamDavidsonPilon, in this blog they're minimizing it too. Maybe the snippet is right, but then the documentation isn't talking about the negative partial log-likelihood (it's talking about the positive one).

UPDATE: Here it's being minimized too...

pzivich commented 11 months ago

In general, the focus of SciPy and other optimization libraries is on finding the minimum. In maximum likelihood estimation, we instead maximize the likelihood (or, equivalently, the log-likelihood). To implement MLE with those libraries, we minimize the negative log-likelihood, which is equivalent to maximizing the log-likelihood. @CamDavidsonPilon would be able to confirm this, but I suspect the negative log-likelihood is being minimized behind the scenes.
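To make that concrete, here is a minimal sketch of that pattern with scipy.optimize.minimize on a toy exponential model (simulated data, purely illustrative, not lifelines internals):

import numpy as np
from scipy.optimize import minimize

# Toy data: simulated event times from an exponential distribution with rate 0.5.
rng = np.random.default_rng(0)
times = rng.exponential(scale=2.0, size=200)  # scale = 1 / rate

def neg_log_likelihood(params):
    # Exponential log-likelihood: n * log(rate) - rate * sum(times).
    # We negate it because minimize() finds minima.
    rate = params[0]
    if rate <= 0:
        return np.inf
    return -(len(times) * np.log(rate) - rate * times.sum())

result = minimize(neg_log_likelihood, x0=[1.0], method="Nelder-Mead")
print(result.x[0])   # MLE of the rate, close to the true 0.5
print(-result.fun)   # the maximized log-likelihood (sign flipped back)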

The score function is the derivative of the log-likelihood. To find the maximum of the log-likelihood, we find the place where the score is zero (the slope at a maximum is zero). Here, you could use a root-finding algorithm instead to solve it, so we have two ways to find the point estimates.
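Continuing the toy model above, the root-finding route might look like this (brentq is just one choice; for the exponential the MLE has the closed form n / sum(times) to check against):

import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)
times = rng.exponential(scale=2.0, size=200)  # same simulated data as above

def score_fn(rate):
    # Derivative of the exponential log-likelihood with respect to the rate.
    return len(times) / rate - times.sum()

# The score crosses zero at the MLE; analytically that's n / sum(times).
mle = brentq(score_fn, 1e-6, 10.0)
print(mle, len(times) / times.sum())  # both give the same estimate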

CamDavidsonPilon commented 11 months ago

but I suspect the negative log-likelihood is being minimized behind the scenes.

Yea, minimizing the negative log-likelihood and maximizing the log-likelihood are equivalent. Internally we minimize the negative log-likelihood, but expose the log-likelihood to users via the score function. (BTW @pzivich, the score function on lifelines models mimics the score of scikit-learn models, and isn't the same as the derivative of the log-likelihood. Confusing, I know!)
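If it helps, a quick in-sample check of that (assuming, per the docs quoted above, that score returns the average log-likelihood; with a penalizer the two numbers may not line up exactly):

from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

rossi = load_rossi()
cph = CoxPHFitter().fit(rossi, 'week', 'arrest')

# score() returns the average log-likelihood per row, while log_likelihood_
# is the total, so dividing by the number of rows should give the same number.
print(cph.score(rossi, scoring_method="log_likelihood"))
print(cph.log_likelihood_ / len(rossi))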

Genarito commented 11 months ago

Sorry, but I am a bit lost. In the example code I posted, the values returned by score are both negative. If that function returns the log-likelihood, how can it be distinguished from the negative log-likelihood?

CamDavidsonPilon commented 11 months ago

I mean, a log-likelihood is probably going to be negative: it's the log of values between 0 and 1. When we discuss a negative log-likelihood vs. a log-likelihood, we are really talking about the shape of the log-likelihood surface (bowl-shaped vs. hill-shaped).
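A toy numeric illustration of the negative-but-hill-shaped point:

import numpy as np

# 7 heads in 10 flips: the Bernoulli log-likelihood is negative everywhere
# (a sum of logs of probabilities), yet hill-shaped with its peak at p = 0.7.
k, n = 7, 10
for p in [0.3, 0.5, 0.7, 0.9]:
    ll = k * np.log(p) + (n - k) * np.log(1 - p)
    print(p, round(ll, 2))  # all negative; the largest (closest to zero) is at p = 0.7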

Genarito commented 11 months ago

I understand, thank you both! So the only change that should be made to the code snippet in the documentation is to move the comment # better model to the cph_l2 line, as in the excerpt below. If you'd like, I can open a PR so as not to add work for you.
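For reference, the corrected lines would read something like this (with the caveat that sample(frac=1.0) shuffles the rows, so which model scores higher can vary between runs):

res_l2 = cph_l2.score(test_rossi)  # better model (higher average log-likelihood on this run)
res_l1 = cph_l1.score(test_rossi)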

Regards

CamDavidsonPilon commented 11 months ago

Feel free to send a PR!