coxph_fitter does not converge properly in certain cases

haleypats commented 6 years ago

I have been using lifelines to create an interface where users can select a set of variables to test with Cox proportional hazards regression. Partway through testing some combinations, I encountered the following when viewing the summary table for some cases: [image: cox_summary] (hope the image loads!)

The standard error, z, p, and confidence interval columns all return as NaN. After a bit of investigating, I traced the issue to line 307 of coxph_fitter.py, where the diagonal values of the inverted negative Hessian come out negative in this case:

se = np.sqrt(inv(-self.hessian).diagonal()) / self._norm_std

A little more googling later, I came across an explanation for a similar issue in R. The long and the short of it is that when the determinant of the Hessian is negative but the MLEM algorithm believes it has converged, it is in fact just trapped in a region where the objective function is flat to the same number of decimal places as the tolerances of the solver. I've calculated the Hessian determinant in these cases and can confirm that it is in fact negative whenever I encounter this issue.
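The failure mode described above can be reproduced in isolation. The following is a minimal sketch with made-up 2x2 matrices, not lifelines' actual code: `standard_errors` is a hypothetical helper that checks whether the negative Hessian is positive definite before applying the same `sqrt(diag(inv(-H)))` formula.

```python
import numpy as np


def standard_errors(hessian):
    """Hypothetical helper: sqrt(diag(inv(-H))) with a definiteness check first."""
    neg_hessian = -np.asarray(hessian, dtype=float)
    # eigvalsh assumes a symmetric matrix, which a Hessian should be.
    eigenvalues = np.linalg.eigvalsh(neg_hessian)
    if np.any(eigenvalues <= 0):
        raise ValueError(
            "-hessian is not positive definite; the solver likely stopped "
            "on a flat region rather than at a maximum"
        )
    return np.sqrt(np.linalg.inv(neg_hessian).diagonal())


# Illustrative matrices only (assumptions, not taken from the reported dataset).
good_hessian = np.array([[-2.0, 0.5], [0.5, -1.0]])
bad_hessian = np.array([[-2.0, 3.0], [3.0, -1.0]])  # determinant is -7

print(standard_errors(good_hessian))

# The unchecked formula silently yields NaN for the indefinite case,
# because the diagonal of inv(-H) is negative.
with np.errstate(invalid="ignore"):
    unchecked = np.sqrt(np.linalg.inv(-bad_hessian).diagonal())
print(unchecked)
```

Raising early makes the problem visible at fit time instead of surfacing later as NaN columns in the summary table.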

I've found that adjusting the step size to something smaller (like 0.5) can sometimes help get around this issue, but it is not a guaranteed workaround. Have there been any other reports of this occurring, or is there another solution I may not be aware of (such as changing the tolerance)?
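Why a smaller step size can help is easiest to see on a toy problem. The sketch below is an illustration only and does not use lifelines' solver: `damped_newton` is a hypothetical function that maximizes f(x) = -sqrt(1 + x^2), an objective chosen because full Newton steps overshoot from a starting point of 1.5 while half steps settle at the maximum x = 0.

```python
import numpy as np


def damped_newton(x0, step_size, n_iter):
    """Maximize f(x) = -sqrt(1 + x^2) with Newton steps scaled by step_size."""
    x = float(x0)
    for _ in range(n_iter):
        gradient = -x / np.sqrt(1.0 + x * x)
        hessian = -((1.0 + x * x) ** -1.5)
        # Newton update toward a stationary point, shrunk by step_size.
        x -= step_size * gradient / hessian
    return x


full = damped_newton(1.5, step_size=1.0, n_iter=5)     # overshoots further each step
damped = damped_newton(1.5, step_size=0.5, n_iter=50)  # approaches the maximum at 0

print(abs(full), abs(damped))
```

As noted above, damping is not a guaranteed fix: if the objective has no finite maximum at all, smaller steps only slow the drift.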

For context, my demo dataset is on the smaller side (~15 individuals), which may also be contributing to this issue. When I try using a colleague's dataset with 100+ individuals, I do not encounter the same issue, instead sometimes running into the already reported delta NaN issue.

CamDavidsonPilon commented 6 years ago

Thanks for the detailed issue. I haven't seen this particular issue before (though most often I have n > 15), so this is great to see. The delta NaN is still a problem I'm trying to resolve (I have some ideas).

If you can send me the dataset of 15 individuals so I can reproduce locally, I would really appreciate that. I understand if you can't send it - but if you can, my email is cam.davidson.pilon@gmail.com.

CamDavidsonPilon commented 6 years ago

👍 I'm able to reproduce this, so I'll work on a fix locally. Expect this in v0.14.

CamDavidsonPilon commented 6 years ago

So after further investigation, I believe your issue may be related to a completely separated dataset. You can find out more here: https://pdfs.semanticscholar.org/4f17/1322108dff719da6aa0d354d5f73c9c474de.pdf

I've added new checks for complete separation in CoxPHFitter.
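For intuition about what such a check looks for, here is a minimal single-covariate sketch. It is not the check added to CoxPHFitter: `has_monotone_likelihood` is a hypothetical function, and the arrays are invented. It relies on the fact that if, at every event time, the failing subject has the largest (or smallest) covariate value among those still at risk, the partial likelihood keeps increasing as the coefficient grows in that direction, so no finite maximum exists.

```python
import numpy as np


def has_monotone_likelihood(durations, events, x):
    """Hypothetical single-covariate check for complete separation."""
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=bool)
    x = np.asarray(x, dtype=float)

    always_max = True
    always_min = True
    for i in np.flatnonzero(events):
        # Risk set: everyone still under observation at this event time.
        at_risk = durations >= durations[i]
        always_max = always_max and bool(x[i] >= x[at_risk].max())
        always_min = always_min and bool(x[i] <= x[at_risk].min())
    return always_max or always_min


# Made-up data: six subjects, all with observed events.
durations = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]

separated = has_monotone_likelihood(durations, events, [6, 5, 4, 3, 2, 1])
mixed = has_monotone_likelihood(durations, events, [1, 5, 2, 6, 3, 4])
print(separated, mixed)
```

With several covariates, separation can also occur along a linear combination of columns, which this one-column sketch does not detect.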

One thing I'm curious about, if you can report on it, is the output of the fit with show_progress turned to True.

haleypats commented 6 years ago

Thank you so much for your fast investigation. It makes sense that this error is the product of a completely separated dataset: when I changed the step size, I could avoid this specific error but often ran into the singular matrix warning.

Unless I've misunderstood, this is the output when show_progress is set to True: [image: convergence]

I'm sorry I couldn't reply to you sooner with it; I couldn't access my data on the weekend (similarly, I cannot provide the data, as it must remain within the institution). If there is anything else I can do, please let me know.

CamDavidsonPilon commented 6 years ago

Thanks. So it looks like the delta is still very large, but the LL is converging to an asymptote. This can only happen if the original LL is not well defined (e.g., think of a plateau (this case) vs. a mountain (normal case)). So I strongly suspect complete separation.

Another check: if you add a small penalizer, like cp = CoxPHFitter(penalizer=0.01), the fit will probably converge just fine.
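The plateau-versus-mountain picture, and the effect of a small penalizer, can be sketched with plain NumPy. This is an illustration under stated assumptions rather than lifelines' implementation: `partial_log_likelihood` is a hypothetical function for a single covariate without tied event times, the data are invented and completely separated, and the penalty is assumed to take the ridge form 0.5 * penalizer * beta^2.

```python
import numpy as np


def partial_log_likelihood(beta, durations, events, x, penalizer=0.0):
    """Cox partial log-likelihood for one covariate, minus an assumed ridge penalty."""
    ll = 0.0
    for i in np.flatnonzero(events):
        at_risk = durations >= durations[i]
        ll += beta * x[i] - np.log(np.sum(np.exp(beta * x[at_risk])))
    return ll - 0.5 * penalizer * beta ** 2


# Made-up, completely separated data: the subject who fails always has
# the largest covariate value among those still at risk.
durations = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
events = np.array([True, True, True, True, True, True])
x = np.array([6.0, 5.0, 4.0, 3.0, 2.0, 1.0])

betas = np.linspace(0.0, 10.0, 101)
plain = np.array([partial_log_likelihood(b, durations, events, x) for b in betas])
ridge = np.array(
    [partial_log_likelihood(b, durations, events, x, penalizer=0.01) for b in betas]
)

# Without a penalty the curve keeps rising toward an asymptote (plateau);
# with the penalty it peaks at a finite beta (mountain).
print(betas[plain.argmax()], betas[ridge.argmax()])
```

The unpenalized curve attains its largest value at the edge of the grid no matter how far the grid extends, which matches the behavior of a large delta alongside a log-likelihood that barely changes.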

CamDavidsonPilon / lifelines

coxph_fitter does not converge properly in certain cases #412