CamDavidsonPilon / lifelines

Survival analysis in Python
lifelines.readthedocs.org
MIT License

Kaplan-Meier plot_loglogs difference from docs #1033

Closed: sean-reed closed this issue 4 years ago

sean-reed commented 4 years ago

The docstring says this function plots log(S(t)) against log(t), but it actually plots log(-log(S(t))) (correctly labelled on the y-axis) against t (incorrectly labelled as log(t)), using a linear-scaled y-axis and a log-scaled x-axis.

I can submit a fix for this but, assuming the intention of this function is to produce a straight line plot for Weibull distributed survival times, there are various solutions:

(a) Plot -log(S(t)) against t, with log-scaled x and y axes.
(b) Plot -log(S(t)) against log(t), with a linear-scaled x-axis and a log-scaled y-axis.
(c) Plot log(-log(S(t))) against log(t), with linear-scaled x and y axes.

My suggestion is solution (b) as it's closest to the current function name and docstring.
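For reference, here is why all three options give a straight line for Weibull data. This is only a sketch, and it assumes the parameterisation S(t) = exp(-(t/λ)^ρ); the symbols λ (scale) and ρ (shape) are my choice of notation, not taken from the docstring.

```latex
\begin{aligned}
S(t) &= \exp\!\left(-\left(\frac{t}{\lambda}\right)^{\rho}\right) \\
-\log S(t) &= \left(\frac{t}{\lambda}\right)^{\rho} \\
\log\bigl(-\log S(t)\bigr) &= \rho \log t - \rho \log \lambda
\end{aligned}
```

The last line is linear in log(t) with slope ρ. Options (a), (b) and (c) only differ in whether the logarithms are applied to the data or by the axis scaling.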

CamDavidsonPilon commented 4 years ago

The goal was to check for proportional hazards easily. Thinking out loud: why am I plotting against log(t) anyway? I think that's a mistake tbh. What do you think?

sean-reed commented 4 years ago

For checking proportional hazards I don't think it matters too much as the curves will be parallel either way if the assumption holds since it's just a different scaling of time. I think it's still useful to plot either log(t) on a linear axis or t on log axis (as it is now) though, as then Weibull distributed survival times should appear as an approximately straight line.
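A quick numerical check of the "parallel either way" point, as a rough sketch rather than anything from lifelines. It assumes a Weibull baseline with made-up parameters and a hypothetical constant hazard ratio of 2, so that S_1(t) = S_0(t) ** hazard_ratio. The helper names `weibull_survival` and `loglog` are mine.

```python
import math


def weibull_survival(t, lambda_, rho):
    """Weibull survival function S(t) = exp(-(t / lambda_) ** rho)."""
    return math.exp(-((t / lambda_) ** rho))


def loglog(s):
    """Transform log(-log(s)), defined for 0 < s < 1."""
    return math.log(-math.log(s))


times = [0.5, 1.0, 2.0, 4.0, 8.0]
hazard_ratio = 2.0  # hypothetical constant hazard ratio between the two groups

# Under proportional hazards, S_1(t) = S_0(t) ** hazard_ratio.
baseline = [weibull_survival(t, lambda_=5.0, rho=1.5) for t in times]
treated = [s ** hazard_ratio for s in baseline]

# Vertical gap between the two transformed curves at each time point.
gaps = [loglog(s1) - loglog(s0) for s0, s1 in zip(baseline, treated)]
```

Every entry of `gaps` equals log(hazard_ratio) up to floating-point error. The gap is computed at matching time points, so it does not depend on whether the x-axis shows t or log(t).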

sean-reed commented 4 years ago

On that second point, it may also be useful to have an option to include the confidence intervals in the plot. I don't think that's possible at the moment. The idea is that if a straight line can be drawn within the confidence interval, the data are consistent with Weibull distributed survival times.
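One way such a band could be drawn on the transformed scale is the standard delta-method interval for log(-log(S(t))) based on Greenwood's formula. This is a sketch of the textbook result, not a statement of what lifelines computes; d_j and n_j denote the number of events and the number at risk at event time t_j, and z is the normal quantile for the chosen confidence level.

```latex
\begin{aligned}
\widehat{V}(t) &= \frac{1}{\bigl[\log \hat{S}(t)\bigr]^{2}}
  \sum_{t_j \le t} \frac{d_j}{n_j\,(n_j - d_j)} \\
\log\bigl(-\log \hat{S}(t)\bigr) &\pm z_{1-\alpha/2}\,\sqrt{\widehat{V}(t)}
\end{aligned}
```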

CamDavidsonPilon commented 4 years ago

> For checking proportional hazards I don't think it matters too much as the curves will be parallel either way if the assumption holds since it's just a different scaling of time.

Yes that's true.

The whole MPL-doing-log-axes thing is just confusing. My suggestion would be to put both on linear axes and plot log(-log(S)) against log(t), labelled as such. Does that make sense?

sean-reed commented 4 years ago

Yes, sounds good to me.
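To make the agreed approach concrete, here is a minimal standard-library sketch of the points that would be plotted: a hand-rolled Kaplan-Meier estimate mapped to (log(t), log(-log(S(t)))). It is not the lifelines implementation. The function names `kaplan_meier` and `loglog_points` and the toy data are hypothetical.

```python
import math


def kaplan_meier(durations, event_observed):
    """Return [(t, S(t))] at each distinct time with at least one observed event."""
    data = sorted(zip(durations, event_observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects that share the same duration.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def loglog_points(curve):
    """Map (t, S) to (log t, log(-log S)), skipping points where it is undefined."""
    return [
        (math.log(t), math.log(-math.log(s)))
        for t, s in curve
        if t > 0 and 0.0 < s < 1.0
    ]


# Hypothetical toy data: durations and event flags (1 = event, 0 = censored).
durations = [1, 2, 2, 3, 4, 5, 6, 8]
events = [1, 1, 0, 1, 1, 0, 1, 1]

curve = kaplan_meier(durations, events)
points = loglog_points(curve)
```

The x and y values in `points` can then be drawn with both axes left linear and labelled log(t) and log(-log(S(t))), so the labels match what is actually plotted. Points where S(t) is 0 or 1 are dropped because the transform is undefined there.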