amirhosseindavoody opened this issue 5 years ago
Hi @amirhosseindavoody, I'm glad you reported this issue. I'm happy to investigate this and come up with a better solution. To confirm, you are using WeibullFitter.fit_interval_censoring, correct?
For your first example, the following Python code and R code agree:
```python
import numpy as np
from lifelines import WeibullFitter

wf = WeibullFitter()
wf.fit_interval_censoring(
    np.array([1, 10, 100]),
    np.array([10, 100, 1000]),
    event_observed=np.array([0, 0, 0]),
)
wf.print_summary()
```
```r
library(icenReg)
ic_par(cbind(c(1, 10, 100), c(10, 100, 1000)) ~ 1)
```
Do you see lifelines' behaviour being something else, or is JMP giving different values?
Hello @amirhosseindavoody, have you had a chance to investigate this more? Can you share a real life dataset that has convergence problems?
Hi @CamDavidsonPilon , I have been very busy lately. I will find a real life dataset and post it here this week.
Just a thought: the issue may be statistical rather than numerical. Censored-data models can suffer from the same issue as logistic regression when there is perfect separation in the data: the MLE of the coefficients is unbounded, yet the optimization algorithm will "converge" at finite values that might not look huge to some users (e.g., 20), because the change in log-likelihood is numerically 0 at that point. A specific issue a Weibull model faces is that the scale/rate parameter will tend to 0/infinity at the MLE and, like the coefficients, will never actually get there.
From an algorithm's perspective, this is not easy to detect, and it is usually considered the user's responsibility (as is the case with glm in base R) to investigate whether it could be an issue. Without seeing the data, that would be my first guess about what is going on.
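To make this concrete, here is a minimal pure-NumPy sketch (my own illustration, not code from this thread) of the likelihood for a single observation interval-censored in (1, 10). With the scale held between the interval endpoints, the log-likelihood creeps toward its supremum of log(1) = 0 as the shape grows, so the shape MLE is unbounded even though any finite stopping point looks "converged":

```python
import numpy as np

def interval_ll(lam, rho, left=1.0, right=10.0):
    # Log-likelihood of one Weibull observation interval-censored in
    # (left, right): log(S(left) - S(right)), with S(t) = exp(-(t/lam)**rho).
    s = lambda t: np.exp(-(t / lam) ** rho)
    return float(np.log(s(left) - s(right)))

# Fix the scale between the endpoints and grow the shape: the log-likelihood
# keeps improving toward 0, but the improvements become numerically invisible,
# which is exactly where an optimizer will stop and report convergence.
lls = [interval_ll(lam=3.0, rho=r) for r in (1.0, 5.0, 20.0, 30.0)]
```

Each successive shape value gives a strictly larger log-likelihood, with the last one within 1e-9 of the supremum 0.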
Also a very likely scenario, @pistacliffcho, thanks for mentioning it. For future users: pass show_progress=True in the call to fit and it will display a small summary of convergence details, including whether the maximum number of iterations was exceeded (which is what I suspect would happen in the perfect-separation case).
Okay, this is overdue, but I have finally had time to do a more detailed analysis. Here is an example data set and the results from JMP and lifelines: example_fail_data.txt
I used a scaling factor of 5000 for the start and stop times. The data include right-censored and interval-censored samples. There are also multiple samples (categories), and the Weibull analysis is performed for each category separately. Below you can see alpha_JMP versus alpha_lifelines and beta_JMP versus beta_lifelines. The line shows the one-to-one correspondence we want in the ideal case (matching results between JMP and lifelines).
In my analysis the JMP result is always better, meaning it yields a larger log-likelihood for each category. I will show those results in another post.
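For future readers: one way to make the "larger log-likelihood with JMP" comparison concrete is to score both sets of fitted parameters with the same interval-censored Weibull log-likelihood. A minimal sketch (my own helper, not part of lifelines; the data and parameter values below are placeholders, not the thread's actual fits):

```python
import numpy as np

def weibull_interval_ll(alpha, beta, left, right):
    # Total log-likelihood of interval-censored data under a Weibull with
    # scale alpha and shape beta, using S(t) = exp(-(t/alpha)**beta).
    # A right-censored observation is an interval with right = np.inf,
    # for which S(inf) = 0 falls out of the formula automatically.
    l = np.asarray(left, dtype=float)
    r = np.asarray(right, dtype=float)
    sf = lambda t: np.exp(-(t / alpha) ** beta)
    return float(np.sum(np.log(sf(l) - sf(r))))

# Placeholder intervals and fits, purely for illustration:
left = [1.0, 10.0, 100.0]
right = [10.0, 100.0, np.inf]          # last observation right-censored
ll_fit_a = weibull_interval_ll(50.0, 0.8, left, right)  # hypothetical fit A
ll_fit_b = weibull_interval_ll(60.0, 0.7, left, right)  # hypothetical fit B
```

Whichever fit scores the higher total log-likelihood on the same data is the better fit, regardless of which tool produced it; this removes any ambiguity about the two tools using different parameterizations.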
@CamDavidsonPilon any ideas?
I have been testing this package as a replacement for the JMP software for fitting Weibull distributions to a bunch of interval-censored data. The results do not seem to be stable. When I compare against JMP, the log-likelihood is always larger with the JMP results. I have narrowed down the root cause to some extent.
There are two problems as far as I can tell:
[(1,10), (10, 100), (100, 1000)]
[(1, 10)]
In this case, the LL function is very flat and it is hard to get good convergence. To get a better solution (this definition is very subjective here), I have found that it is better to optimize the scale_factor in log space, meaning using log(scale_factor) as the optimization variable, and also to use 1/shape_factor as the optimization variable. @CamDavidsonPilon I was wondering if you have any suggestions or thoughts on getting a more stable result in these cases.
Thanks,
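The log-space reparameterization suggested above can be sketched end-to-end with SciPy (my own illustration; lifelines' internal optimizer is different). Optimizing (log scale, log shape) keeps both parameters positive without constraints and conditions the problem better when the times span several orders of magnitude:

```python
import numpy as np
from scipy.optimize import minimize

# The thread's first example: three interval-censored observations.
left = np.array([1.0, 10.0, 100.0])
right = np.array([10.0, 100.0, 1000.0])

def neg_ll(params):
    # params = (log scale, log shape). Exponentiating keeps both parameters
    # positive, and the log transform flattens the scale's dynamic range.
    lam, rho = np.exp(params)
    sf = lambda t: np.exp(-(t / lam) ** rho)
    p = np.clip(sf(left) - sf(right), 1e-300, None)  # guard log(0)
    return -np.sum(np.log(p))

res = minimize(neg_ll, x0=np.log([100.0, 1.0]), method="Nelder-Mead")
lam_hat, rho_hat = np.exp(res.x)  # map back to the original parameters
```

The unconstrained optimizer works entirely in log space; only at the end are the estimates mapped back with exp, so scale and shape can never go nonpositive during the search.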