chjackson / flexsurv

The flexsurv R package for flexible parametric survival and multi-state modelling
http://chjackson.github.io/flexsurv/
55 stars 27 forks source link

flexsurvreg () using dist = "weibull". Error: initial value in 'vmmin' is not finite #169

Closed shakirarodzlan closed 1 year ago

shakirarodzlan commented 1 year ago

Hi, I received this error after run flexsurvreg () using dist = "weibull".

wei_mod <- flexsurvreg(Surv(survival_time, status_pmcvd == "premature cvd death") ~ sex + age_entry + ethnic + education_level + dm_status + hpt_status + chol_status + abdominal_obesity + current_smoker, data = pmcvd2, dist = "weibull") print(wei_mod) Error in optim(method = "BFGS", par = c(shape = 276.244558817693, scale = 26.0046238740812, : initial value in 'vmmin' is not finite

I tried add initial = 0, but I still got an error.

wei_mod <- flexsurvreg(Surv(survival_time, status_pmcvd == "premature cvd death") ~ sex + age_entry + ethnic + education_level + dm_status + hpt_status + chol_status + abdominal_obesity + current_smoker, data = pmcvd2, dist = "weibull", inits = '0') print(wei_mod)

Error in flexsurvreg(Surv(survival_time, status_pmcvd == "premature cvd death") ~ : Initial values for parameters 1,2 out of range

However, I manage to run other distribution options using the same code, e.g., exponential, Gompertz, log normal, and log logistic.

Example from Gompertz survival model;

gomp_mod <- flexsurvreg(Surv(survival_time, status_pmcvd == "premature cvd death") ~ sex + age_entry + ethnic + education_level +dm_status + hpt_status + chol_status + abdominal_obesity + current_smoker, data = pmcvd2, dist = "gompertz") print(gomp_mod) Call: flexsurvreg(formula = Surv(survival_time, status_pmcvd == "premature cvd death") ~ sex + age_entry + ethnic + education_level + dm_status + hpt_status + chol_status + abdominal_obesity + current_smoker, data = pmcvd2, dist = "gompertz")

Estimates: data mean est L95% U95% se exp(est) L95%
shape NA 8.40e-02 6.72e-02 1.01e-01 8.60e-03 NA NA rate NA 1.77e-05 1.01e-05 3.11e-05 5.10e-06 NA NA sexMale 4.64e-01 8.39e-01 6.53e-01 1.02e+00 9.46e-02 2.31e+00 1.92e+00 age_entry 4.07e+01 3.70e-02 2.95e-02 4.46e-02 3.84e-03 1.04e+00 1.03e+00 ethnicIndian 8.25e-02 5.71e-01 2.71e-01 8.71e-01 1.53e-01 1.77e+00 1.31e+00 ethnicMalay 5.94e-01 5.62e-01 3.41e-01 7.82e-01 1.12e-01 1.75e+00 1.41e+00 ethnicOther Bumiputeras 1.08e-01 4.23e-01 1.05e-01 7.41e-01 1.62e-01 1.53e+00 1.11e+00 ethnicOthers 2.93e-02 -2.00e-02 -7.05e-01 6.65e-01 3.49e-01 9.80e-01 4.94e-01 education_levelPrimary 2.46e-01 2.09e-01 -7.50e-02 4.93e-01 1.45e-01 1.23e+00 9.28e-01 education_levelSecondary 5.15e-01 3.85e-02 -2.64e-01 3.41e-01 1.54e-01 1.04e+00 7.68e-01 education_levelTertiary 1.77e-01 -5.77e-01 -1.00e+00 -1.53e-01 2.16e-01 5.62e-01 3.68e-01 dm_statusDiagnosed DM 7.68e-02 1.19e+00 1.00e+00 1.38e+00 9.60e-02 3.29e+00 2.72e+00 dm_statusUndiagnosed DM 7.14e-02 4.40e-01 1.86e-01 6.95e-01 1.30e-01 1.55e+00 1.20e+00 hpt_statusDiagnosed HPT 1.35e-01 8.34e-01 6.23e-01 1.05e+00 1.08e-01 2.30e+00 1.86e+00 hpt_statusUndiagnosed HPT 2.24e-01 5.49e-01 3.59e-01 7.39e-01 9.69e-02 1.73e+00 1.43e+00 chol_statusDiagnosed chol 7.29e-02 2.58e-01 3.10e-02 4.85e-01 1.16e-01 1.29e+00 1.03e+00 chol_statusUndiagnosed chol 2.90e-01 3.21e-01 1.53e-01 4.89e-01 8.56e-02 1.38e+00 1.17e+00 abdominal_obesityYes 4.71e-01 2.60e-01 9.68e-02 4.23e-01 8.33e-02 1.30e+00 1.10e+00 current_smokerYes 2.36e-01 4.62e-01 2.78e-01 6.45e-01 9.35e-02 1.59e+00 1.32e+00 U95%
shape NA rate NA sexMale 2.78e+00 age_entry 1.05e+00 ethnicIndian 2.39e+00 ethnicMalay 2.19e+00 ethnicOther Bumiputeras 2.10e+00 ethnicOthers 1.94e+00 education_levelPrimary 1.64e+00 education_levelSecondary 1.41e+00 education_levelTertiary 8.58e-01 dm_statusDiagnosed DM 3.97e+00 dm_statusUndiagnosed DM 2.00e+00 hpt_statusDiagnosed HPT 2.85e+00 hpt_statusUndiagnosed HPT 2.09e+00 chol_statusDiagnosed chol 1.62e+00 chol_statusUndiagnosed chol 1.63e+00 abdominal_obesityYes 1.53e+00 current_smokerYes 1.91e+00

N = 54483, Events: 737, Censored: 53746 Total time at risk: 625686.1 Log-likelihood = -5162.153, df = 19 AIC = 10362.31

Kindly help me to encounter this error.

Thank you

chjackson commented 1 year ago

flexsurv fits the standard Weibull by wrapping survreg from the survival package. It looks like survreg successfully converged and produced estimates of the Weibull shape and scale (276 and 26) that are extreme, which leads to numerical overflow in flexsurvreg. I'd guess that the Weibull distribution just doesn't fit this dataset very well, or the shape parameter is not identifiable from the data (i.e many shape parameters would fit equally well so there is no "optimum").

shakirarodzlan commented 1 year ago

Dear chjackson,

Thank you for your prompt response.

I wanted to inform you that I explored an alternative package for running the Weibull model, and I am pleased to report that it provided successful results.

1) eha package

install.packages("eha")

library(eha)

weibull.mod <- weibreg(Surv(survival_time, status_pmcvd == "premature cvd death") ~ sex + age_entry + ethnic + education_level +dm_status + hpt_status + chol_status + abdominal_obesity + current_smoker, data = pmcvd2) weibull.mod

Call: weibreg(formula = Surv(survival_time, status_pmcvd == "premature cvd death") ~ sex + age_entry + ethnic + education_level + dm_status + hpt_status + chol_status + abdominal_obesity + current_smoker, data = pmcvd2)

Covariate Mean Coef Exp(Coef) se(Coef) Wald p sex Female 0.547 0 1 (reference) Male 0.453 0.832 2.298 0.095 0.000 age_entry 39.945 0.037 1.037 0.004 0.000 ethnic Chinese 0.196 0 1 (reference) Indian 0.084 0.570 1.768 0.153 0.000 Malay 0.583 0.558 1.747 0.112 0.000 Other Bumiputeras 0.113 0.416 1.515 0.162 0.011 Others 0.024 -0.038 0.963 0.350 0.914 education_level No formal education 0.062 0 1 (reference) Primary 0.246 0.210 1.234 0.145 0.148 Secondary 0.530 0.037 1.037 0.155 0.813 Tertiary 0.162 -0.592 0.553 0.216 0.006 dm_status Normal 0.872 0 1 (reference) Diagnosed DM 0.065 1.183 3.265 0.096 0.000 Undiagnosed DM 0.063 0.427 1.532 0.130 0.001 hpt_status Normal 0.650 0 1 (reference) Diagnosed HPT 0.125 0.836 2.306 0.108 0.000 Undiagnosed HPT 0.226 0.554 1.741 0.097 0.000 chol_status Normal 0.684 0 1 (reference) Diagnosed chol 0.062 0.243 1.275 0.116 0.036 Undiagnosed chol 0.254 0.310 1.363 0.085 0.000 abdominal_obesity No 0.549 0 1 (reference) Yes 0.451 0.256 1.291 0.083 0.002 current_smoker No 0.767 0 1 (reference) Yes 0.233 0.462 1.587 0.094 0.000

log(scale) 7.976 2911.544 0.252 0.000 log(shape) 0.355 1.426 0.033 0.000

Events
Total time at risk 625686 Max. log. likelihood -5159.5 LR test statistic 1043 Degrees of freedom 17 Overall p-value 0

2) survival package library(survival) wei.mod <-survreg(Surv(survival_time, status_pmcvd == "premature cvd death") ~ sex + age_entry + ethnic + education_level +dm_status + hpt_status + chol_status + abdominal_obesity + current_smoker, data = pmcvd2, dist='weibull') summary(wei.mod) Call: survreg(formula = Surv(survival_time, status_pmcvd == "premature cvd death") ~ sex + age_entry + ethnic + education_level + dm_status + hpt_status + chol_status + abdominal_obesity + current_smoker, data = pmcvd2, dist = "weibull") Value Std. Error z p (Intercept) 2.60e+01 8.39e-79 3.10e+79 <2e-16 sexMale 5.54e+00 3.07e-79 1.80e+79 <2e-16 age_entry 4.09e-01 1.18e-80 3.47e+79 <2e-16 ethnicIndian 3.37e+00 5.24e-79 6.43e+78 <2e-16 ethnicMalay 5.18e+00 3.34e-79 1.55e+79 <2e-16 ethnicOther Bumiputeras 1.37e+00 4.83e-79 2.84e+78 <2e-16 ethnicOthers 1.46e+01 7.85e-79 1.86e+79 <2e-16 education_levelPrimary -7.66e+00 5.70e-79 -1.35e+79 <2e-16 education_levelSecondary -8.49e+00 5.74e-79 -1.48e+79 <2e-16 education_levelTertiary -5.24e+00 6.36e-79 -8.23e+78 <2e-16 dm_statusDiagnosed DM 1.22e+01 5.27e-79 2.32e+79 <2e-16 dm_statusUndiagnosed DM 2.55e+00 4.91e-79 5.20e+78 <2e-16 hpt_statusDiagnosed HPT -3.24e+00 4.41e-79 -7.35e+78 <2e-16 hpt_statusUndiagnosed HPT -4.24e+00 3.26e-79 -1.30e+79 <2e-16 chol_statusDiagnosed chol 2.44e+00 5.35e-79 4.56e+78 <2e-16 chol_statusUndiagnosed chol 7.81e+00 2.90e-79 2.69e+79 <2e-16 abdominal_obesityYes 4.46e-01 2.73e-79 1.64e+78 <2e-16 current_smokerYes 3.84e-01 3.61e-79 1.06e+78 <2e-16 Log(scale) -2.76e+02 0.00e+00 -Inf <2e-16

Scale= 1.07e-120

Weibull distribution Loglik(model)= 54887.5 Loglik(intercept only)= -5680.7 Chisq= 121136.4 on 17 degrees of freedom, p= 0 Number of Newton-Raphson Iterations: 2 n= 54483

Could you please help me determine whether the issue is with my data or with the package being used?

I really appreciate this flexsurv package because the results it provides for different distributions (like exponential, Gompertz, log normal, and log logistic) are clear and easy to understand. I hope I can also use it for the Weibull distribution to make it comparable with other types of analyses.

Thank you

chjackson commented 1 year ago

weibreg from eha is a proportional hazards Weibull model. flexsurvreg(...,dist="weibull") and the Weibull model survreg are accelerated failure time Weibull models. You can get the proportional hazards model in flexsurv with dist="weibullPH". Though given that the implementation goes via survreg, a similar problem might happen. You could work around this by supplying the estimates you get from eha as initial values to flexsurvreg. But whatever package and model you use, you should check whether the fitted model fits the data well and is scientifically plausible.

shakirarodzlan commented 1 year ago

Thank you very much for your detailed explanation and advice regarding the Weibull models. I'll definitely keep in mind the importance of thoroughly assessing the fitted model's fit to the data and its scientific validity, regardless of the package or model I choose to use. Your guidance has been extremely helpful!