CecileProust-Lima / lcmm

R package lcmm
https://CecileProust-Lima.github.io/lcmm/
54 stars 13 forks source link

Issue running JointLcmm but not with hlme with same parameters #151

Closed MrGene closed 10 months ago

MrGene commented 1 year ago

Hi,

Thanks for an amazing package. I've been trying to use it for my research, but have been running into a bit of a snag. Specifically with the JointLcmm function as it isn't quite converging and seems to stop working for n>3.

I ran the following code for classes 3-5. m3<-Jointlcmm(fixed = sbp~ age+I(age^2)+sex+race, random =~ 1, mixture = ~ 1 + age+I(age^2), subject = "id", ng = 4, survival = Surv(age_v1,fatchd_17, fatchd_by_17_binary)~sex+race, hazard = "Weibull", data = df) results in summarytable(m3,m4,m5) G loglik npm BIC %class1 %class2 %class3 %class4 %class5 m3 3 -245862.7 23 491945.4 10.78568 73.28394 15.93038
m4 4 -245867.1 29 492011.6 10.58996 74.08780 15.32224 0
m5 5 -247801.8 35 495938.5 0.00000 0.00000 100.00000 0 0

Meanwhile running the hlme on the same data set seems to be working: for k =3-5, m3_hlme<- hlme(fixed = sbp~ age+I(age^2)+sex+race, random =~ 1, mixture = ~ 1 + age+I(age^2), subject = "id", ng = 3, data = df)

summarytable(m3_hlme,m4_hlme,m5_hlme) G loglik npm BIC %class1 %class2 %class3 %class4 %class5 m3_hlme 3 -241928.7 20 484049.3 73.97841 9.956266 16.06533
m4_hlme 4 -241896.5 19 483975.3 69.18136 11.281946 12.51879 7.017904
m5_hlme 5 -241678.2 23 483576.9 63.16797 12.710127 19.19503 0.970343 3.95654

I tried restricting the survival data set to only those who had events during the regular study time instead of during follow-up and got the same result. Is there something that I'm missing or anything else I could try?

Thanks!

VivianePhilipps commented 1 year ago

Hi,

Did you try the gridsearch function? Your models with ng>3 seem to reach a local maximum, so we recommend to do a grid search to get to global maximum. You can find some details in the vignette "Latent class model with hlme". The gridsearch is also available for Jointlcmm models.

Viviane

MrGene commented 1 year ago

Hi, Thanks for your response. I was previously having troubles using the gridsearch function--I tried using it two different ways using 30 reps and 15 iterations, then 50 reps and 20 iteration, but neither was able to get the models to converge for n>3. Some of the iterations run for 0.5 seconds and some run for 3306 seconds. for n 2-5: m5_j_grid<-gridsearch(rep=50, maxiter=20, minit= m1_j, Jointlcmm(fixed = sbp~ age+I(age^2)+visit+sex+race, random =~ 1, mixture = ~ 1 + age+I(age^2), subject = "id", ng = 5, survival = Surv(age_v1,futime_fatchd_17, fatchd_by_17) ~ sex+race, hazard = "Weibull", data = df))

summarytable(m2_j_grid,m3_j_grid,m4_j_grid,m5_j_grid) G loglik npm BIC %class1 %class2 %class3 %class4 m2_j_grid 2 -246579.0 17 493320.6 18.94310 81.05690
m3_j_grid 3 -245862.7 23 491945.4 15.93038 10.78568 73.28394
m4_j_grid 4 -247801.8 29 495881.1 0.00000 0.00000 0.00000 100 m5_j_grid 5 -247801.8 35 495938.5 0.00000 100.00000 0.00000 0 0

I'll go ahead and try more reps and iterations, but is there something else that I could try? Thanks!

Edit: Ran the attempt with more reps and iterations with classes 4 and have now having the same output as n= 2 in terms of %classes.

MrGene commented 1 year ago

Hi again,

I wanted follow-up as I'm still having the same issue despite using the gridSearch function. Do you have any other recommendations or something else I might be able to try?

Thanks!

CecileProust-Lima commented 1 year ago

Hi, indeed, the gridsearch did not manage to find a 4-class or 5-class solution. There may be different reasons for that. It might be that you do not have the information to find more than 3 classes but it may also be that a >4-class solution is very rare or too complicated to find. One alternative solution that we were using before was to "play" with the initial values: fill the B vector with values:

Another solution may be to change slightly the parameterization , for instance, recenter the age, and possibly scale it. But there is no guarantee this will work fine. It depends a lot on the data. I hope you will find a solution with this. Cécile

MrGene commented 1 year ago

Hi,

Thanks for the response, and my apologies for the delay. I've been trying variations of your suggestion. I went back to the manuscript with the examples of JLCMM and saw that you had used a work-around for a similar issue. However, I ran into an issue using the Binit and taking the best from the G-1 model. However, my main question is currently about a problem using the survival function for graphing.

I've been trying to graph the survival function using the best converged model G=3, but the survival graph shows two classes that start at 100% and decrease at different rates (as expected of course) but the third class starts at 20% and shows a gradual decline.

Here is the code for the Jlcmm function--is there something else I can explore to correct the problem with the survival function?

Jointlcmm(fixed = sbp ~ ageScale + I(ageScale^2) + sex + race, mixture = ~1 + ageScale + I(ageScale^2), random = ~1, subject = "id", ng = 3, survival = Surv(age_ex_v1, futime_event_17, event_by_17) ~ sex + race, hazard = "Weibull", data = df)

plot(m3_j_grid, which = "survival", lwd = 2, legend.loc = F, bty = "l", xlab = "age in years", ylab = "Event-free Probability")

One attempt I had to try and remedy this I was attempting one of the above proposed solutions for using G-1 class model with hope that running it differently may correct this. The code for the gridsearch method:

g_3_jlcmm_grid<-gridsearch(rep=30, maxiter=15, minit=m1_j, Jointlcmm(fixed = sbp~ ageScale+I(ageScale^2)+sex+race, random =~ 1, mixture = ~ 1 + ageScale+I(ageScale^2), subject = "id", ng = 3, survival = Surv(age_v1,futime_event_17, event_by_17) ~ sex+race, hazard = "Weibull", data = df)) G loglik npm BIC %class1 %class2 %class3 g2_jlcmm_grid 2 -245798.5 17 491756.2 93.968689 6.031311
g3_jlcmm_grid 3 -243528.5 23 487272.4 3.379245 92.839422 3.781332877 g4_jlcmm_grid 4 -251162.1 29 502595.8 8.272735 91.017196 0.008555052 I used the code from the manuscript to use the best values and add other random values for the 6 new parameters.

Binit <- rep(0, length(g_2_jlcmm_grid$best) + 6) Binit[c(2, 5:10, 12, 13, 15, 16, 18, 19:(length(Binit)))] <- g_2_jlcmm_grid$best Binit[c(1, 3, 4, 11, 14, 17)] <- c(0, 0.11, 4, 70, 0, 0) g3_jlcmm_best<- Jointlcmm(fixed = sbp~ ageScale+I(ageScale^2)+sex+race, random =~ 1, mixture = ~ 1 + ageScale+I(ageScale^2), subject = "id", ng = 3, survival = Surv(age_v1,futime_event_17, event_by_17) ~ sex+race, hazard = "Weibull", data = df, B = Binit)

It seems to stick with the maxima from the previously found solution. G loglik npm BIC %class1 %class2 %class3 %class4 g3_jlcmm_best 3 -245798.5 23 491812.4 0 93.96869 6.031311
g4_jlcmm_best 4 -245798.5 29 491868.6 0 93.96869 0.000000 6.031311

I tried scaling the age, but was unsuccessful. Please let me know if you have any suggestions/troubleshooting you could recommend.

Thanks!

MrGene commented 1 year ago
Screen Shot 2023-01-30 at 8 47 30 PM

Here is an example of the problem with the graph. The graph was working successfully for some other traits G=2

VivianePhilipps commented 1 year ago

Hi,

As the timescale in your survival model is age, the Weibull distribution is defined from age 0. So there is no constrain on the survival probability at the minimum entry time in your data (about 45 years here). Your green class has surely a bad survival, and the fact that it starts at 20% may be an artifact because of a lack of flexibility. More flexible hazard risk functions will imply more parameters to estimate, but if possible regarding your data you can try splines risk functions,.

Viviane