CecileProust-Lima / lcmm

R package lcmm
https://CecileProust-Lima.github.io/lcmm/
54 stars 13 forks source link

Predicted trajectories not appearing when changing knot placement (lcmm, link=splines) #203

Closed Tina-Kb closed 11 months ago

Tina-Kb commented 1 year ago

Hi,

I'm having issues with the predictY function with a lcmm model with link = splines. Some parts of the mean predicted trajectories are not showing on the plot. I was able to plot the same model with knots positioned differently (equidistant) without any problems. Why is the positioning of the knots affecting the predictY function? How can I fix this issue if I want to keep the same number/position for the knots?

Here's the R code for the model not showing on the plot :

mvpa_f2 = gridsearch(rep = 100, maxiter = 15, minit = mvpa_f1, m=lcmm(MVPA_TOTAL ~ poly(temps, degree = 3, raw = TRUE), subject = "ID", ng = 2, data = subset(traj_long, sex == 2), mixture = ~ poly(temps, degree = 3, raw = TRUE), link = "5-manual-splines", intnodes = c(1, 120, 300))) summary(mvpa_f2)

General latent class mixed model fitted by maximum likelihood method

lcmm(fixed = MVPA_TOTAL ~ poly(temps, degree = 3, raw = TRUE), mixture = ~poly(temps, degree = 3, raw = TRUE), subject = "ID", ng = 2, link = "5-manual-splines", intnodes = c(1, 120, 300), data = subset(traj_long, sex == 2))

Statistical Model: Dataset: subset(traj_long, sex == 2) Number of subjects: 417 Number of observations: 1925 Number of observations deleted: 160 Number of latent classes: 2 Number of parameters: 15
Link function: Quadratic I-splines with nodes
0 1 120 300 2520

Iteration process: Convergence criteria satisfied Number of iterations: 1 Convergence criteria: parameters= 1.1e-12 : likelihood= 1e-09 : second derivatives= 2.6e-10

Goodness-of-fit statistics: maximum log-likelihood: -9906.37
AIC: 19842.74
BIC: 19903.24

Maximum Likelihood Estimates:

Fixed effects in the class-membership model: (the class of reference is the last class)

                 coef      Se    Wald p-value

intercept class1 -0.49919 0.31565 -1.581 0.11377

Fixed effects in the longitudinal model:

                                 coef      Se    Wald p-value

intercept class1 (not estimated) 0
intercept class2 -0.56792 0.14205 -3.998 0.00006 poly(...)1 class1 0.09498 0.11816 0.804 0.42151 poly(...)1 class2 -0.46061 0.07805 -5.902 0.00000 poly(...)2 class1 -0.05040 0.03531 -1.427 0.15347 poly(...)2 class2 0.00466 0.02516 0.185 0.85313 poly(...)3 class1 0.01003 0.03078 0.326 0.74452 poly(...)3 class2 0.05490 0.02095 2.620 0.00879

Residual standard error (not estimated) = 1

Parameters of the link function:

           coef      Se    Wald p-value

I-splines1 -1.80897 0.11421 -15.838 0.00000 I-splines2 0.93933 0.01769 53.110 0.00000 I-splines3 0.22670 0.05276 4.297 0.00002 I-splines4 0.94294 0.02383 39.572 0.00000 I-splines5 1.48450 0.02824 52.559 0.00000 I-splines6 0.00003 0.02068 0.002 0.99874 I-splines7 1.01290 0.12296 8.238 0.00000

predY = predictY(mvpa_f2, newdata = subset(traj_long, sex == 2), var.time="age") plot(predY, shades=TRUE, ylab= "MVPA (minutes per week)", xaxt = "n", xlab= "Age (years)" ) axis(side = 1, at=c(20.3, 24.1, 30.3, 33.6, 35.3))

Screen Shot 2023-07-31 at 11 56 01 AM

Please note that the knots for mvpa_f2 were chosen according to the BIC, after comparing various combinations of knots. 4 out of 5 knots in the mvpa_f2 model correspond to the quantiles of MVPA_TOTAL :

quantile(traj_long$MVPA_TOTAL[!is.na(traj_long$MVPA_TOTAL)]) 0% 25% 50% 75% 100% 0 0 120 300 2520

Here's the R code for the same model but with equidistant knots. The predictions fully appear on the plot :

test = gridsearch(rep = 100, maxiter = 15, minit = mvpa_f1, m=lcmm(MVPA_TOTAL ~ poly(temps, degree = 3, raw = TRUE), subject = "ID", ng = 2, data = subset(traj_long, sex == 2), mixture = ~ poly(temps, degree = 3, raw = TRUE), link = "splines"))

summary(test) General latent class mixed model fitted by maximum likelihood method

lcmm(fixed = MVPA_TOTAL ~ poly(temps, degree = 3, raw = TRUE), mixture = ~poly(temps, degree = 3, raw = TRUE), subject = "ID", ng = 2, link = "splines", data = subset(traj_long, sex == 2))

Statistical Model: Dataset: subset(traj_long, sex == 2) Number of subjects: 417 Number of observations: 1925 Number of observations deleted: 160 Number of latent classes: 2 Number of parameters: 15
Link function: Quadratic I-splines with nodes
0 630 1260 1890 2520

Iteration process: Convergence criteria satisfied Number of iterations: 2 Convergence criteria: parameters= 1e-10 : likelihood= 1.4e-07 : second derivatives= 1.8e-08

Goodness-of-fit statistics: maximum log-likelihood: -12614.29
AIC: 25258.58
BIC: 25319.08

Maximum Likelihood Estimates:

Fixed effects in the class-membership model: (the class of reference is the last class)

                 coef      Se    Wald p-value

intercept class1 -1.30191 0.24759 -5.258 0.00000

Fixed effects in the longitudinal model:

                                 coef      Se    Wald p-value

intercept class1 (not estimated) 0
intercept class2 -0.66869 0.16334 -4.094 0.00004 poly(...)1 class1 0.35794 0.14578 2.455 0.01407 poly(...)1 class2 -0.42772 0.06387 -6.697 0.00000 poly(...)2 class1 -0.03887 0.04452 -0.873 0.38263 poly(...)2 class2 -0.01464 0.01954 -0.749 0.45384 poly(...)3 class1 -0.04133 0.03983 -1.038 0.29932 poly(...)3 class2 0.05956 0.01689 3.526 0.00042

Residual standard error (not estimated) = 1

Parameters of the link function:

           coef      Se    Wald p-value

I-splines1 -1.83343 0.14700 -12.473 0.00000 I-splines2 1.61233 0.01863 86.547 0.00000 I-splines3 0.10914 0.40113 0.272 0.78556 I-splines4 1.11592 0.05515 20.234 0.00000 I-splines5 0.00174 0.03716 0.047 0.96275 I-splines6 1.18864 0.12579 9.450 0.00000 I-splines7 0.56425 0.26900 2.098 0.03594

predY = predictY(test, newdata = subset(traj_long, sex == 2), var.time="age") plot(predY, shades=TRUE, ylim=c(100, 600), ylab= "MVPA (minutes per week)", xaxt = "n", xlab= "Age (years)" ) axis(side = 1, at=c(20.3, 24.1, 30.3, 33.6, 35.3))

Screen Shot 2023-07-31 at 11 58 25 AM

Thanks!

Tina

Tina-Kb commented 1 year ago

Also, I would like to know if, in order to make valid comparaisons, the number and placement of knots need to be the same between the models compared? For ex., if I want to compare models containing between 2 and 5 classes, is it important that the link = splines argument be specified exactly the same way (same number and placement of knots)?

Thanks!

Tina

VivianePhilipps commented 1 year ago

Hi,

I think the predictY function had computational issues with your model. If so, you should see NA (or NaN, Inf) in predY$pred. This could be due the node location, even though the model converged. Indeed, you put the first interior node at 1 and the minimum is 0, so maybe there is not enough information between 0 and 1.

And if compare models with different number of latent classes, yes the structure should be exactly the same for all models. There shouldn't be any differences between the models, except the number of classes.

Best,

Viviane