CecileProust-Lima / lcmm

R package lcmm
https://CecileProust-Lima.github.io/lcmm/
54 stars 13 forks source link

Class-specific trajectory out of scope of original variables in multlcmm #227

Closed MrGene closed 2 months ago

MrGene commented 10 months ago

Hi, thanks for your package--I'm working on trying to incorporate two related variables using the multlcmm package. The problem I've been running into (after finally getting it to converge) is that the predicted predicted trajectory is out of the scope of the original variables.

m1<- multlcmm(fixed = x +y ~ age_sd + sex+race, random =~ 1, subject = "id", ng = 1, data = df)

m3<-gridsearch(rep=50,maxiter=20,minit=m1, multlcmm(fixed = x + y ~ poly(age_sd, degree = 2) + sex + race + x_pgs + y_pgs, B=m1, mixture = ~1 + poly(age_sd, degree = 2), random = ~1, subject = "id", ng = 3, link = "linear", data = df)

where age_sd is normalized age and sex and race are 1 or 2 respectively

summarytable(m3) G loglik npm BIC %class1 %class2 %class3 m3 3 -334192.7 20 668572.8 3.009181 90.62394 6.366882

summary(m3) General latent class mixed model fitted by maximum likelihood method

multlcmm(fixed = x + y ~ poly(age_sd, degree = 2) + sex + race, mixture = ~1 + poly(age_sd, degree = 2), random = ~1, subject = "id", ng = 3, link = "linear", data = aric_long)

Statistical Model: Dataset: df Number of subjects: 11832 Number of observations: 86968 Number of latent classes: 3 Number of parameters: 18
Link functions: Linear for x
Linear for y

Iteration process: Convergence criteria satisfied Number of iterations: 27 Convergence criteria: parameters= 3.6e-06 : likelihood= 4e-08 : second derivatives= 2.7e-12

Goodness-of-fit statistics: maximum log-likelihood: -336289.24
AIC: 672614.48
BIC: 672747.29

Maximum Likelihood Estimates:

Fixed effects in the class-membership model: (the class of reference is the last class)

                   coef       Se    Wald p-value

intercept class1 2.24780 0.10061 22.341 0.00000 intercept class2 -0.53981 0.14099 -3.829 0.00013

Fixed effects in the longitudinal model:

                                   coef       Se    Wald p-value

intercept class1 (not estimated) 0.00000
intercept class2 1.37758 0.09996 13.781 0.00000 intercept class3 1.44271 0.06299 22.905 0.00000 poly(...)1 class1 84.43846 2.34361 36.029 0.00000 poly(...)1 class2 150.19059 23.83529 6.301 0.00000 poly(...)1 class3 38.84740 13.31102 2.918 0.00352 poly(...)2 class1 6.29278 1.83939 3.421 0.00062 poly(...)2 class2 284.74295 15.77849 18.046 0.00000 poly(...)2 class3 -206.73143 9.58984 -21.557 0.00000 sex -0.19508 0.02119 -9.205 0.00000 race 0.87786 0.02634 33.328 0.00000

Variance-covariance matrix of the random-effects: (the variance of the first random effect is not estimated) intercept intercept 1

                            x        y

Residual standard error: 0.99172 0.76401

Parameters of the link functions:

               coef       Se    Wald p-value

x-Linear 1 68.34166 0.35417 192.963 0.00000 x-Linear 2 7.77891 0.09202 84.537 0.00000 y-Linear 1 114.03548 0.69867 163.217 0.00000 y-Linear 2 15.36849 0.17122 89.757 0.00000

data_pred0 <- data.frame(Age_ex = seq(45,80,by=0.1), sexnum=0,racenum=0, x_pgs=0, y_pgs=0) data_pred0$age_sd<-(data_pred0$Age_ex-65)/10

pred0<-predictY(m3,data_pred0,var.time="Age_ex",draws=T) head(pred0) The normal range of both variables is 50-200, but the pred0 gives values completely out of the normal range. For class 1: y starts at -19 and increases to +275 which is completely out of the scale of the actual variable. x starts at 0 and increases to 150, which is also out of scope of the variable. Class 2 - Parabola starts at +500, minimum of -200, increases to +500 Class 3: Parabola starting at -200, maximum of 400, returns to -200.

plot(pred0, col=c("blue","green","purple"), lty=1, lwd=2, ylab="y", xlab = 'Age', main="Class-Mean Predicted Y Trajectories", legend=NULL, ylim=c(-200,500), shades=TRUE, outcome=2)

I'm exploring other link functions than linear, though I don't think that would solve this problem. I'm also using hlme to for two separate models one for x one for y, but it is more concise to have them together. Do you have any recommendations of what else I could try/what might be going wrong?

Thanks!

VivianePhilipps commented 10 months ago

Hi,

there are many reasons for that. Maybe you are extrapolating too much the curves. You plot the trajectories from 45 to 80. It is important to check that in each latent class, you have enough information in this range of time. Maybe you should reduce the first trajectory to the 2.5 and 97.5 quantiles of the observations times of the subjects classified in class 1 (and the same for classes 2 and 3). Another reason is the fit of the model. Maybe a quadratic trajectory is not flexible enough, or the random effects are not well specified, or the link function, etc. A plot of the observations along with the estimated trajectories may be useful.

Viviane