CecileProust-Lima / lcmm

R package lcmm
https://CecileProust-Lima.github.io/lcmm/

Questions regarding normalisation #172

Closed robcolaes closed 1 year ago

robcolaes commented 1 year ago

Dear Cécile and dear Viviane,

Thank you for developing this R-package.

I'm a Ph.D. student at the Catholic University of Leuven. For my research, I'm looking to apply a latent class mixed model to my data. My data consist of 113 subjects who were assessed for cognition and with medical imaging at three time points (T0, T1, T2). I would like to form classes based on their CFQ (Cognitive Failure Questionnaire) trajectories over the three time points. At a later stage, I would like to compare the medical imaging results between the classes.

I have a few questions regarding normalization and model convergence.

  1. The histogram of CFQ shows a slight left skew. Is this a problem for applying "hlme", or should I use "lcmm" with a link function? After fitting the model with "hlme", however, the residuals seem to be normally distributed. What is important to check regarding the normality assumption for the outcome variable?

  2. For the grid search, model convergence is already reached after only 1 iteration. What could be the explanation for this?

Kind regards,

Rob Colaes

[Two figures: histogram of the CFQ outcome before modelling; residuals of the best model (3 classes)]

My code:

library(lcmm)

# One-class model, used to generate initial values for the grid searches
t1 <- hlme(CFQtot~time, random = ~time, subject='ID_num', data = explo)

# Two-class model: 100 random starts, 30 iterations each
t2 <- gridsearch(hlme(CFQtot~time, random = ~time, subject='ID_num', data = explo, ng = 2, mixture = ~time, nwg=TRUE), rep=100, maxiter = 30, minit = t1)

# Three-class model
t3 <- gridsearch(hlme(CFQtot~time, random = ~time, subject='ID_num', data = explo, ng = 3, mixture = ~time, nwg=TRUE), rep=100, maxiter = 30, minit = t1)

# Four-class model
t4 <- gridsearch(hlme(CFQtot~time, random = ~time, subject='ID_num', data = explo, ng = 4, mixture = ~time, nwg=TRUE), rep=100, maxiter = 30, minit = t1)

Summary of the best model (3 classes)

Heterogenous linear mixed model fitted by maximum likelihood method

hlme(fixed = CFQtot ~ time, mixture = ~time, random = ~time, subject = "ID_num", ng = 3, nwg = TRUE, data = explo)

Statistical Model:
     Dataset: explo
     Number of subjects: 113
     Number of observations: 336
     Number of observations deleted: 1
     Number of latent classes: 3
     Number of parameters: 14

Iteration process:
     Convergence criteria satisfied
     Number of iterations: 1
     Convergence criteria: parameters= 2.5e-05
                         : likelihood= 1.1e-06
                         : second derivatives= 6.2e-09

Goodness-of-fit statistics:
     maximum log-likelihood: -1231.48
     AIC: 2490.96
     BIC: 2529.15

Maximum Likelihood Estimates:

Fixed effects in the class-membership model: (the class of reference is the last class)

                     coef      Se    Wald p-value
intercept class1 -1.04093 0.65349  -1.593 0.11119
intercept class2  2.20618 0.42050   5.247 0.00000

Fixed effects in the longitudinal model:

                      coef      Se    Wald p-value
intercept class1  64.24929 2.86584  22.419 0.00000
intercept class2  28.38738 1.05403  26.932 0.00000
intercept class3  32.11377 4.96748   6.465 0.00000
time class1      -11.91760 2.19853  -5.421 0.00000
time class2        0.19053 0.47216   0.404 0.68656
time class3       12.74240 1.85386   6.873 0.00000

Variance-covariance matrix of the random-effects:
          intercept    time
intercept 150.28154
time        9.16687 0.55916

                                   coef      Se
Proportional coefficient class1 0.00006 0.16245
Proportional coefficient class2 0.68098 0.19779
Residual standard error:        5.90848 0.30283

CecileProust-Lima commented 1 year ago

Hi,

For your first question: your score does not seem to have a strong asymmetry, so hlme is probably enough. If you want to be sure, you could fit the same model with lcmm and compare the AIC obtained with a linear link to the AIC obtained with a splines link.

For the grid search: the procedure is run 100 times (rep = 100) for at most 30 iterations each (maxiter = 30), and a final estimation is then rerun from the estimates that gave the highest log-likelihood across the 100 trials. If one of the trials had already converged properly in fewer than 30 iterations, the final run starts at the maximum and may therefore stop after only one iteration.

Hope this helps ...

Best, Cécile
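The link comparison suggested above could be sketched roughly as follows. This is only an illustration, not code from the thread: the `explo` data frame here is simulated with made-up values so that the example runs on its own, and the object names `m_lin` and `m_spl` are hypothetical. With the real data, the simulation step would be skipped.

```r
library(lcmm)

# Simulated stand-in for the `explo` data (hypothetical values, same column
# names as in the thread): 113 subjects, 3 visits each.
set.seed(1)
n  <- 113
explo <- data.frame(ID_num = rep(seq_len(n), each = 3),
                    time   = rep(0:2, times = n))
b0 <- rnorm(n, mean = 30, sd = 10)   # subject-specific intercepts
b1 <- rnorm(n, mean = 0.5, sd = 2)   # subject-specific slopes
explo$CFQtot <- b0[explo$ID_num] + b1[explo$ID_num] * explo$time +
  rnorm(nrow(explo), mean = 0, sd = 6)

# Same one-class specification, fitted with two different link functions
m_lin <- lcmm(CFQtot ~ time, random = ~ time, subject = "ID_num",
              data = explo, link = "linear")
m_spl <- lcmm(CFQtot ~ time, random = ~ time, subject = "ID_num",
              data = explo, link = "splines")

# Side-by-side comparison; a lower AIC favours that link
summarytable(m_lin, m_spl, which = c("G", "loglik", "npm", "AIC", "BIC"))

# conv equal to 1 indicates that the convergence criteria were satisfied
c(linear = m_lin$conv, splines = m_spl$conv)
```

If the splines link does not clearly improve the AIC over the linear link, that would support staying with hlme, which assumes a Gaussian outcome on its original scale.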

robcolaes commented 1 year ago

Yes, that helps me a lot! Thank you for answering so quickly and for making this well-documented and easy-to-use R-package.

Kind regards, Rob