CecileProust-Lima / lcmm

R package lcmm
https://CecileProust-Lima.github.io/lcmm/

Questions regarding estimation of link functions and latent processes within lcmm package #260

Open timothyjw1986 opened 1 week ago

timothyjw1986 commented 1 week ago

I have read with interest your 2017 paper ‘Estimation of Extended Mixed Models Using Latent Classes and Latent Processes: The R Package lcmm’ (https://doi.org/10.48550/arXiv.1503.00890), and would like to ask some questions regarding the modelling approach described in your paper.

  1. On page 11, the authors state that the lcmm function can be used to fit ‘Univariate latent process mixed models’. My understanding of latent variables (and therefore, presumably, latent processes) is that they cannot be measured directly, but are typically inferred from a combination of direct measures (e.g., in the case of the latent construct ‘cognition’, multiple cognitive tests can be used). How can the lcmm function model a latent process using a univariate approach?
  2. The lcmm package includes various transformations to address suboptimal metrological properties of psychometric tests (i.e., linear, splines, beta, threshold) – when computing the transformation, does the package take into account all available data for a given measure (i.e., including repeated longitudinal measures where available), or only initial scores?

Many thanks for your help.

Best wishes, Tim

VivianePhilipps commented 1 week ago

Hi Tim,

You are right: a latent process is a quantity that cannot be measured directly. With multiple outcomes, the latent process is what all the outcomes have in common. With a single outcome, the interpretation of the latent process is even simpler. Taking your example of cognition: if you use multiple cognitive tests, each measuring one cognitive domain (verbal fluency, attention, executive functions, memory, etc.), the latent process is cognition. If you use only one verbal fluency test, the latent process cannot be interpreted as cognition; it is verbal fluency.

However, even in univariate models the latent process and the outcome differ, because the outcome is a noisy (and imperfect) measure of the latent process. In addition, the two are related through the link function. This link function is estimated using all available data, not only baseline measures.
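
For reference, the univariate model behind this answer can be sketched as follows. The notation is an assumption based on the general form given in the 2017 paper cited above, with optional terms (such as an autocorrelated process) omitted, so please check it against the paper before relying on it:

```latex
% Measurement model: the transformed outcome equals the latent process plus noise
H(Y_{ij};\,\eta) = \Lambda_i(t_{ij}) + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim \mathcal{N}(0,\, \sigma_\varepsilon^2)

% Structural model: a standard linear mixed model for the latent process
\Lambda_i(t) = X_i(t)^\top \beta + Z_i(t)^\top u_i,
\qquad u_i \sim \mathcal{N}(0,\, B)
```

Here $Y_{ij}$ is the score of subject $i$ at visit $j$, $H(\cdot\,;\eta)$ is the parametric link function (linear, splines, beta CDF), and $\Lambda_i(t)$ is the latent process. The parameters $\eta$ are estimated jointly with $\beta$, $B$ and $\sigma_\varepsilon$ from every repeated measure, which is why the link uses all available data rather than baseline scores only.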

Best,

Viviane

timothyjw1986 commented 1 week ago

Hi Viviane,

Thank you so much for your answer. I hope it's OK if I ask some related questions. Under section ‘7. Concluding remarks’ of your 2017 paper ‘Estimation of Extended Mixed Models Using Latent Classes and Latent Processes: The R Package lcmm’, the co-authors write ‘the latent process mixed model is designed for the longitudinal analysis of scales that usually have asymmetric distributions with possibly a ceiling effect, floor effects and unequal interval scaling’. I understand that one of the aims of the lcmm package is to ‘correct’ these poor metrological properties. My main question is whether you might be able to comment on how successful or valid the transformations available within the lcmm package are at correcting these properties? The functions and intended application of the lcmm package certainly appeal to me, but I would like to better understand the underpinning rationale.

Perhaps I can clarify my questions with reference to two papers by Proust-Lima and colleagues. In the first paper (https://doi.org/10.1093/aje/kwk017), the authors constructed a ‘common cognitive factor’ using MMSE, BVRT, DSST, and IST15, and showed graphs (Fig. 2) of the flexible nonlinear transformations used to ‘link’ scores on each measure to the common factor. In a subsequent paper (https://doi.org/10.1093/aje/kwr243), the same authors undertook a similar analysis, except that the cognitive measures (in this case, MMSE, CALC MMSE subscore, IST, and BVRT) were linked to separate/individual latent processes (again using flexible nonlinear transformations; see Fig. 3). If I visually compare the shapes of the transformations between the two papers, these (reassuringly) appear comparable for the measures which appear in both. However, I struggle to understand how the statistical model has ‘correctly’ estimated the nonlinear link function in the later paper, given that the only information used in the model is the sum score for each separate test.

What I mean is, how can the model identify the transformation to map (e.g.) MMSE sum score on to an underlying cognitive construct, when it only has access to the MMSE sum score? Intuitively, I can better make sense of the example where a common cognitive factor is created, as the additional tests used to calculate this provide additional information regarding cognitive function, which can be used to estimate the function to link MMSE score to the underlying construct. Can you please try to help me understand how the model works in the ‘univariate’ case (perhaps focusing on the beta CDF link function)? Ultimately, my question is motivated by a desire to understand how ‘valid’ flexible nonlinear transformations are for ‘correcting’ suboptimal metrological properties of sum scores for cognitive measures, when the analyst has sum scores from only a single measure. Could you perhaps suggest methods for ‘validating’ the ‘success’ of the transformations provided within lcmm? I would like to use the beta CDF transformations provided within the package, but I would like to get some of the above points clear in my mind before proceeding!

Thank you very much, Tim
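
The identification question above can be made concrete with a toy simulation. What follows is a minimal sketch and not the lcmm estimation procedure. It assumes a single cross-sectional score, a standard Gaussian latent-plus-noise scale, and a hypothetical curvilinear link (`true_link`, `MAX_SCORE` and `SHAPE` are illustrative names and values, not package objects). A rank-based normal-score transform stands in for maximum likelihood estimation, purely to show that the Gaussian assumptions on the latent scale are what pin down the shape of the link when only one sum score is available.

```python
import bisect
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
MAX_SCORE = 30.0  # hypothetical 0-30 scale (MMSE-like), an assumption
SHAPE = 0.3       # hypothetical curvature; values below 1 create a ceiling effect


def true_link(y):
    """Hypothetical true link H: observed score -> Gaussian latent scale."""
    return STD_NORMAL.inv_cdf((y / MAX_SCORE) ** (1.0 / SHAPE))


def simulate_scores(n, seed=1):
    """Simulate sorted scores as H^{-1}(latent process + measurement error)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n):
        latent = rng.gauss(0.0, math.sqrt(0.8))  # latent process value
        noise = rng.gauss(0.0, math.sqrt(0.2))   # measurement error
        scores.append(MAX_SCORE * STD_NORMAL.cdf(latent + noise) ** SHAPE)
    return sorted(scores)


def estimated_link(sorted_scores, y):
    """Normal-score estimate of H(y), using only the observed scores."""
    rank = bisect.bisect_right(sorted_scores, y)
    return STD_NORMAL.inv_cdf(rank / (len(sorted_scores) + 1))


scores = simulate_scores(5000)
grid = [15.0, 20.0, 25.0, 28.0]
errors = [abs(estimated_link(scores, y) - true_link(y)) for y in grid]

for y in grid:
    print(f"y={y:4.1f}  true H={true_link(y):6.2f}  "
          f"estimated H={estimated_link(scores, y):6.2f}")
```

In this sketch the estimate never sees the true link, only the skewed scores, yet it recovers the curvature because the transformed scores are required to look Gaussian. In the longitudinal model the same logic applies with a richer constraint: the transformed repeated measures must follow the mean and covariance structure implied by the mixed model. One practical check that follows from this is to compare fits under different link families (for example linear versus beta CDF versus splines) and to inspect residuals on the transformed scale.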