CecileProust-Lima / lcmm

R package lcmm
https://CecileProust-Lima.github.io/lcmm/
55 stars 13 forks source link

Testing class membership predictors using a 3-step approach possible in lcmm? #38

Closed JsabelHodel closed 3 years ago

JsabelHodel commented 4 years ago

Dear lcmm team,

I would like to perform GMM on longitudinal data with up to 4 repeated unstructured measurement time points of an outcome measuring people’s performance in activities of daily living. Once I have found the number of different classes within the data, I will be interested in the predictors of the class membership. To test them, I would like to use a 3-step approach in order to account for the classification errors in the class allocations (https://www.researchgate.net/publication/228389117_Latent_Class_Modeling_with_Covariates_Two_Improved_Three-Step_Approaches & https://www.statmodel.com/examples/webnotes/AuxMixture_submitted_corrected_webnote.pdf).

According to the paper of Asparouhov and Muthén, the 3 steps to go are (Chapter 2):

  1. Do a Latent Class Analysis with only using the latent class indicator. -> As a simple first example I used a hlme model with three classes:

lin1 <- hlme(SCIM_raw_num ~ days_SCIM_adm, random=~ days_SCIM_adm, mixture=~ days_SCIM_adm, ng = 3, B =lin0, subject='id', data=data_long)

  1. Create the most likely class variable for each individual (N), its classification error (p) and its uncertainty rates (q; and logits k). -> I was able to extract the most likely class variable and to calculate the corresponding uncertainty rates using the formulas presented in the mentioned paper:

Most likely class membership

data_pred <- data_long data_pred$id <- as.character(data_pred$id)

data_N <- as.data.frame(lin1$pprob[,1:2]) data_N$id <- as.character(data_N$id) data_pred$N <- factor( data_N$class[sapply(data_pred$id, function(x) which(data_N$id==x))] )

Number of persons per class

N1 <- 92 N2 <- 270 N3 <- 485

Posterior classification table = classification errors:

p11 <- 0.7711 p12 <- 0.0694 p13 <- 0.1596 p21 <- 0.0572 p22 <- 0.8986 p23 <- 0.0442 p31 <- 0.0416 p32 <- 0.0261 p33 <- 0.9323

uncertainty rates

q11 <- p11N1/(p11N1+p21N2+p31N3) q12 <- p21N2/(p11N1+p21N2+p31N3) q13 <- p31N3/(p11N1+p21N2+p31N3) q21 <- p12N1/(p12N1+p22N2+p32N3) q22 <- p22N2/(p12N1+p22N2+p32N3) q23 <- p32N3/(p12N1+p22N2+p32N3) q31 <- p13N1/(p13N1+p23N2+p33N3) q32 <- p23N2/(p13N1+p23N2+p33N3) q33 <- p33N3/(p13N1+p23N2+p33N3)

logits of uncertainty rates = needed for Step 3

k11 <- log(q11/q13) k12 <- log(q12/q13) k13 <- log(q13/q13) k21 <- log(q21/q23) k22 <- log(q22/q23) k23 <- log(q23/q23) k31 <- log(q31/q33) k32 <- log(q32/q33) k33 <- log(q33/q33)

  1. Use the most likely class variable as latent class indicator variable (its uncertainty rates should be prefixed by the obtained measurement errors k in step 2). In this step also the predictors of the class membership should be included in the model. -> For this step I would be very happy about your support how to do it in the lcmm package. Do you have any suggestions? And does the above 3-steps approach also work for models fitted with the lcmm() function and non-linear link functions?

Thank you very much for your support! Best regards, Jsabel

CecileProust-Lima commented 3 years ago

Dear Jsabel, I am really sorry I did not answer earlier, and I hope you found the solution to your problem. We did not implement the 3-step methodology yet in the package so the 3-step procedure has to be coded by the user for the moment.
Cécile

jbr1483 commented 7 months ago

Hi Jsabel,

I'm trying to perform a three-step bias adjusted approach in R and wondered whether you'd be willing to share your code - particularly for working out the classification error?

CecileProust-Lima commented 7 months ago

Hi, in the meantime, we have implemented techniques to account for the uncertainty. Please find information in the vignette: https://cecileproust-lima.github.io/lcmm/articles/secondary_model_with_externVar.html Cécile

JsabelHodel commented 7 months ago

Hi Cécile, Thanks for your reply. I recently noticed the updates of the lcmm package and I am very much looking forward to have a look at it and maybe use it for a next project. Thank you for your work and the implementation! Best, Jsabel

jbr1483 commented 6 months ago

Thanks Cecile, Really helpful.

I had a [potentially very simple] question that probably shows my confusion.

I have a dataset on obesity prevalence containing a population prevalence (Value) over time (jrtime - factor....1 to 12) for 150 areas (identified as ID).

I have tried to implement the below as a GMM based upon the vignette;

Original model:

gmm2 <- gridsearch(rep = 1500, maxiter = 50, minit = gmm1, hlme(Value ~ jrtime, subject = "ID", random=~jrtime, ng = 2, data = rec, mixture = ~ jrtime, nwg=T))

I have investigated associations with potential predictor variables [appended to the aggregated data] using the simple unadjusted logistic regression based upon the posterior probability of being assigned to Class 2 (based upon gmm2$pprob (prob2)).

However, I would like to conduct a bias-adjusted analysis.

I attempted to implement as below:

biasadjusted <- externVar(gmm2, classmb = ~ IMD, subject = "ID", data = rec2, method = "conditional") Class2stage <- summary(biasadjusted)

However, my outcome shows me the association with Class 1. I wondered whether there was a way to flip the classes and also whether the above would be correct where looking to bias-adjust based upon a single variable? I also wondered whether the vignette method implements as a 3 stage adjustment directly in accordance with Vermunt based upon classification error.

I had begun to try to manually develop a version, in which the below was the method for deriving classification error:test <- as.data.frame(gmm2$pprob) row.names(test) <- test$ID test <- test[,c(2:4)]

probs <- as.data.table(test) outcomeRec$W <-modclass <- apply(probs[,c(2,3)],1,which.max)

nclass = 2 Ptable <- cbind(probs[,c(2,3)],modclass) Pmatrix <- matrix(0,nclass,nclass) Npmatrix <- matrix(0,nclass,nclass)

for (i in 1:nclass){ for (j in 1:nclass){ Pmatrix[i,j] <-sum(subset(Ptable,modclass==i)[,..j]) Npmatrix[i,j] <-Pmatrix[i,j]*table(modclass)[i] }}

denom<-colSums(Npmatrix) Qmatrix<-matrix(0,nclass,nclass)

for (i in 1:nclass){ for (j in 1:nclass){ Qmatrix[j,i]<-Npmatrix[i,j]/denom[j] }}

lQ<-log(Qmatrix/Qmatrix[,1]) lQ

rec$lq<-c(as.vector(t(lQ[,-1])),rep(0,(n=1)))

Any help hugely appreciated. Thanks again for all your work developing the package. It has been highly successful and already contributed to a strong piece of research.

Regards John

CecileProust-Lima commented 6 months ago

Hi, we have a permut function to flip the classes. You could use this before using externVar application. It should work fine. Thank you for your positive feedback, always a pleasure to read that this is useful!!! Cécile