CecileProust-Lima / lcmm

R package lcmm
https://CecileProust-Lima.github.io/lcmm/
48 stars 13 forks source link

Log-likelihood of hlme model #235

Closed prokashdeb01 closed 4 months ago

prokashdeb01 commented 5 months ago

Greetings,

I'm a PhD student and trying to compute the log-likelihood estimate with 2-class model. This is because I want to combine this model with another model and find the combined maximum likelihood estimate. The following is the code I'm using for the latent class:

m.1 <- hlme(fixed = quantity~mon+I(mon^2)+price_final+e, subject = "household_code", ng=1, maxiter = 10000000, data = pbm_1.1, nproc = 14)

m.2 <- hlme(fixed = quantity~mon+I(mon^2)+price_final+e, mixture = ~mon+I(mon^2)+price_final+e, classmb =~Household_Income+Household_Size+male_under_30+male_30_to_45+female_under_30+female_30_to_45+children_yes+male_less_highschool+female_less_highschool+married+african_american+asian+other+hispanic_yes, subject = "household_code", ng=2, B=m.1, maxiter = 10000000, nwg = F, data = pbm_1.1, nproc = 14)

initial_guess_lcmm <- estimates(m.2, cholesky = T)

Log-likelihood for the hlme model

log_likelihood_lcmm <- function(params_lcmm, data_lcmm) {

Ensure params_lcmm is numeric

params_lcmm <- as.numeric(params_lcmm)

# Y0: Numeric vector of outcomes
Y0 <- as.numeric(data_lcmm$quantity)

# X0: Numeric matrix of covariates
X0 <- model.matrix(~price_final + e + Household_Income + Household_Size +
                   male_under_30 + male_30_to_45 + female_under_30 + female_30_to_45 +
                   children_yes + male_less_highschool + female_less_highschool +
                   married + african_american + asian + other + hispanic_yes-1,
                   data = data_lcmm)

# Other necessary parameters for loglikhlme
nmes0 <- as.vector(table(data_lcmm$household_code)) #number of mesures for each subject (length ns0 or dom ns0*ny0)
ns0 <- length(nmes0) # Number of subjects (24 row per subject)
ng0 <- 2  # Number of latent classes
nom.X0 <- colnames(X0)
nvar.exp <- length(nom.X0)
nv0 <- nvar.exp #number of covariates
prior2 <- as.integer(rep(0,ns0))
prior0 <- prior2 #the prior latent class (length ns0)
pprior0 <- matrix(1, 69, ns0) #the prior probabilty of each latent class; 1081 is the number of unique household
idprob0 <- rep(0, nvar.exp) #indicator of presence in the class membership submodel (length nv0)
idea0 <- rep(0, nvar.exp) #indicator of presence in the random part of the longitudinal submodel (length nv0)
idg0 <- rep(0, nvar.exp) #indicator of presence in the fixed part of the longitudinal submodel (length nv0)
z.X0 <- strsplit(colnames(X0),split=":",fixed=TRUE)
z.X0 <- lapply(z.X0,sort)
idcor0 <- rep(0, length(z.X0)) #indicator of presence in the correlation part of the longitudinal submodel (length nv0)
nobs0 <- length(Y0)     # Number of observations
nea0 <- sum(idea0==1)               # Number of random effects
idiag0 <- as.integer(0) #indicator of diagonal variance matrix of the random effects
nwg0 <- as.integer(0) #number of parameters for proportional random effects over latent classes
ncor0 <- 0 #number of parameters for the correlation
npm0 <- length(m.2[["best"]])  # Total number of parameters
fix0 <- rep(0,npm0) #indicator of non estimated parameter (length npm0+nfix0)
nfix0 <- sum(fix0) # number of non estimated parameters
bfix0 <- 0 #vector of non estimated parameters

# res <- 0
# ppi0 <- rep(0,ns0*ng0)
# resid_m <- rep(0,nobs0)
# resid_ss <- rep(0,nobs0)
# pred_m_g <- rep(0,nobs0*ng0)
# pred_ss_g <- rep(0,nobs0*ng0)
# predRE <- rep(0,ns0*nea0)
# varRE <- rep(0,ns0*nea0*(nea0+1)/2)
# estim0 <- 1

# Call the Fortran function to calculate log-likelihood
log_likelihood <- loglikhlme(b = params_lcmm, Y0 = Y0, X0 = X0, ns0 = ns0, ng0 = ng0, 
                             nobs0 = nobs0, nea0 = nea0, npm0 = npm0, 
                             prior0=prior0, pprior0=pprior0, 
                             idprob0=idprob0, idea0=idea0, idg0=idg0, idcor0, nv0=nv0,nmes0=nmes0, 
                             idiag0=idiag0, nwg0=nwg0, ncor0=ncor0, fix0=fix0,
                             nfix0=nfix0, bfix0=bfix0)

return(log_likelihood)

}

result <- maxLik(logLik = log_likelihood_lcmm, start = initial_guess_lcmm, method = "BFGS", control = list(tol = 1e-6), data = pbm_1.1)

summary(result) Screenshot 2024-01-12 172622

The maxLik fucntion is providing a similar log likelihood value (result[["maximum"]]: -3516.639 and m.2[["loglik"]]: -3117.81). The maxLik function Return code 0: Successful convergence. However, the standard error for all parameters are Inf. It's likely that my written log_likelihood_lcmm function has some issues which I'm not able to figure out. Is it possible to let me know what is the best way to write the log_likelihood_lcmm function in this case? I'm grateful for your help and thanks a lot in advance!

VivianePhilipps commented 5 months ago

Hi,

the issue lies in the definition of the idprob0, idea0, and idg0 vectors. If you set them to 0, it means that none of the nvar.exp covariates are used in the model. So the corresponding parameters are in reality not used in the likelihood, that's why you get infinite standard errors.

The idprob0 vector should contain 0s and 1s. 1 in position k if the covariate number k (the k-th column of X0) appears in the classmb formula. 0 otherwise.

For idea0 : 1 in position k if the covariate number k appears in the random formula. 0 otherwise.

For idg0 : 2 in position k if the covariate number k appears in the fixed and in the mixture formula.
1 in position k if the covariate number k appears only in the fixed formula.
0 otherwise.

Viviane

prokashdeb01 commented 5 months ago

Hi Dr. Viviane Philipps. Thanks a lot for pointing out the issue in my code. I'm grateful. The Log-likelihood function is working now.

Proaksh Deb