DenisRustand commented 2 years ago

Hello, I am trying to compare the available packages for joint models with multivariate longitudinal outcome and a terminal event through simulation studies. I have surprising results with JMbayes2, the coverage probabilities for the fixed effects of the longitudinal submodels are very close to 1 instead of 0.95 (based on quantiles at 2.5% and 97.5% given in the summary of jm() output), am I doing something wrong? Please find attached a code that illustrates the issue in a simple case (one marker with random intercept and slope and a terminal event) but I have similar results with multivariate longitudinal data. Thanks, Denis Rustand

library(joineR) # for data simulation
library(JMbayes2)
nmod=50 # number of models
nsub=100 # number of individuals

# store coverage of true value (1) of each model parameter in "res_list" to compute coverage probabilities
res_list <- replicate(nmod, rep(NA, 4), simplify = FALSE)
for(i in 1:nmod){
# association between longi and surv is set to zero
data <- simjoint(nsub, model="intslope", gamma=0)

  CoxFit <- coxph(Surv(survtime, cens) ~ ctsx + binx, data = data$survival)
  fm1 <- lme(Y ~ time + ctsxl + binxl, random = ~ time | id, data = data$longitudinal)
  JMB <- jm(CoxFit, fm1, time_var = "time")
  JMB2 <- summary(JMB)

  # 1 if true value is within credible interval, 0 otherwise
  res_list[[i]][1] <- ifelse(1>=(JMB2$Outcome1["(Intercept)", "2.5%"]) & 1<=(JMB2$Outcome1["(Intercept)", "97.5%"]), 1, 0) # intercept
  res_list[[i]][2] <- ifelse(1>=(JMB2$Outcome1["time", "2.5%"]) & 1<=(JMB2$Outcome1["time", "97.5%"]), 1, 0) # slope
  res_list[[i]][3] <- ifelse(1>=(JMB2$Outcome1["binxl", "2.5%"]) & 1<=(JMB2$Outcome1["binxl", "97.5%"]), 1, 0) # continuous covariate
  res_list[[i]][4] <- ifelse(1>=(JMB2$Outcome1["ctsxl", "2.5%"]) & 1<=(JMB2$Outcome1["ctsxl", "97.5%"]), 1, 0) # binary covariate
}

Coverage=rowMeans(matrix(unlist(res_list), ncol=nmod))
# (I only look at the fixed effects from the longitudinal submodel)
names(Coverage) <- c("Y_intercept", "Y_slope", "Y_ctsx", "Y_binx") # intercept, slope, continuous covariate and binary covariate
print(Coverage) # coverage probabilities

drizopoulos commented 2 years ago

As far as I know, a 95% credible interval is not required to have 95% coverage. I.e., strictly speaking it is not a confidence interval.

—— Professor of Biostatistics Erasmus Medical Center Rotterdam The Netherlands

Από: DenisRustand @.> Στάλθηκε: Wednesday, November 17, 2021 1:57:49 PM Προς: drizopoulos/JMbayes2 @.> Κοιν.: Subscribed @.***> Θέμα: [drizopoulos/JMbayes2] Coverage probability seems too high for fixed effects parameters of the longitudinal submodels (Issue #7)

Waarschuwing: Deze e-mail is afkomstig van buiten de organisatie. Klik niet op links en open geen bijlagen, tenzij u de afzender herkent en weet dat de inhoud veilig is. Caution: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.

Hello, I am trying to compare the available packages for joint models with multivariate longitudinal outcome and a terminal event through simulation studies. I have surprising results with JMbayes2, the coverage probabilities for the fixed effects of the longitudinal submodels are very close to 1 instead of 0.95 (based on quantiles at 2.5% and 97.5% given in the summary of jm() output), am I doing something wrong? Please find attached a code that illustrates the issue in a simple case (one marker with random intercept and slope and a terminal event) but I have similar results with multivariate longitudinal data. Thanks, Denis Rustand

library(joineR) # for data simulation library(JMbayes2) nmod=50 # number of models nsub=100 # number of individuals

store coverage of true value (1) of each model parameter in "res_list" to compute coverage probabilities

res_list <- replicate(nmod, rep(NA, 4), simplify = FALSE) for(i in 1:nmod){

association between longi and surv is set to zero

data <- simjoint(nsub, model="intslope", gamma=0)

CoxFit <- coxph(Surv(survtime, cens) ~ ctsx + binx, data = data$survival) fm1 <- lme(Y ~ time + ctsxl + binxl, random = ~ time | id, data = data$longitudinal) JMB <- jm(CoxFit, fm1, time_var = "time") JMB2 <- summary(JMB)

1 if true value is within credible interval, 0 otherwise

res_list[[i]][1] <- ifelse(1>=(JMB2$Outcome1["(Intercept)", "2.5%"]) & 1<=(JMB2$Outcome1["(Intercept)", "97.5%"]), 1, 0) # intercept res_list[[i]][2] <- ifelse(1>=(JMB2$Outcome1["time", "2.5%"]) & 1<=(JMB2$Outcome1["time", "97.5%"]), 1, 0) # slope res_list[[i]][3] <- ifelse(1>=(JMB2$Outcome1["binxl", "2.5%"]) & 1<=(JMB2$Outcome1["binxl", "97.5%"]), 1, 0) # continuous covariate res_list[[i]][4] <- ifelse(1>=(JMB2$Outcome1["ctsxl", "2.5%"]) & 1<=(JMB2$Outcome1["ctsxl", "97.5%"]), 1, 0) # binary covariate }

Coverage=rowMeans(matrix(unlist(res_list), ncol=nmod))

(I only look at the fixed effects from the longitudinal submodel)

names(Coverage) <- c("Y_intercept", "Y_slope", "Y_ctsx", "Y_binx") # intercept, slope, continuous covariate and binary covariate print(Coverage) # coverage probabilities

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fdrizopoulos%2FJMbayes2%2Fissues%2F7&data=04%7C01%7Cd.rizopoulos%40erasmusmc.nl%7Cb4cfa1251951499dcccc08d9a9c9dc80%7C526638ba6af34b0fa532a1a511f4ac80%7C0%7C0%7C637727506713703919%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=msTpdkdP8SldVRTDMxeIF%2BaZ9LwzuOtNFJpWiMZfpUY%3D&reserved=0, or unsubscribehttps://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FADE7TT43NBLM6KISEWY23JLUMORE3ANCNFSM5IG3Z3LQ&data=04%7C01%7Cd.rizopoulos%40erasmusmc.nl%7Cb4cfa1251951499dcccc08d9a9c9dc80%7C526638ba6af34b0fa532a1a511f4ac80%7C0%7C0%7C637727506713713867%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=sgg4Q3b9p8KUFbjhr%2FXZAD4oR2xmUOwQAT0Il0MkHQM%3D&reserved=0.

DenisRustand commented 2 years ago

Thank you for your quick reply, I agree with you the credible interval is not required to have 95% coverage but we can show that the frequentist coverage of the credible interval is 95% through simulation studies, which is useful to illustrate the good frequentist properties of the Bayesian models. I believe there is an issue with JMbayes2 because the frequentist coverage of the credible interval does converge towards 0.95 with other packages (e.g, JMbayes or rstanarm). In the following code, I estimate the same model with JMbayes2, JMbayes and rstanarm, and the credible interval seems too large with JMbayes2.

e.g., for the intercept of the longitudinal submodel (true value is 1): Package: Est [Cred. Int.] JMbayes2: 0.83 [0.21 - 1.44] JMbayes: 0.81 [0.58 - 1.06] rstanarm: 0.83 [0.61 - 1.04]

library(joineR) # for data simulation
library(JMbayes)
library(JMbayes2)
library(rstanarm)
set.seed(1)
nsub=100 # number of individuals

data <- simjoint(nsub, model="intslope", gamma=0)

# JMbayes2
  CoxFit <- coxph(Surv(survtime, cens) ~ ctsx + binx, data = data$survival, x=TRUE)
  fm1 <- lme(Y ~ time + ctsxl + binxl, random = ~ time | id, data = data$longitudinal)
  JB2 <- jm(CoxFit, fm1, time_var = "time")
  summary(JB2)

# JMbayes
  JB <- jointModelBayes(fm1, CoxFit, timeVar = "time")
  summary(JB)

# rstanarm
  rs <- stan_jm(formulaLong = list(Y ~ time+ctsxl+binxl + (time | id)),
    formulaEvent = survival::Surv(survtime, cens) ~ ctsx + binx,
    dataLong = data$longitudinal, dataEvent = data$survival,
    time_var = "time",
    chains = 1, iter = 1000, refresh=1000, seed = 12345)
 summary(rs)

drizopoulos commented 2 years ago

Note that the JMbayes only runs a single chain. It cannot do more. What happens if you increase the number of chains in rstanarm? Also, you could try running more iterations in JMbayes2.

—— Professor of Biostatistics Erasmus Medical Center Rotterdam The Netherlands

Από: DenisRustand @.> Στάλθηκε: Thursday, November 18, 2021 1:58:37 PM Προς: drizopoulos/JMbayes2 @.> Κοιν.: Dimitris Rizopoulos @.>; State change @.> Θέμα: Re: [drizopoulos/JMbayes2] Coverage probability seems too high for fixed effects parameters of the longitudinal submodels (Issue #7)

Waarschuwing: Deze e-mail is afkomstig van buiten de organisatie. Klik niet op links en open geen bijlagen, tenzij u de afzender herkent en weet dat de inhoud veilig is. Caution: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.

Thank you for your quick reply, I agree with you the credible interval is not required to have 95% coverage but we can show that the frequentist coverage of the credible interval is 95% through simulation studies, which is useful to illustrate the good frequentist properties of the Bayesian models. I believe there is an issue with JMbayes2 because the frequentist coverage of the credible interval does converge towards 0.95 with other packages (e.g, JMbayes or rstanarm). In the following code, I estimate the same model with JMbayes2, JMbayes and rstanarm, and the credible interval seems too large with JMbayes2.

e.g., for the intercept of the longitudinal submodel (true value is 1): Package: Est [Cred. Int.] JMbayes2: 0.83 [0.21 - 1.44] JMbayes: 0.81 [0.58 - 1.06] rstanarm: 0.83 [0.62 - 1.17]

library(joineR) # for data simulation library(JMbayes) library(JMbayes2) library(rstanarm) set.seed(1) nsub=100 # number of individuals

data <- simjoint(nsub, model="intslope", gamma=0)

JMbayes2

CoxFit <- coxph(Surv(survtime, cens) ~ ctsx + binx, data = data$survival, x=TRUE) fm1 <- lme(Y ~ time + ctsxl + binxl, random = ~ time | id, data = data$longitudinal) JB2 <- jm(CoxFit, fm1, time_var = "time") summary(JB2)

JMbayes

JB <- jointModelBayes(fm1, CoxFit, timeVar = "time") summary(JB)

rstanarm

rs <- stan_jm(formulaLong = list(Y ~ time+ctsxl+binxl + (time | id)), formulaEvent = survival::Surv(survtime, cens) ~ 1, dataLong = data$longitudinal, dataEvent = data$survival, time_var = "time", chains = 1, iter = 100, refresh=1000, seed = 12345) summary(rs)

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHubhttps://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fdrizopoulos%2FJMbayes2%2Fissues%2F7%23issuecomment-972842370&data=04%7C01%7Cd.rizopoulos%40erasmusmc.nl%7Cd1cb6f5368bb46df8f1c08d9aa9323e0%7C526638ba6af34b0fa532a1a511f4ac80%7C0%7C0%7C637728371210606235%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=%2FF%2FE3DuR4%2BEbGl%2BlTyiwr%2FZvKd3iw%2BxzsG%2Bq%2BzqkZDk%3D&reserved=0, or unsubscribehttps://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FADE7TT77GQN6ISBDAPHSBSDUMTZ73ANCNFSM5IG3Z3LQ&data=04%7C01%7Cd.rizopoulos%40erasmusmc.nl%7Cd1cb6f5368bb46df8f1c08d9aa9323e0%7C526638ba6af34b0fa532a1a511f4ac80%7C0%7C0%7C637728371210606235%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=yTu32CJrNbCFiuAItGJF1ju46vncAMeebGhdbLiAduA%3D&reserved=0.

DenisRustand commented 2 years ago

With the same example as in my previous message but specifying "chains = 4, iter = 4000" with rstanarm and "n_chains = 4L, n_iter = 24000L, n_burnin = 1000L" with JMbayes2, I get similar differences: JMbayes2: 0.82 [0.24 - 1.42] rstanarm: 0.81 [0.56 - 1.05]

drizopoulos commented 2 years ago

Perhaps the priors may play a role. Try setting jm(..., priors = list(Tau_betas_HC = diag(0.1, n))) with n denoting the number of fixed effects.

DenisRustand commented 2 years ago

Still the same, default Gaussian priors for the fixed effects in rstanarm have scale=2.5, when I set the corresponding precision (0.16) with JMbayes2, I get: JMbayes2: 0.82 [0.22 - 1.43]

drizopoulos commented 2 years ago

Could you try with lower precision?

DenisRustand commented 2 years ago

Precision=0.01 gives: 0.82 [0.22 - 1.44] Precision=0.1 gives: 0.82 [0.22 - 1.43] Precision=1 gives: 0.82 [0.25 - 1.40] Precision=20 gives: 0.82 [0.51 - 1.12] Precision=50 gives: 0.82 [0.61 - 1.03] Precision=500 gives: 0.82 [0.73 - 0.89]

drizopoulos commented 2 years ago

This issue has been resolved in JMbayes2_0.3-0.

drizopoulos / JMbayes2

Coverage probability seems too high for fixed effects parameters of the longitudinal submodels #7

store coverage of true value (1) of each model parameter in "res_list" to compute coverage probabilities

association between longi and surv is set to zero

1 if true value is within credible interval, 0 otherwise

(I only look at the fixed effects from the longitudinal submodel)

JMbayes2

JMbayes

rstanarm