drizopoulos / JMbayes

Joint Models for Longitudinal and Survival Data using MCMC
59 stars 24 forks source link

How feasible to add CompRisk option to #39

Closed swihart closed 5 years ago

swihart commented 5 years ago

Thanks, as always, for a great body of work and documentation.

swihart commented 5 years ago

This article seems to imply that no software as of June 2018 could do it:

"Competing risks [28, 29] and recurrent events [38] have been incorporated into joint modelling R packages already, but are limited to the case of a solitary longitudinal outcome. "

drizopoulos commented 5 years ago

My PhD student Greg Papageorgiou has implemented a multi state models extension in JMbayes. This should also fit competing risks settings.

@Greg, could you provide an example.

From: Bruce Swihart notifications@github.com<mailto:notifications@github.com> Date: Tuesday, 23 Apr 2019, 17:30 To: drizopoulos/JMbayes JMbayes@noreply.github.com<mailto:JMbayes@noreply.github.com> Cc: Subscribed subscribed@noreply.github.com<mailto:subscribed@noreply.github.com> Subject: Re: [drizopoulos/JMbayes] How feasible to add CompRisk option to (#39)

This article seems to imply that no software as of June 2018 could do it:

"Competing risks [28, 29] and recurrent events [38] have been incorporated into joint modelling R packages already, but are limited to the case of a solitary longitudinal outcome. "https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fpmc%2Farticles%2FPMC6047371%2F%23Sec14title&data=02%7C01%7Cd.rizopoulos%40erasmusmc.nl%7C737506594b974a6d2eb008d6c7f84c6e%7C526638ba6af34b0fa532a1a511f4ac80%7C0%7C0%7C636916266579255566&sdata=2M20K%2Fv%2BJ7asxtLeVf9yhPozQzZ7l4XWyL8UwGTMS24%3D&reserved=0

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fdrizopoulos%2FJMbayes%2Fissues%2F39%23issuecomment-485828092&data=02%7C01%7Cd.rizopoulos%40erasmusmc.nl%7C737506594b974a6d2eb008d6c7f84c6e%7C526638ba6af34b0fa532a1a511f4ac80%7C0%7C0%7C636916266579265575&sdata=spMaLDuDwarqfahvMm7GpvgTcq7bxxXZask1jLMZjOY%3D&reserved=0, or mute the threadhttps://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FADE7TT47FPRWAE7DFANXWBLPR4MR7ANCNFSM4HHYOMGQ&data=02%7C01%7Cd.rizopoulos%40erasmusmc.nl%7C737506594b974a6d2eb008d6c7f84c6e%7C526638ba6af34b0fa532a1a511f4ac80%7C0%7C0%7C636916266579275583&sdata=%2BX4u%2FyhEJQqw0kKA23V7lJRfmMZf%2BoSnNxl6kazTl7A%3D&reserved=0.

swihart commented 5 years ago

Here's my attempt at reprex. Any help would be appreciated. (UPDATE: scroll down to @gpapageorgiou response and my updated reprex below it)

library(JM)
library(JMbayes)
## slides: http://www.drizopoulos.com/courses/EMC/ESP72.pdf
## slide 163
pbc2.idCR <- crLong(pbc2.id, statusVar = "status",
                    censLevel = "alive", nameStrata = "CR")
pbc2.idCR[pbc2.idCR$id %in% c(1,2,5),
          c("id", "years", "status", "CR", "status2")]
## slide 164
coxFit.CR <- coxph(Surv(years, status2) ~ drug * strata(CR),
                   data = pbc2.idCR, x = TRUE)

## slide 164 & 165
lmeFit.CR <- lme(log(serBilir) ~ drug * year, data = pbc2,
                 random = ~ year | id)
jointFit.CR <- jointModel(lmeFit.CR, coxFit.CR, timeVar = "year",
                          method = "spline-PH-aGH", CompRisk = TRUE,
                          interFact = list(value = ~ CR, data = pbc2.idCR))
summary(jointFit.CR)

## follow section 5 of 
## https://cran.r-project.org/web/packages/mstate/vignettes/Tutorial.pdf
pbc2.id[pbc2.id$id %in% c(1,2,5),
        c("id", "years", "status", "status2")]

library(mstate)
?trans.comprisk
tmat <- trans.comprisk(2, names = c("alive", "transplanted", "dead"))
tmat

pbc2.id$statTX <- as.numeric(pbc2.id$status=="transplanted")
pbc2.id$statDD <- as.numeric(pbc2.id$status=="dead")
?msprep
pbc2.idMS <- msprep(time =   c(NA, "years", "years"), 
                    status = c(NA, "statTX", "statDD"), 
                    data = pbc2.id, 
                    keep = "drug", 
                    trans = tmat)

pbc2.idMS[pbc2.idMS$id %in% c(1,2,5),]

coxFit.MS <- coxph(Surv(time, status) ~ drug * strata(trans),
                   data = pbc2.idMS, model=TRUE)

## reproduce jointFit.CR with mvglmer/mvJointModelBayes (?)
mvglmerFit.MS <- mvglmer(list(log(serBilir) ~ drug * year + (year|id)), 
                         data = pbc2,
                         families=list(gaussian)
)

jointFit.MS <-
  mvJointModelBayes(mvglmerFit.MS,
                    coxFit.MS,
                    timeVar="year",
                    multiState=TRUE,
                    data_MultiState=pbc2.idMS,
                    idVar_MultiState = "id"
                    )

Where the last call (mvJointModelBayes()) gives the error about different data sizes:

Error in `[[<-.data.frame`(`*tmp*`, timeVar, value = c(`1` = 1.09517029898149,  : 
  replacement has 624 rows, data has 312

But I think I'm on the right track because the following models are similar:

## similar:
coxFit.CR
coxFit.MS

## similar:
summary(lmeFit.CR)
summary(mvglmerFit.MS)

Output:

> ## similar:
> coxFit.CR
Call:
coxph(formula = Surv(years, status2) ~ drug * strata(CR), data = pbc2.idCR, 
    x = TRUE)

                                coef exp(coef) se(coef)      z     p
drugD-penicil                -0.3857    0.6800   0.3771 -1.023 0.306
drugD-penicil:strata(CR)dead  0.3840    1.4682   0.4133  0.929 0.353

Likelihood ratio test=1.06  on 2 df, p=0.5877
n= 624, number of events= 169 
> coxFit.MS
Call:
coxph(formula = Surv(time, status) ~ drug * strata(trans), data = pbc2.idMS, 
    model = TRUE, x = TRUE)

                                      coef exp(coef) se(coef)      z     p
drugD-penicil                      -0.3857    0.6800   0.3771 -1.023 0.306
drugD-penicil:strata(trans)trans=2  0.3840    1.4682   0.4133  0.929 0.353

Likelihood ratio test=1.06  on 2 df, p=0.5877
n= 624, number of events= 169 

and

>## similar:
> summary(lmeFit.CR)
Linear mixed-effects model fit by REML
 Data: pbc2 
       AIC      BIC    logLik
  3085.474 3130.042 -1534.737

Random effects:
 Formula: ~year | id
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev    Corr  
(Intercept) 0.9985870 (Intr)
year        0.1722100 0.417 
Residual    0.3489329       

Fixed effects: log(serBilir) ~ drug * year 
                        Value  Std.Error   DF   t-value p-value
(Intercept)         0.5630313 0.08255260 1631  6.820274  0.0000
drugD-penicil      -0.1332474 0.11610813  310 -1.147614  0.2520
year                0.1797540 0.01782101 1631 10.086631  0.0000
drugD-penicil:year -0.0044099 0.02490439 1631 -0.177075  0.8595
 Correlation: 
                   (Intr) drgD-p year  
drugD-penicil      -0.711              
year                0.248 -0.176       
drugD-penicil:year -0.177  0.251 -0.716

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-4.3241109 -0.4987081 -0.0166671  0.4528833  5.2878568 

Number of Observations: 1945
Number of Groups: 312 
> summary(mvglmerFit.MS)

Call:
mvglmer(formulas = list(log(serBilir) ~ drug * year + (year | 
    id)), data = pbc2, families = list(gaussian))

Data Descriptives:
Number of Groups: 312
Number of Observations:

log(serBilir) 1945

      DIC       pD
 2363.412 939.6878

Random-effects covariance matrix:
              StdDev    Corr
(Intercept)1  1.0061  (Int)1
year1         0.1748  0.4013

Outcome: log(serBilir) 
                   PostMean  StDev  StErr    2.5%  97.5%     P   Rhat
(Intercept)          0.5660 0.0838 0.0026  0.4076 0.7357 0.000 0.9991
drugD-penicil       -0.1320 0.1157 0.0037 -0.3627 0.0880 0.262 1.0071
year                 0.1796 0.0190 0.0006  0.1431 0.2187 0.000 0.9994
drugD-penicil:year  -0.0039 0.0258 0.0008 -0.0526 0.0492 0.882 1.0003
sigma                0.3494 0.0068 0.0002  0.3366 0.3633 0.000 1.0006

MCMC summary:
engine: JAGS 
iterations: 28000 
adapt: 3000 
burn-in: 3000 
thinning: 50 
time: 1.3 min
swihart commented 5 years ago

Notes as I try to debug:

Before Line 260 of mvJointModelBayes, length(unique(dataL[[idVar]])) is 312, length(dataS[[idVar]]) is 0, and idT is not defined yet.

After the line 261 idT <- dataS[[idVar]] <- unique(dataL[[idVar]]), length(unique(dataL[[idVar]])) is 312, length(dataS[[idVar]]) is 624, and length(idT) is 312.

And the dataS looks like the id variable is incorrectly assigned, should be 1,1,2,2, ... 312,312.

> head(dataS)
  Surv(time, status)      drug strata(trans) id
1          1.095170+ D-penicil       trans=1  1
2           1.095170 D-penicil       trans=2  2
3         14.152338+ D-penicil       trans=1  3
4         14.152338+ D-penicil       trans=2  4
5          2.770781+ D-penicil       trans=1  5
6           2.770781 D-penicil       trans=2  6
> tail(dataS)
    Surv(time, status)      drug strata(trans)  id
619          4.402585+ D-penicil       trans=1 307
620          4.402585+ D-penicil       trans=2 308
621          4.128792+ D-penicil       trans=1 309
622          4.128792+ D-penicil       trans=2 310
623          3.989158+   placebo       trans=1 311
624          3.989158+   placebo       trans=2 312
gpapageorgiou commented 5 years ago

Dear @swihart,

Below the code to reproduce the Competing Risks example of package JM using JMbayes::mvJointModelBayes()

library(JMbayes)
library(mstate)
library(splines)
library(survival)

## Prepare Dataset 
pbc2.id$death <- ifelse(pbc2.id$status %in% "dead", 1, 0)
pbc2.id$transplant <- ifelse(pbc2.id$status %in% "transplanted", 1, 0)

## Create transition matrix
tmat <- matrix(NA, 3, 3)
dimnames(tmat) <- list(from = c("alive", "transplanted", "dead"), 
                       to = c("alive", "transplanted", "dead"))
tmat[1, 2:3] <- 1:2

## Create Multi-state version of the dataset
covs <- c("drug", "sex")
pbc2.mstate <- msprep(time = c(NA, "years", "years"), 
                      status = c(NA, "transplant", "death"), 
                      data = pbc2.id, 
                      trans = tmat, 
                      keep = covs, 
                      id = "id")

## Expand covariates per transition
pbc2.mstate <- expand.covs(pbc2.mstate, covs, append = TRUE, longnames = FALSE)

## Fit Competing risks sub-model. Here, drug.1, drug.2, sex.1, sex.2 are 
## the expanded covariates from
## the previous step, with drug.1 for example representing 
## the effect of drug for transition 1, drug.2 
## the effect of drug for transition 2 etc.

coxCRfit <- coxph(Surv(Tstart, Tstop, status) ~ drug.1 + drug.2 + sex.1 + sex.2 + 
                        strata(trans) + cluster(id), 
                      data = pbc2.mstate, x = TRUE, model = TRUE)

## fit mixed-effects sub-model
mixfit <- mvglmer(list(log(serBilir) ~ drug * ns(year, 3) + (ns(year, 3) | id)), 
                  data = pbc2, 
                  families = list(gaussian))

## Specify interactions in order to allow different effect of the outcome for each risk
interacts <- list("log(serBilir)" = ~ strata(trans) - 1)

## fit multistate model
JMfit <- mvJointModelBayes(mixfit, coxCRfit, timeVar = "year",
                           Interactions = interacts, 
                           multiState = TRUE, 
                           data_MultiState = pbc2.mstate, 
                           idVar_MultiState = "id", 
                           control = list(equal.strata.knots = TRUE, 
                                          equal.strata.bound.knots = TRUE))

summary(JMfit)
swihart commented 5 years ago

Thanks @gpapageorgiou and @drizopoulos ! I'm leaving an updated version of what I attempted above FWIW -- comments welcomed but not necessary, of course! Thanks for your help and promptitude!

library(JM)
library(JMbayes)

##############################
## Do things the JM, CR way ##
## Limited to 1 longitudinal##
## process                  ##
##############################

## slides: http://www.drizopoulos.com/courses/EMC/ESP72.pdf
## slide 163
pbc2.idCR <- crLong(pbc2.id, statusVar = "status",
                    censLevel = "alive", nameStrata = "CR")
pbc2.idCR[pbc2.idCR$id %in% c(1,2,5),
          c("id", "years", "status", "CR", "status2")]
## slide 164
coxFit.CR <- coxph(Surv(years, status2) ~ drug * strata(CR),
                   data = pbc2.idCR, x = TRUE)
## equivalently:
pbc2.idCR$drug.1 <- (pbc2.idCR$drug=="D-penicil" & pbc2.idCR$CR=="transplanted")+0L
pbc2.idCR$drug.2 <- (pbc2.idCR$drug=="D-penicil" & pbc2.idCR$CR=="dead"        )+0L
coxFit.CR2<- coxph(Surv(years, status2) ~ drug.1 + drug.2 + strata(CR),
                   data = pbc2.idCR, x = TRUE)

## slide 164 & 165
lmeFit.CR <- lme(log(serBilir) ~ drug * year, data = pbc2,
                 random = ~ year | id)

jointFit.CR <- jointModel(lmeFit.CR, coxFit.CR, timeVar = "year",
                          method = "spline-PH-aGH", CompRisk = TRUE,
                          interFact = list(value = ~ CR, data = pbc2.idCR))

## same as jointFit.CR -- different parameterization (use coxFit.CR2 and change interFact).
jointFit.CR2<- jointModel(lmeFit.CR, coxFit.CR2, timeVar = "year",
                          method = "spline-PH-aGH", CompRisk = TRUE,
                          interFact = list(value = ~ strata(CR) - 1, data = pbc2.idCR))

###################################
## Do things the JMbayes, MS way ##
## Allows competing risks and >1 ##
## longitudinal process          ##
###################################
library(mstate)

## follow section 5 of
## https://cran.r-project.org/web/packages/mstate/vignettes/Tutorial.pdf
pbc2.id[pbc2.id$id %in% c(1,2,5),
        c("id", "years", "status", "status2")]

## prepare tmat
tmat <- trans.comprisk(2, names = c("alive", "transplanted", "dead"))
tmat
## use msprep()
pbc2.id$statTX <- as.numeric(pbc2.id$status=="transplanted")
pbc2.id$statDD <- as.numeric(pbc2.id$status=="dead")
?msprep
pbc2.idMS <- msprep(time =   c(NA, "years", "years"),
                    status = c(NA, "statTX", "statDD"),
                    data = pbc2.id,
                    keep = "drug",
                    trans = tmat)

## Crucial step!  I omitted this in my previous comment.
## Expand covariates per transition. See before/after expansion.
pbc2.idMS[pbc2.idMS$id %in% c(1,2,5),]
pbc2.idMS <- expand.covs(pbc2.idMS, "drug", append = TRUE, longnames = FALSE)
pbc2.idMS[pbc2.idMS$id %in% c(1,2,5),]

## explicitly put in the expanded covariates and add strata() and add cluster()
coxFit.MS <- coxph(Surv(Tstart, Tstop, status) ~ drug.1 + drug.2 + strata(trans) + cluster(id),
                   data = pbc2.idMS, model=TRUE, x=TRUE)

## reproduce jointFit.CR with mvglmer/mvJointModelBayes (?)
mvglmerFit.MS <- mvglmer(list(log(serBilir) ~ drug * year + (year|id)),
                         data = pbc2,
                         families=list(gaussian)
)

## Crucial step:
## Specify interactions in order to allow different effect of the outcome for each risk
interacts <- list("log(serBilir)" = ~ strata(trans) - 1)

jointFit.MS <-
  mvJointModelBayes(mvglmerFit.MS,
                    coxFit.MS,
                    timeVar="year",
                    Interactions=interacts,
                    multiState=TRUE,
                    data_MultiState=pbc2.idMS,
                    idVar_MultiState = "id",
                    control = list(equal.strata.knots=TRUE,
                                   equal.strata.bound.knots=TRUE)
  )

## similar:
summary(jointFit.CR2)
summary(jointFit.MS)

## similar:
coxFit.CR
coxFit.MS

## similar:
summary(lmeFit.CR)
summary(mvglmerFit.MS)
harryparr commented 4 years ago

Thanks for this tutorial @gpapageorgiou & @swihart. I was wondering if it was possible to perform dynamic predictions from a competing risk mvJMbayes object? I've attempted to do so with the following code:

ND <- pbc2[pbc2$id == 2, ] # the data of Subject 2
ND_merge <- merge(ND, coxCRfit$model, by.x = "id", by.y="(cluster)")
names(ND_merge)[names(ND_merge) == 'strata(trans)'] <- 'trans'
str(ND_merge)
'data.frame':   8 obs. of  26 variables:
 $ id                         : num  2 2 2 2 2 2 2 2
 $ years                      : num  14.2 14.2 14.2 14.2 14.2 ...
 $ status                     : Factor w/ 3 levels "alive","transplanted",..: 1 1 1 1 1 1 1 1
 $ drug                       : Factor w/ 2 levels "placebo","D-penicil": 2 2 2 2 2 2 2 2
 $ age                        : num  56.4 56.4 56.4 56.4 56.4 ...
 $ sex                        : Factor w/ 2 levels "male","female": 2 2 2 2 2 2 2 2
 $ year                       : num  0 0 4.9 4.9 7.89 ...
 $ ascites                    : Factor w/ 2 levels "No","Yes": 1 1 2 2 2 2 2 2
 $ hepatomegaly               : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2
 $ spiders                    : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2
 $ edema                      : Factor w/ 3 levels "No edema","edema no diuretics",..: 1 1 2 2 3 3 3 3
 $ serBilir                   : num  1.1 1.1 2.6 2.6 3.6 3.6 4.6 4.6
 $ serChol                    : int  302 302 230 230 244 244 237 237
 $ albumin                    : num  4.14 4.14 3.32 3.32 2.8 2.8 2.67 2.67
 $ alkaline                   : int  7395 7395 1110 1110 779 779 669 669
 $ SGOT                       : num  114 114 132 132 119 ...
 $ platelets                  : int  221 221 135 135 113 113 100 100
 $ prothrombin                : num  10.6 10.6 11.3 11.3 11.5 11.5 11.5 11.5
 $ histologic                 : int  3 3 3 3 3 3 3 3
 $ status2                    : num  0 0 0 0 0 0 0 0
 $ Surv(Tstart, Tstop, status): 'Surv' num [1:8, 1:3] (0,14.2+] (0,14.2+] (0,14.2+] (0,14.2+] ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr  "start" "stop" "status"
  ..- attr(*, "type")= chr "counting"
 $ drug.1                     : num  1 0 1 0 1 0 1 0
 $ drug.2                     : num  0 1 0 1 0 1 0 1
 $ sex.1                      : num  1 0 1 0 1 0 1 0
 $ sex.2                      : num  0 1 0 1 0 1 0 1
 $ trans                      : Factor w/ 2 levels "trans=1","trans=2": 1 2 1 2 1 2 1 2

survfitJM(CR_JMfit, ND_merge)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

The traceback shows below:

traceback()
8: stop("contrasts can be applied only to factors with 2 or more levels")
7: `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])
6: model.matrix.default(term, data = newdata.GK.postRE.i)
5: model.matrix(term, data = newdata.GK.postRE.i)
4: FUN(X[[i]], ...)
3: lapply(TermsU, function(term) {
       model.matrix(term, data = newdata.GK.postRE.i)
   })
2: survfitJM.mvJMbayes(CR_JMFit, ND_merge)
1: survfitJM(CR_JMFit, ND_merge)

The error message is not too obvious to me and I don't see an apparent reason why it shouldn't work after combining the longitudinal and msprep data from the cox$model matrix. I don't see why the one-factor should be an issue when subsetting a singular patient. Any insight would be greatly appreciated.

Harry

gpapageorgiou commented 4 years ago

@harryparr Thank you for your question.

Currently survfitJM() does not support mvJMbayes objects that were fitted using the argument multistate = TRUE. The actual error message is probably misleading in this case. Both the R and C++ parts of the source code would need to be adjusted for this to work.

Best, Greg