`msfit.flexsurvreg` not working when stratifying variable is not called `"trans"`, _e.g._ when two trans. are assumed proportional

mafed commented 1 year ago

I tried to reproduce the model "c2" from the mstate vignette with flexsurvreg and the Weibull distribution. I found that form.msm.newdata gives an error, probably related to this line

Here below a Minimal Reproducible Example, I think it has to do with the variable names: it creates problems if you don't use the name "trans" neither in the formula nor as stratifying variable.

library(mstate)
library(flexsurv)

## take data from the 'mstate' tutorial
data("ebmt3")
tmat <- trans.illdeath(names = c("Tx", "PR", "RelDeath"))
covs <- c("dissub", "age", "drmatch", "tcd", "prtime")
msbmt <- msprep(time = c(NA, "prtime", "rfstime"), 
                status = c(NA, "prstat", "rfsstat"), 
                data = ebmt3, 
                trans = tmat, 
                keep = covs)
msbmt <- expand.covs(msbmt, covs, append = TRUE, longnames = FALSE)

## add time-varying covariate for PH transition, ref page 8 of tutorial
msbmt$pr <- 0
msbmt$pr[msbmt$trans == 3] <- 1

## fit weibull model with 'same' formula
flex_c2 <- flexsurvreg(
  Surv(Tstart, Tstop, status) ~ 
    dissub1.1 + dissub2.1 +
       age1.1 +    age2.1 + drmatch.1 + tcd.1 + dissub1.2 + dissub2.2 +
       age1.2 +    age2.2 + drmatch.2 + tcd.2 + dissub1.3 + dissub2.3 +
       age1.3 +    age2.3 + drmatch.3 + tcd.3 + 
    pr + to + shape(to), 
  data = msbmt, dist = "weibull")
flex_c2

## preparing newdata
newd <- data.frame(dissub  = rep(0, 3), 
                   age     = rep(0, 3), 
                   drmatch = rep(0, 3), 
                   tcd     = rep(0, 3), 
                   trans   = 1:3)
newd$dissub <- factor(newd$dissub, 
                      levels = 0:2, 
                      labels = levels(ebmt3$dissub))
newd$age <- factor(newd$age, 
                   levels = 0:2, 
                   labels = levels(ebmt3$age))
newd$drmatch <- factor(newd$drmatch, 
                       levels = 0:1, 
                       labels = levels(ebmt3$drmatch))
newd$tcd <- factor(newd$tcd, 
                   levels = 0:1, 
                   labels = levels(ebmt3$tcd))
attr(newd, "trans") <- tmat
class(newd) <- c("msdata", "data.frame")
newd <- expand.covs(newd, covs[1:4], longnames = FALSE)

newd$pr     <- c(0, 0, 1)    # time-varying covariate
newd$strata <- c(1, 2, 2)    # strata (from tutorial)
newd$to     <- c(1, 2, 2)    # hope it will find it in the formula

## msfit.flexsurvreg will give this error:
## Error: variable 'to' was fitted with type "numeric" but type 
## "factor" was supplied
msfit.flexsurvreg(flex_c2, 
                  t = 90,
                  newdata = newd,
                  tvar = "to",
                  trans = tmat)
## which is not true:
class(msbmt$to)
class(newd$to)

## problem is likely here
flexsurv:::form.msm.newdata(flex_c2, 
                            newdata = newd,
                            tvar = "to",
                            trans = tmat)
flex_c2$covdata$xlev    # maybe because this is empty?

chjackson commented 1 year ago

If to is a indicator for a particular transition, it should be defined as a factor before fitting the flexsurvreg model. In your dataset, msbmt it is stored as numeric, so you cannot interpret the fitted model as a multi-state model with the transition number as a predictor.

However I could handle this better by checking whether tvar is a factor before trying to read its levels in form.msm.newdata, then giving an informative error message if it isn't.

I think that I should also have written help(msfit.flexsurvreg), section tvar more clearly, to say e.g. "This should have been defined as a factor, with factor levels that correspond to elements of trans".

mafed commented 1 year ago

You're right, adding msbmt$to <- as.factor(msbmt$to) before model fitting did the trick. Now msfit.flexsurvreg works but I noticed I should change the variable newd$to to reflect that I assume transitions 2 and 3 to be proportional. In the tutorial they achieve that using newd$strata <- c(1, 2, 2) but I wouldn't know how to get the similar results here.

chjackson commented 1 year ago

I've never tried to reproduce this tutorial, but that sounds like you want a model where transitions 2, 3 have a common shape parameter, that is different from the shape parameter for transition 1, but all transitions have a different scale parameter.

To implement that, you could create a factor for "transition 2 or 3", say, msbmt$to23 <- factor(msbmt$to %in% 2:3) and place that as a covariate on the shape. Then in your flexsurvreg model formula, use + to + shape(to23) instead of + to + shape(to).

Also dist = "weibull" is the accelerated failure time parameterisation of the Weibull. For the proportional hazards version you need "dist="weibullPH".

mafed commented 1 year ago

Thanks for the help, indeed you got what I want to obtain.

The problem is that the "to" variable already takes only values 2 and 3 because is the destination of the transition, so the variable "to23" would be always equal to TRUE.

I thought about it a bit more and probably I'll obtain what I need without other modifications, apart from as.factor(to).

BTW, I want the transitions 2 and 3 to be proportional and that's the interpretation of the "pr" variable that distinguish between between 2 and 3 despite they have the same shape parameter. Or are you saying that if I don't use the PH parametrisation of the Weibull I can't interpret the coefficient of "pr" in that sense?

chjackson commented 1 year ago

If $h_r(t)$ is the hazard function for transition r, $h_2(t)$ will only be proportional to $h_3(t)$ if you use a proportional hazards model. In an accelerated failure time model, identical shape parameters implies that the time acceleration coefficient will be constant with time (see the distributions vignette for the definition of this)

mafed commented 1 year ago

I'm OK if it has a different interpretation, thanks for pointing it out. Though now that I have msfit.flexsurvreg working I still don't understand why it doesn't give me the hazard for transition "1" of the "new patient" (see below code, modified).

In my actual project, that mimics this specific situation, I have 6 transition and the variable corresponding to "to" (basically stratifying variable) is equal to c(1, 2, 3, 2, 4, 4) and I use as "pr" variable c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE). Meaning transition 2 and 4 have the same shape parameter as well as transition 5 and 6, the "pr" variable serves to distinguish between these transitions, like in the tutorial. The problem is that when I generate "newdata" with all covariates equal to zero, transition 5 and 6 are estimated to have zero hazard. Is this still because "tvar" needs to have the same factor labels of "trans"?

If instead I use + trans + shape(to) in the flexsurvreg model formula, I get the hazard also for transition 1, basically forfeiting the use of the "pr" time-varying covariate. This way I also should get the shape of transition 2 and 3 to be equal but I can still distinguish these transition by the trans parameters.

library(mstate)
library(flexsurv)

## take data from the 'mstate' tutorial
data("ebmt3")
tmat <- trans.illdeath(names = c("Tx", "PR", "RelDeath"))
covs <- c("dissub", "age", "drmatch", "tcd", "prtime")
msbmt <- msprep(time = c(NA, "prtime", "rfstime"), 
                status = c(NA, "prstat", "rfsstat"), 
                data = ebmt3, 
                trans = tmat, 
                keep = covs)
msbmt <- expand.covs(msbmt, covs, append = TRUE, longnames = FALSE)

## add time-varying covariate for PH transition, ref page 8 of tutorial
msbmt$pr <- msbmt$trans == 3
msbmt$to <- as.factor(msbmt$to)
table(msbmt$to)

## fit weibull model with 'same' formula
flex_c2 <- flexsurvreg(
  Surv(Tstart, Tstop, status) ~ 
    dissub1.1 + dissub2.1 +
       age1.1 +    age2.1 + drmatch.1 + tcd.1 + dissub1.2 + dissub2.2 +
       age1.2 +    age2.2 + drmatch.2 + tcd.2 + dissub1.3 + dissub2.3 +
       age1.3 +    age2.3 + drmatch.3 + tcd.3 + 
    pr + to + shape(to), 
  data = msbmt, dist = "weibull")
flex_c2

## preparing newdata
newd <- data.frame(dissub = rep(0, 3), 
                   age = rep(0, 3), 
                   drmatch = rep(0, 3), 
                   tcd = rep(0, 3), 
                   trans = as.factor(1:3))
newd$dissub <- factor(newd$dissub, 
                      levels = 0:2, 
                      labels = levels(ebmt3$dissub))
newd$age <- factor(newd$age, 
                   levels = 0:2, 
                   labels = levels(ebmt3$age))
newd$drmatch <- factor(newd$drmatch, 
                       levels = 0:1, 
                       labels = levels(ebmt3$drmatch))
newd$tcd <- factor(newd$tcd, 
                   levels = 0:1, 
                   labels = levels(ebmt3$tcd))
attr(newd, "trans") <- tmat
class(newd) <- c("msdata", "data.frame")
newd <- expand.covs(newd, covs[1:4], longnames = FALSE)

newd$to     <- as.factor(c(2, 3, 3))    # same shape for trans 2 and 3
newd$pr     <- c(FALSE, FALSE, TRUE)    # time-varying covariate

## msfit.flexsurvreg gives $Haz == 0 for transition 1
msfit.flexsurvreg(flex_c2, 
                  t = 5,
                  newdata = newd,
                  tvar = "to",
                  trans = tmat)

chjackson commented 1 year ago

I don't see the purpose of pr in the flexsurv model. As you say, + trans + shape(to) achieves the model you said you wanted.

Perhaps the confusion here is because flexsurv is fully parametric. mstate does things slightly differently, because the baseline hazard is nonparametric.

chjackson commented 1 year ago

Required fixes done in https://github.com/chjackson/flexsurv-dev/commit/97cfcc9d2518024858f00ff98d86b279c6b40bec

chjackson / flexsurv

`msfit.flexsurvreg` not working when stratifying variable is not called `"trans"`, _e.g._ when two trans. are assumed proportional #161