Suggestion for input check

alexvolkmann commented 2 years ago

Dear @drizopoulos ,

The code implicitly assumes that the data set for the longitudinal model is already ordered by idVar and time_var. This is relevant when the data set contains missing values as in the following example:

# Cox model for the composite event death or transplantation
pbc2.id$status2 <- as.numeric(pbc2.id$status != 'alive')
CoxFit <- coxph(Surv(years, status2) ~ sex, data = pbc2.id)

# a linear mixed model for log serum cholesterol with random ordering
set.seed(123)
pbc2_ro <- pbc2[sample(seq_len(nrow(pbc2)), nrow(pbc2)), ]
fm1 <- lme(log(serChol) ~ year * sex, data = pbc2_ro, random = ~ year | id,
           na.action = na.omit)

# the joint model throws an error
jointFit <- jm(CoxFit, list(fm1), time_var = "year",
               n_iter = 12000L, n_burnin = 2000L, n_thin = 5L)

Error in mat[id, ] <- b : number of items to replace is not a multiple of replacement length

The information which observations to omit comes from the longitudinal model (the attribute provides unordered info) and is applied to the ordered data in function jm. This messes up the idVar levels and leads to an incorrect reduction in unq_idL: https://github.com/drizopoulos/JMbayes2/blob/7ae452ec1d32ac213c662bcdb4e823ecc9c6b88b/R/jm.R#L68 https://github.com/drizopoulos/JMbayes2/blob/7ae452ec1d32ac213c662bcdb4e823ecc9c6b88b/R/jm.R#L85 https://github.com/drizopoulos/JMbayes2/blob/7ae452ec1d32ac213c662bcdb4e823ecc9c6b88b/R/jm.R#L113-L124

Maybe you could include a more explicit check of this condition and a telling error message? Best, Alex

drizopoulos commented 2 years ago

When I use the development version from GitHub it works for me, i.e.,

> library("JMbayes2")
Loading required package: survival
Loading required package: nlme
Loading required package: GLMMadaptive
Loading required package: splines
> 
> 
> # Cox model for the composite event death or transplantation
> pbc2.id$status2 <- as.numeric(pbc2.id$status != 'alive')
> CoxFit <- coxph(Surv(years, status2) ~ sex, data = pbc2.id)
> 
> # a linear mixed model for log serum cholesterol with random ordering
> set.seed(123)
> pbc2_ro <- pbc2[sample(seq_len(nrow(pbc2)), nrow(pbc2)), ]
> fm1 <- lme(log(serChol) ~ year * sex, data = pbc2_ro, random = ~ year | id,
+            na.action = na.omit)
> 
> jointFit <- jm(CoxFit, list(fm1), time_var = "year",
+                n_iter = 12000L, n_burnin = 2000L, n_thin = 5L)
> 
> summary(jointFit)

Call:
jm(Surv_object = CoxFit, Mixed_objects = list(fm1), time_var = "year", 
    n_iter = 12000L, n_burnin = 2000L, n_thin = 5L)

Data Descriptives:
Number of Groups: 312           Number of events: 169 (54.2%)
Number of Observations:
  log(serChol): 1124

                  DIC      WAIC      LPML
marginal    1659.4244 1743.4869 -972.6610
conditional  544.6122  499.8164 -802.6694

Random-effects covariance matrix:

       StdDev   Corr 
(Intr) 0.3565 (Intr) 
year   0.0351 -0.4408

Survival Outcome:
                       Mean  StDev    2.5%   97.5%      P   Rhat
sexfemale           -0.5986 0.2745 -1.0941 -0.0505 0.0327 1.0013
value(log(serChol))  0.6240 0.3317 -0.0711  1.2494 0.0730 1.0003

Longitudinal Outcome: log(serChol) (family = gaussian, link = identity)
                  Mean  StDev    2.5%  97.5%      P   Rhat
(Intercept)     5.7456 0.1072  5.5389 5.9579 0.0000 1.0010
year           -0.0386 0.0243 -0.0877 0.0082 0.1053 1.0003
sexfemale       0.0357 0.1137 -0.1865 0.2568 0.7513 1.0008
year:sexfemale  0.0189 0.0256 -0.0313 0.0699 0.4637 1.0002
sigma           0.2128 0.0069  0.2000 0.2270 0.0000 1.0011

MCMC summary:
chains: 3 
iterations per chain: 12000 
burn-in per chain: 2000 
thinning: 5 
time: 42 sec

alexvolkmann commented 2 years ago

Yes, true, the development version does not return an error. Thanks!

drizopoulos / JMbayes2

Suggestion for input check #25