drizopoulos / JMbayes2

Extended Joint Models for Longitudinal and Survival Data
https://drizopoulos.github.io/JMbayes2/
81 stars 23 forks source link

Strange results with piecewise constant baseline hazard #24

Closed essisyrjala closed 2 years ago

essisyrjala commented 2 years ago

Dear Professor Rizopoulos,

I have been using a lot the joint modeling packages developed by you, firstly the JM package, and currently the JMbayes2 package. I like to thank you for the important work you have done in this research area.

However, I faced a problem with the JMbayes2 package, due to baseline hazard estimation. I have fitted the joint models with baseline hazard set as piecewise constant with 2 knots (3 hazard segments; implemented by adding “control=list(Bsplines_degree=0, base_hazard_segments=3))”). Results have been plausible as long as I used the one particular endpoint, but when I started to fit models with a different endpoint, all the results (dependently on the exposure variable) became very strange.

To be clear, I am investigating the association between different longitudinally measured nutritional variables and the risk of type 1 diabetes. This problematic endpoint (type 1 diabetes, T1D) compared to the first one I used with plausible behavior (islet autoimmunity, IA) has

For example, the obtained hazard ratios (95% CI) for energy intake are 23.2 (19.0, 28.2) with piecewise constant, 2.18 (1.71, 2.78) with piecewise linear, and 1.46 (1.04, 2.02) with piecewise quadratic baseline hazard with 3 hazard segments. The last one agrees with the result obtained with jointModel function from the JM-package, where piecewise constant baseline hazard with knots at 1.75 and 3.75 years (in 5.75-year follow-up) was used.

However, I noticed that JMbayes2 may fit the piecewise constant baseline hazard between the first and the last censored or observed endpoint (by ignoring the time before the first endpoint?), since the boundary knots of the baseline hazard seems to be placed at 0.79 and 5.75 years for type 1 diabetes outcome. Here are the outputs relating to the baseline hazards of piecewise constant, piecewise linear and piecewise quadratic models (notice that the follow-up starts from time point 0):

`

1. Piecewise constant baseline hazard (with 2 knots)

summary(jmod_diab_ENERC_total_MJ_cons3)$control$knots$1 [1] 0.7876454 2.4417636 4.0958818 5.7500000 jmod_diab_ENERC_total_MJ_cons3$statistics$Mean$bs_gammas bs_gammas_1 bs_gammas_2 bs_gammas_3 -9.439852 -11.944761 -14.200867

2. Piecewise linear baseline hazard (with 2 knots)

summary(jmod_diab_ENERC_total_MJ_lin3)$control$knots$1 [1] -0.8664727 0.7876454 2.4417636 4.0958818 5.7500000 7.4041182 jmod_diab_ENERC_total_MJ_lin3$statistics$Mean$bs_gammas bs_gammas_1 bs_gammas_2 bs_gammas_3 bs_gammas_4 -6.573006 -6.478030 -6.723222 -7.118328

3. Piecewise quadratic baseline hazard (with 2 knots)

summary(jmod_diab_ENERC_total_MJ_quadr3)$control$knots$1 [1] -2.5205909 -0.8664727 0.7876454 2.4417636 4.0958818 5.7500000 7.4041182 9.0582364 jmod_diab_ENERC_total_MJ_quadr3$statistics$Mean$bs_gammas bs_gammas_1 bs_gammas_2 bs_gammas_3 bs_gammas_4 bs_gammas_5 -6.670896 -6.299837 -6.161903 -6.251574 -6.398656 `

But if I manually change the time of the endpoint of one child (censored or observed, this was done for the child having exposure data only from the first measurement at time 0, so I did not modify the exposure data) to time 0.01, I obtain a more rational result of HR: 1.40 (95% CI: 1.01, 1.88), and the same baseline hazard output for piecewise-constant hazard becomes as:

`

summary(jmod_diab_ENERC_total_MJ_cons3_test1)$control$knots$1 [1] 0.010000 1.923333 3.836667 5.750000 jmod_diab_ENERC_total_MJ_cons3_test1$statistics$Mean$bs_gammas bs_gammas_1 bs_gammas_2 bs_gammas_3 -6.409399 -6.210152 -6.125433

`

So, my question is: Is the baseline hazard estimation going wrong with piecewise-constant hazard? Is it ignoring the time before the first observed or censored endpoint in estimation?

Thank you in advance for the help!

With best regards, Essi Syrjälä

drizopoulos commented 2 years ago

Dear Essi,

Without seeing your data, I guess that this is caused by the fact that the knots are placed at equidistant points in jm(), which is a bit different from what was done in jointModel(). Perhaps, you could try setting the knots yourself.

essisyrjala commented 2 years ago

Dear Dimitris,

Thank you for your rapid answer.

Yes, it's true that knots are not at the same places. Actually, I haven't found a way to set them by myself in jm(), like in jointModel() function. The similar 'knots=c(1.75, 3.75)' gives a warning message. How can this be done in jm()?

However, I'm still confused since the knots are placed at equidistant points between the first and the last endpoint, not between the start and the end of the follow-up period. Because for me, this output with control=list(Bsplines_degree=0, base_hazard_segments=3)) seems like the first boundary knot would be placed at 0.7876454 years, although the follow-up (and exposure data) starts from 0:

summary(jmod_diab_ENERC_total_MJ_cons3)$control$knots$1 [1] 0.7876454 2.4417636 4.0958818 5.7500000

But if I manipulate the data by changing one endpoint to 0.01 years, the first boundary knot shifts to this time point, and knots are placed at equidistant points between 0.01 and 5.75 years:

summary(jmod_diab_ENERC_total_MJ_cons3_test1)$control$knots$1 [1] 0.010000 1.923333 3.836667 5.750000

And after this manipulation HR: 23.2 (19.0, 28.2) becomes as HR: 1.40 (95% CI: 1.01, 1.88). I know that the knots are placed at the different points in these cases, but I don't fully understand, how the follow-up time before the time 0.79 is taken into account in modeling?

With best regards, Essi

drizopoulos commented 2 years ago

You're correct; I had forgotten to put a knots control argument. It is now there, but you will need to install the development version from GitHub.

Also, indeed, the problem is caused by the manner JMbayes2:::knots creates the knots. In particular, when the degree is zero, it starts from the minimum event time, whereas it would be better to start from zero. I made an update for this case - could you run your code again?

essisyrjala commented 2 years ago

Thank you for your reply.

I installed the development version from GitHub and tried to fit a model with specific knot places, but now I got a warning message of Error in rep.int(nk, max(0L, ord - match(TRUE, dkn > 0))) : invalid 'times' value. And if I try with linear or quadratic form of baseline hazard, the obtained warning message is Error in FUN(X[[i]], ...) : 'ord' must be positive integer, at most the number of knots.

drizopoulos commented 2 years ago

Could you install it and try again? I made another update on how the knots are computed when the degree is zero.

essisyrjala commented 2 years ago

Thank you, I installed it again. And sorry for the delay in my answer.

Now I see, that the first boundary knot is placed at the beginning of the follow-up period (at 0) with control=list(Bsplines_degree=0,base_hazard_segments=3) option. But for some reason, something strange happens in the end of the period, since knot-output of the baseline hazard is now:

> summary(jmod_diab_ENERC_total_MJ_cons3)$control$knots$1 [1] 0.000000 1.654118 3.308236 4.962355

although my last observed case is at 5.73 years and censored observations are at 5.75 years. It seems that the whole time period is shifted left (towards beginning) by 0.79 years (as my first boundary knot was before at 0.79 years)?

In addition, I still could not make the user-specified knots working, with control=list(Bsplines_degree=0, knots=c(1.75,3.75)) or similarly with control=list(Bsplines_degree=0,base_hazard_segments=3,knots=c(1.75,3.75)) the function gives an error message of Error in rep.int(nk, max(0L, ord - match(TRUE, dkn > 0))) : invalid 'times' value.

drizopoulos commented 2 years ago

I made another small change - could you try again?

essisyrjala commented 2 years ago

Thank you!

Unfortunately, I still get the same result as previously, with knots placing incorrectly at [1] 0.000000 1.654118 3.308236 4.962355.

And the same error occurs when trying to set knots for piecewise-constant hazard by myself: Error in rep.int(nk, max(0L, ord - match(TRUE, dkn > 0))) : invalid 'times' value. If trying to set the knots for piecewise-quadratic baseline hazard, error becomes as Error in FUN(X[[i]], ...) : 'ord' must be positive integer, at most the number of knots.

drizopoulos commented 2 years ago

This is strange! Can you check if JMbayes2:::knots in your installation is the same as what you see here.

Also, sorry that I forgot to say that the control argument knots should be a list.

essisyrjala commented 2 years ago

Yes, it appears to be the same in my installation:

`> JMbayes2:::knots function (xl, xr, ndx, deg) { dx <- (xr - xl)/ndx start <- if (deg < 1) 0 else xl - deg dx end <- if (deg < 1) xr + 0.001 else xr + deg dx seq(start, end, by = dx) } <bytecode: 0x000001d559355bf0>

` In addition, I couldn't get `knots` working either by putting it as a list: `control=list(Bsplines_degree=0,base_hazard_segments=3,knots=list(1.74,3.74))`.
drizopoulos commented 2 years ago

It should be something like jm(..., knots = list(c(0, 1.65, 3.31, 4.96, 7.05))).

drizopoulos commented 2 years ago

When I try this example with the new version,

MixedFit <- lme(log(serBilir) ~ year, data = pbc2, random = ~ year | id)
pbc2.id$Time <- pbc2.id$years
pbc2.id$event <- as.numeric(pbc2.id$status != "alive")
CoxFit <- coxph(Surv(Time, event) ~ drug + age, data = pbc2.id)
JMFit1 <- jm(CoxFit, MixedFit, time_var = "year", Bsplines_degree = 0L, 
             base_hazard_segments = 5)

JMFit1$control$knots

I get

> JMFit1$control$knots
$`1`
[1]  0.000000  2.838681  5.677363  8.516044 11.354726 14.193407

which seems OK to me.

essisyrjala commented 2 years ago

Okay, I see. Now I get knots working! So now, compared to the jointModel function, in jm knot places should be given as a list, and also boundary knots should be given. Great, thank you!

However, considering your above example, the distribution for the event times is:

> summary(pbc2.id$years)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1123  3.7085  6.2959  6.4112  8.8798 14.3057  

and your last boundary knot is at 14.1934 years, instead of 14.3057 years (maximum value). The difference between those numbers is 0.1123, for which reason I suspect, that the whole time period for baseline hazard is shifting left (towards the beginning), not just the place for the first boundary knot. And since the minimum value of my event times is 0.79, instead of only 0.1123, the effect is more dramatic compared to the example with pbc dataset.

drizopoulos commented 2 years ago

The new version should work.

essisyrjala commented 2 years ago

Yes, it works! Perfect, thank you so much.

My one last further question (this is the last one, I promise): have you considered to allow the use of interval censored data with jm function (or is it possible already)?

drizopoulos commented 2 years ago

Interval censoring is already implemented in the package, though we have not tested it thoroughly yet.