Closed anddis closed 5 months ago
I think this should now be fixed in https://github.com/chjackson/flexsurv/commit/577b54c7aaf1ed42947f266ae0e420ce2d979574 - see here for how to install the development version. I've never used non-default contrasts (never seen a need for them in practice?) so have never noticed this behaviour. Thanks for the report and the concise test case.
Thank you for the quick fix!
For your information: predict.flexsurvreg
ignores the contrast matrix. Parameter estimates are correct now but predictions are not.
> predict(fit.noc, type = "hazard", time = 1) |> unique()
# A tibble: 4 × 2
.eval_time .pred_hazard
<dbl> <dbl>
1 1 0.00442
2 1 0.0131
3 1 0.0150
4 1 0.00580
> predict(fit.c, type = "hazard", time = 1) |> unique()
# A tibble: 4 × 2
.eval_time .pred_hazard
<dbl> <dbl>
1 1 0.00843
2 1 0.0249
3 1 0.00967
4 1 0.00325
Warning message:
contrasts dropped from factor celltype2
Does https://github.com/chjackson/flexsurv/commit/640a58e13ad7422196a6ee9188e54228733a205a fix this issue for you (models will have to be refitted)? As I understand it, since changing the contrasts just results in a reparameterisation of the same model, the predictions should be the same (assuming that the optimizer identifies the correct MLE in each case)?
Yes, it does fix the prediction issue. Thank you very much! And yes again, the predictions should be the same for the very reason you mentioned.
OT: your remark about the optimizer finding the correct MLEs is interesting. I have run some simple piecewise constant hazard models and noticed that when using non-default contrast matrices the predicted hazards won't always replicate their empirical counterparts. I never noticed this behaviour with the default contrast coding. See below for a reproducible example.
For me, you can close this issue as solved. Cheers!
library(survival)
library(flexsurv)
library(MASS)
cancer$status <- cancer$status - 1
cancer.long <- survSplit(Surv(cancer$time, cancer$status) ~ .,
cancer,
cut = c(150, 400),
episode = "timeband",
event = "status")
cancer.long$timeband <- factor(cancer.long$timeband)
fit.freg <- flexsurvreg(Surv(tstart, tstop, status) ~ timeband,
data = cancer.long,
dist = "exp")
pred.fit.freg <- unique(predict(fit.freg, type = "hazard", time = 1)$.pred_hazard)
cancer.long$timebandc <- cancer.long$timeband
contrasts(cancer.long$timebandc) <- codingMatrices::contr.diff(3)
fit.freg.diff <- flexsurvreg(Surv(tstart, tstop, status) ~ timebandc,
data = cancer.long,
dist = "exp")
pred.fit.freg.diff <- unique(predict(fit.freg.diff, type = "hazard", time = 1)$.pred_hazard)
ir <- tapply(cancer.long$status, cancer.long$timeband, sum) /
tapply(cancer.long$tstop-cancer.long$tstart, cancer.long$timeband, sum)
cbind(ir = ir,
pred.fit.freg = pred.fit.freg,
pred.fit.freg.diff = pred.fit.freg.diff)
ir pred.fit.freg pred.fit.freg.diff
1 0.001530048 0.001530049 0.001531380
2 0.002918468 0.002918466 0.002919967
3 0.003303405 0.003303404 0.003302016
Hello Chris,
flexsurvreg
seems to ignore the contrast matrix associated with a factor.In the reproducible toy example at the end of this message, the object
fit.c
ignored the contrast matrix associated with thecelltype2
factor variable. In fact, the estimates were similar (but not identical!) to those infit.noc
, where no contrast matrix was associated to the factor specified on the rhs of the formula passed toflexsurvreg
(celltype
).As a benchmark, I used the
glm
function, which correctly handle contrast matrices, to fit an equivalent Poisson model (fit.poi
).Is there anything I should be aware of to use contrast matrices with
flexsurvreg
?Thank you for this great package!