chjackson / flexsurv

The flexsurv R package for flexible parametric survival and multi-state modelling
http://chjackson.github.io/flexsurv/

Splines not supported by standsurv without specifying newdata #167

Open markdanese opened 11 months ago

markdanese commented 11 months ago

I really appreciate this package. It makes things much easier, particularly with regard to generating causal contrasts and getting reasonable variance estimates.

I ran into an issue trying to get standsurv() to work when using a natural spline from the splines package. In this case it was age as a predictor in a model of time to death (in lung cancer). When age was used as a simple continuous variable, standsurv() worked fine without needing to specify the data set. When I changed to a natural spline (ns(age, 2)) to handle some non-linearity in the increase in risk with age, I got an error saying that the variable age could not be found. Helpfully, the error message suggested I should specify "newdata".

I noted that the model object includes the transformed age (i.e., in this case with 2 spline terms), so the error makes sense -- age is not there. And when I specified newdata = the original dataset, it worked without an error.

I am guessing that the predict function (which I think is part of summary()) isn't built for this use case. I tried to see how to work around this and suggest a code change, but I couldn't find anything helpful.

The simple workaround is to explicitly specify the original dataset, so it is not a critical issue. However, I wanted to put this out there in case anyone runs into this.
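For anyone hitting the same error, the workaround described above might look roughly like the sketch below. The report does not give the model details, so the lung data from the survival package, the Weibull distribution, and sex as the standardisation variable are all assumptions made for illustration.

```r
library(survival)  # Surv() and the lung data (assumed example data)
library(splines)   # ns()
library(flexsurv)  # flexsurvreg(), standsurv()

# Assumed setup: keep only the columns the model uses
lung_small <- lung[, c("time", "status", "age", "sex")]

# Natural spline on age, as in the report; the Weibull choice is an assumption
fit <- flexsurvreg(Surv(time, status) ~ ns(age, 2) + sex,
                   data = lung_small, dist = "weibull")

# Pass the original data explicitly so that `age` can be found
# when the spline basis is rebuilt for prediction
ss <- standsurv(fit,
                newdata = lung_small,
                at = list(list(sex = 1), list(sex = 2)),
                t = c(180, 365))
print(ss)
```

Leaving out the newdata argument in the standsurv() call is what triggers the error described in this report.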

chjackson commented 11 months ago

Thanks for the report. The default newdata that the flexsurvreg predict method uses is the "model frame" that is created in this line of flexsurvreg.R. When run with a ns() formula, this line seems to put the basis variables into the model frame, rather than the original covariate values that we want. I haven't used ns and the like, so I can't see a quick fix. I will leave this open.
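The behaviour described here can be seen with stats::model.frame alone, without flexsurv. The sketch below uses a small made-up data frame (the names d, futime and age are illustrative only) to show that the model frame holds the spline basis as a single matrix column and has no column called age.

```r
library(splines)  # ns()

# Hypothetical toy data, only for illustration
set.seed(1)
d <- data.frame(futime = rexp(30), age = runif(30, 40, 80))

mf <- model.frame(futime ~ ns(age, 2), data = d)

names(mf)                       # "futime" "ns(age, 2)"
is.matrix(mf[["ns(age, 2)"]])   # the basis is stored as a matrix column
ncol(mf[["ns(age, 2)"]])        # one column per spline term
"age" %in% names(mf)            # the original covariate is not in the frame
```

Any code that later looks up age in this frame, for example to rebuild the design matrix for prediction, will therefore not find it.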

chjackson commented 7 months ago

This is proving tricky to handle. The function stats::get_all_vars seems like it would be useful here, as it is designed to extract the original variables supplied to a formula, whereas stats::model.frame extracts the transformed versions. However, get_all_vars fails in cases where the formula contains a data frame look-up. For example, compare:

get_all_vars(ovarian$futime ~ 1, data=NULL) # fails
model.frame(ovarian$futime ~ 1, data=NULL) # works
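Both halves of this trade-off can be reproduced on a small made-up data frame. In the sketch below, the names d, futime and age are illustrative only, and tryCatch is used so that the failing call does not stop the script.

```r
library(splines)  # ns()

# Hypothetical toy data, only for illustration
set.seed(1)
d <- data.frame(futime = rexp(30), age = runif(30, 40, 80))

# With a spline term, get_all_vars returns the original covariate ...
orig <- get_all_vars(futime ~ ns(age, 2), data = d)
names(orig)   # "futime" "age"

# ... whereas model.frame returns the transformed basis
mf <- model.frame(futime ~ ns(age, 2), data = d)
names(mf)     # "futime" "ns(age, 2)"

# With a data frame look-up in the formula, get_all_vars tries to
# evaluate `futime` as a free-standing variable and fails
res_gav <- tryCatch(get_all_vars(d$futime ~ 1, data = NULL),
                    error = function(e) e)
inherits(res_gav, "error")

# model.frame evaluates the whole expression d$futime and succeeds
res_mf <- model.frame(d$futime ~ 1, data = NULL)
names(res_mf) # "d$futime"
```

The failure arises because get_all_vars works from the bare variable names in the formula (here d and futime) rather than evaluating the expression d$futime as a whole.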