Understand how to predict new subjects in a mixed effect model

danielinteractive commented 2 years ago

Background: What if we have a new patient in the test set that has a new random effect e.g. SLD parameters (e.g. ks)?

To do:

[ ] think about methods side
- [ ] if we have a much simpler linear mixed model e.g. fitted with lme4 package in R - how do the predictions work for new subjects?
- [ ] e.g. random intercept and random slope for a continuous covariate -> how does the prediction work?
[ ] Consider the covariates that are modelled with random effects (corresponding to our SLD) as something we provide / have, then predict from that

gowerc commented 2 years ago

@danielinteractive, apologies if I'm mistaken, but I thought the general practice for prediction with random effects was to just set the random effects to 0? At least this makes intuitive sense to me: they are essentially nuisance parameters unique to each subject, and for a subject you have no prior information on, the best you can do is assume they lie at the centre of the distribution (i.e. 0).

I would have thought the only case where we wouldn't set to 0 is if we wanted to do simulation from the model (i.e. some form of say parametric bootstrapping).
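As a minimal sketch of the "set the random effects to 0" behaviour in lme4: the `sleepstudy` data shipped with lme4 stands in for our data here, and the subject label `new_subject` is made up for illustration. With `allow.new.levels = TRUE`, an unseen subject gets the population-level prediction, which should match both `re.form = NA` and a prediction computed by hand from the fixed effects alone.

```r
library(lme4)

# Random intercept and random slope for a continuous covariate (Days).
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# A subject that was not in the training data (label is hypothetical).
new_pt <- data.frame(
    Days = c(0, 5, 9),
    Subject = factor(rep("new_subject", 3))
)

# Unseen levels are predicted at the population level (random effects = 0).
pred_new <- predict(fit, newdata = new_pt, allow.new.levels = TRUE)

# Explicitly drop all random effects.
pred_pop <- predict(fit, newdata = new_pt, re.form = NA)

# The same thing by hand, using only the fixed effects.
manual <- fixef(fit)[["(Intercept)"]] + fixef(fit)[["Days"]] * new_pt$Days
```

All three vectors should agree, i.e. for a brand new subject lme4 predicts from the fixed effects only.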

danielinteractive commented 2 years ago

Thanks @gowerc for looking at this - yeah, generally I think the same. We just need to understand how it works here, since here SLD is a covariate for the OS prediction. I guess we need to somehow fit the SLD curve to the SLD observations of this individual to obtain meaningful OS predictions.

gowerc commented 2 years ago

I feel like I'm potentially not understanding this properly :) Wouldn't you just apply both models, i.e. take the model coefficients and use them to predict the patient's SLD values based on their baseline covariates? Then you can use the SLD values to predict the patient's OS hazard and convert that into S(t) = exp(-H(t))?
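For reference, the hazard-to-survival conversion written out. The symbol h(u) for the patient's OS hazard at time u is notation assumed here, not taken from the package:

```latex
H(t) = \int_0^t h(u)\,\mathrm{d}u,
\qquad
S(t) = \exp\bigl(-H(t)\bigr)
```

So once the hazard can be evaluated on a time grid, the survival curve follows from a (numerical) integral of the hazard.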

I guess one challenge would be extracting the final OS hazard model from the Stan code to calculate the hazard values. I think rstan has functions for exporting Stan functions into R, but I'm not sure if we have the same for cmdstan; perhaps we can manipulate the Stan code to create a standalone Stan program that simply takes OS inputs and returns the hazard values from day 0 - 1000?

danielinteractive commented 2 years ago

No, we don't want to predict SLD values for patients. The application here is that we observe SLD values and want to predict OS values. SLD is not part of the outcomes in that sense.

gowerc commented 2 years ago

Oh I see, sorry, I thought we meant a new patient about whom we had no information other than baseline covariates. I guess if they are a brand new patient but have SLD data, then we would need a way of re-fitting the model to them, keeping the parameters and hyperparameters fixed, to estimate just their individual random effects for the kinetics model. I wonder if this is even possible..
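One way to write down this idea, as a sketch only. The notation is assumed: y_new are the new patient's SLD observations, b_new their individual random effects, and theta the population-level parameters and hyperparameters held fixed at their fitted values (or at one posterior draw):

```latex
p\bigl(b_{\text{new}} \mid y_{\text{new}}, \theta\bigr)
\;\propto\;
p\bigl(y_{\text{new}} \mid b_{\text{new}}, \theta\bigr)\,
p\bigl(b_{\text{new}} \mid \theta\bigr)
```

The first factor is the SLD likelihood for the new patient and the second is the fitted random-effects distribution, so only b_new is unknown.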

danielinteractive commented 2 years ago

Yeah exactly something like that

gowerc commented 1 year ago

This is a methods question, potentially need to involve MCO / Francois

gowerc commented 9 months ago

Need to clarify what high level steps are required to answer actual questions from trials application. E.g. how is it intended for the model to be used in practice e.g. do people need to be able to pass in longitudinal data into an already fit OS model? Need to clarify with @mercifr1 what the intended applications of the model for prediction are.

Potentially need to add a workflow vignette to clarify the desired application, e.g. if you are a new statistician using this package, how should you be expected to use it? What problem are we solving, and how do you use the package to solve that problem?

gowerc commented 7 months ago

@danielinteractive, talking with @mercifr1 about this yesterday, we were discussing making this more general so that it's up to the end user to decide what they put through the model (this is also kinda linked to #296).

That is, for predicting OS the user would specify the baseline covariates and the TGI parameter values that they want to use to make the predictions. They are then free to set the TGI parameter values either to the population medians or to an arbitrary value if they want to predict for a hypothetical patient.

Perhaps then an interface could be something like:

predict(
    model,
    new.data = data.frame(sex = "F", age = 30, ECOG = "3" ),
    link.data = data.frame("s" = 0.6, "g" = 0.3, "phi" = 0.4, "b" = 60)
)

danielinteractive commented 7 months ago

Yeah this could work nicely I think

gowerc commented 6 months ago

This is more just an FYI...

Just to say, it appears lme4 provides no functionality for manually setting the random effects values; it looks like you can only predict based on (a) "the population level", i.e. setting all random effects to 0, or (b) setting the random effects to the values of a specific patient. Their API for (b) is:

predict(
    mod,
    newdata = tibble(
        age = 0.5,
        pt = "pt_000003"
    ),
    re.form = ~ (1 + age | pt)
)

In particular the re.form argument allows you to specify which random effects you want to assume, for example setting re.form = ~ (0 + age | pt) would keep the random effect for age but set the random intercept to 0.
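A small sketch to check this behaviour, again using lme4's built-in `sleepstudy` data as a stand-in (so `Days` plays the role of `age` and `Subject` the role of `pt`):

```r
library(lme4)

fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# An existing subject at a single time point.
nd <- data.frame(
    Days = 5,
    Subject = factor("308", levels = levels(sleepstudy$Subject))
)

# Default (re.form = NULL): condition on all of this subject's random effects.
p_full <- predict(fit, newdata = nd)

# Population level: all random effects set to 0.
p_pop <- predict(fit, newdata = nd, re.form = NA)

# Keep the random slope only; the random intercept is treated as 0.
p_slope <- predict(fit, newdata = nd, re.form = ~ (0 + Days | Subject))

# This subject's estimated random effects, for comparison by hand.
b <- ranef(fit)$Subject["308", ]
```

Here `p_slope` should equal `p_pop + b[["Days"]] * 5`, and `p_full` additionally adds `b[["(Intercept)"]]`.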

From what I've been reading, there is some theory and there are supporting packages for Bayesian extensions that, assuming you have some observed data for a new subject, can be used to calculate the posterior distribution of the subject's individual random effects given their observed values. I've not looked that deeply into this though.
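For the simple linear mixed model case this can be done in closed form, which may help as a mental model. The sketch below is not the joint model and not an lme4 feature: `eb_ranef` is a hypothetical helper that holds the fitted fixed effects, random-effects covariance D and residual variance fixed, and computes the conditional mean of a subject's random effects, D Z' (Z D Z' + sigma^2 I)^-1 (y - X beta), from that subject's observations. One `sleepstudy` subject is held out to play the new patient.

```r
library(lme4)

# Hold out one subject to play the role of the "new" patient.
train  <- droplevels(subset(sleepstudy, Subject != "308"))
new_pt <- subset(sleepstudy, Subject == "308")

fit <- lmer(Reaction ~ Days + (Days | Subject), data = train)

# Hypothetical helper: conditional mean of one subject's random effects,
# with all fitted population-level parameters held fixed.
eb_ranef <- function(fit, data) {
    beta   <- fixef(fit)
    D      <- matrix(VarCorr(fit)$Subject, 2, 2)  # random-effects covariance
    sigma2 <- sigma(fit)^2                        # residual variance
    X <- model.matrix(~ Days, data = data)
    Z <- X                                        # same design for random part
    V <- Z %*% D %*% t(Z) + sigma2 * diag(nrow(Z))
    b <- D %*% t(Z) %*% solve(V, data$Reaction - X %*% beta)
    setNames(drop(b), colnames(X))
}

# Random effects for the held-out patient, then subject-specific predictions.
b_new    <- eb_ranef(fit, new_pt)
pred_new <- drop(model.matrix(~ Days, data = new_pt) %*% (fixef(fit) + b_new))
```

Applied to a subject that was in the training data, the same formula should reproduce the values in `ranef(fit)`, which is a useful sanity check. For the non-linear TGI model there is no such closed form, so the equivalent step would presumably need sampling or optimisation.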

gowerc commented 6 months ago

With #313 being merged I am going to push this onto the backlog as we have enough of a basic feature set even if the full feature isn't available.

Genentech / jmpost

Understand how to predict new subjects in a mixed effect model #12