Genentech / jmpost

https://genentech.github.io/jmpost/
17 stars 4 forks source link

Add support for generating quantities for the population level #296

Closed gowerc closed 6 months ago

gowerc commented 7 months ago

Currently we only support generated quantities for individual subjects who exist within the dataset. Francois has asked that we also have functionality to produce estimates for the population level (e.g. a hypothetical patient with 0 random effects).

gowerc commented 7 months ago

Actually @danielinteractive / @mercifr1 - Apologies I'm not entirely understanding what the expectation is here.

Take for example the phi parameter current GSF model. Currently this is formulated as:

patient_phi ~ beta(population_a,  population_b)

In this case what would you expect the "Population Phi" value to be? My naive guess would be to just take the mean of the parent distribution e.g. population_a / (population_a + population_b) but just wanted to double check that this is what you were expecting in this instance.

For reference the same issue also comes up for the other parameters if the user uses the "centered" parameterisation where the individual patient parameters are sampled as:

    lm_gsf_psi_bsld ~ lognormal(lm_gsf_mu_bsld[pt_study_index], lm_gsf_omega_bsld);
    lm_gsf_psi_ks ~ lognormal(lm_gsf_mu_ks[pt_arm_index], lm_gsf_omega_ks);
    lm_gsf_psi_kg ~ lognormal(lm_gsf_mu_kg[pt_arm_index], lm_gsf_omega_kg);
gowerc commented 7 months ago

@mercifr1 / @danielinteractive - Sorry also what does this mean in the case of the survival distribution where the user is allowed to use arbitrary covariates when fitting the model? How should the design matrix for the hypothetical "population patient" be constructed ?

danielinteractive commented 7 months ago

Thanks @gowerc , yeah agree we have to specify this in detail first before proceeding to the implementation. A general question here will be how much and what should be easily available by options/methods and what the user need to do themselves based on samples of all model parameters.

mercifr1 commented 7 months ago

Hi @gowerc The phi parameter is indeed a special case, as - unlike with the other parameters - it is not parameterized hierarchically in the current jmpost implementation of GSF. However, you could probably parameterize it hierarchically, and it would solve this issue. For instance, you could consider:

parameters{
    real mean_mu_phi;
    ...
    real<lower=0, upper=3> sd_mu_phi;
    ...
    vector<lower=0, upper=1>[n_arms] mu_phi;
    ...
    real<lower=0> omega_phi;
    ...
    vector[Nind] eta_tilde_phi;
    ...
}

transformed parameters{
    ...
    psi_phi=inv_logit(logit(mu_phi[arm_index])+eta_tilde_phi*omega_phi);
    ...
}

The hierarchical parameterization of the 'biomarker longitudinal submodel' gives direct access to the typical value of the parameter estimates, in the considered sample (ie. arm or study). In the nomenclature used in jmpost, it means that the mu_param is the typical value of the parameter estimate. There is no hierarchical structure in the survival models implemented in jmpost. So, all we produce for this submodel are population-level or typical-value estimates (with or without covariate, it does not change this statement).

Hope this helps.

gowerc commented 7 months ago

@mercifr1,

Sorry just to be clear we currently do have a hierarchical parameterisation on the Phi parameter, its just our current implementation uses the "centered" parameterisation whereas what you've shown above is the "non-centred" parameterisation.

This is not limited to just the Phi parameter though, we currently support both "centred" and "non-centred" for all the parameters. As an example in the current GSF model we have currently have the following for the "k" parameter:

If the user selects "centred" parameterisation:

lm_gsf_psi_ks ~ lognormal(lm_gsf_mu_ks[pt_arm_index], lm_gsf_omega_ks);

If the user selects "non-centred":

lm_gsf_psi_ks  = exp( lm_gsf_mu_ks[pt_arm_index] + (lm_gsf_eta_tilde_ks * lm_gsf_omega_ks) )

My understanding though from the Stan documentation is that these are 100% equivalent. The purpose of the "non-centred" parameterisation is just to be more computational stable when dealing with low numbers of observations per patient whereas the "centered" parameterisation is more stable when dealing with larger numbers of observations per patient.

Unfortunately I'm not sure how to unify these two cases in terms of sampling from the population as simply setting lm_gsf_eta_tilde_ks=0 is not equivalent to taking the mean of the distribution which for the log-normal would be:

exp (lm_gsf_mu_ks[pt_arm_index] +  lm_gsf_omega_ks / 2)

With regards to the phi parameter in particular I'm a bit hesitant to convert it to the non-centred parameterisation because there is no equivalent parameteric distribution to use in the centred case thus switching between the 2 would result in different distributions being used which doesn't seem right to me.

gowerc commented 7 months ago

@mercifr1

Just writing up what we spoke about in our chat this morning:

For the GSF model for parameters that support the non-centered parameterisation we will just use the corresponding $\mu$ value from the LogNormal distribution e.g. lm_gsf_mu_ks in the case of the shrinkage parameter.

For distributions that don't support the non-centered parameterisation (e.g. the phi parameter) we will just take the mean on the distribution.

Sorry @mercifr1 - Just to be clear though is this only required for the longitudinal model ? As in supporting this in the survival model would be more challenging as we get into the realms of emmeans as we'd need to define the covariates for the design matrix of a hypothetical patient.

gowerc commented 7 months ago

Regarding the population survival quantities I think this was on pause for now for @mercifr1 to consider it some more due to the complications of how to actually meaningfully calculate it due to the need of TGI parameters as well as covariate specification for a "hypothetical patient".