CDCgov / Rt-without-renewal

https://cdcgov.github.io/Rt-without-renewal/
Apache License 2.0

Support Hierarchical/partially pooled models #255

Open seabbs opened 1 month ago

seabbs commented 1 month ago
Implementing this has also made me think about hierarchical priors. Currently the only way I can think we might do this is something similar to this stacking approach, but passing in uninstantiated structs and a prior model function, and then instantiating the structs in the method using the prior model.

This feels a bit clunky? As far as I am aware we don't have an issue for this, so shall we split it out and discuss? A good test case feels like it might be fitting to case counts and deaths with a partially pooled overdispersion parameter. Another sensible toy example could be fitting to cases and deaths where both have 3 age groups (pooling by dataset and by age group, both nested and independently).

Another option might be allowing structs to be partially instantiated, but that feels very bug-prone.

Originally posted by @seabbs in https://github.com/CDCgov/Rt-without-renewal/issues/249#issuecomment-2146096487

See above comment.

The aim is to support arbitrarily partially pooled models. Ideally we can do this using our current struct-based dispatch, but we may also want to program over the generated Turing model.

As a first step we should sketch out some proposed designs for the target examples given in the comments above.
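
To make the first target example concrete, here is a minimal standalone sketch in plain Turing.jl of fitting two count series with a partially pooled (non-centred) overdispersion parameter. All names, priors and the fixed log-mean inputs are illustrative assumptions, not the EpiAware interface.

using Turing, Distributions

@model function pooled_overdispersion(cases, deaths, log_mean_cases, log_mean_deaths)
    # Hyperprior on the log overdispersion shared across the two data streams
    log_phi_mean ~ Normal(2, 1)
    log_phi_sd ~ truncated(Normal(0, 0.5); lower = 0)

    # Non-centred stream-level deviations: one for cases, one for deaths
    z ~ filldist(Normal(), 2)
    phi = exp.(log_phi_mean .+ log_phi_sd .* z)

    # Negative binomial observations, parameterised via mean and dispersion
    for t in eachindex(cases)
        mu = exp(log_mean_cases[t])
        cases[t] ~ NegativeBinomial(phi[1], phi[1] / (phi[1] + mu))
    end
    for t in eachindex(deaths)
        mu = exp(log_mean_deaths[t])
        deaths[t] ~ NegativeBinomial(phi[2], phi[2] / (phi[2] + mu))
    end
end

# Example usage with made-up data and latent log means
# chain = sample(pooled_overdispersion(rand(1:50, 10), rand(0:5, 10), log.(1:10), log.(1:10) .- 2), NUTS(), 500)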

SamuelBrand1 commented 1 month ago

OK quick thoughts back here.

SamuelBrand1 commented 1 month ago

As of #249 we have the concept of stacked observation models. So a sketch idea would be to have

Some vague pseudo-code:

obs = StackObservationModels(...)
hier = Hierarchy(obs; loc_prior = some_dist, scale_prior = other_dist)

@model function EpiAwareBase.generate_observations(obs, data, loc, scale)
    params ~ function_that_returns_a_prior_based_on_type_loc_scale(obs, loc, scale)
    ...
end

@model function EpiAwareBase.generate_observations(hier::Hierarchy, ...)
    loc ~ some_dist
    scale ~ other_dist
    @submodel prefix="something" observations = generate_observations(obs, data, loc, scale)
    ...
end
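
For reference, the @submodel-with-prefix pattern this sketch relies on already works in plain Turing.jl/DynamicPPL when the hyper-parameters are handed to the inner model as ordinary arguments. A minimal runnable example, with all names hypothetical rather than EpiAware API:

using Turing, DynamicPPL, Distributions

# Inner observation model: its prior is driven entirely by the loc/scale
# arguments passed down from the enclosing model
@model function inner_obs(y, loc, scale)
    theta ~ Normal(loc, scale)
    y .~ Normal(theta, 1)
    return theta
end

# Outer model: samples the hyper-parameters, then delegates to the inner
# model under a prefixed namespace
@model function outer_hierarchy(y, loc_prior, scale_prior)
    loc ~ loc_prior
    scale ~ scale_prior
    @submodel prefix="obs" theta = inner_obs(y, loc, scale)
    return theta
end

# Example usage with hypothetical priors
# model = outer_hierarchy(randn(10), Normal(0, 1), truncated(Normal(0, 1); lower = 0))
# chain = sample(model, NUTS(), 500)
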
seabbs commented 1 month ago

Partial pooling usually (always?) implies having a hyperprior for the location and scale/dispersion parameters. Are we going to focus on that, or are we including multi-level modelling in scope here, which can include other ideas?

Ideally the more general case, but the less general case would obviously be good to solve, and if the general case requires a difficult interface etc. it could make sense to solve it differently.

It might be possible to avoid having to declare a shadow set of "pool-able" versions of each type.

Yes, we really, really want to avoid this if we can, or at least automate it.

seabbs commented 1 month ago
obs = StackObservationModels(...)
hier = Hierarchy(obs; loc_prior = some_dist, scale_prior = other_dist)

@model function EpiAwareBase.generate_observations(obs, data, loc, scale)
    params ~ function_that_returns_a_prior_based_on_type_loc_scale(obs, loc, scale)
    ...
end

@model function EpiAwareBase.generate_observations(hier::Hierarchy, ...)
    loc ~ some_dist
    scale ~ other_dist
    @submodel prefix="something" observations = generate_observations(obs, data, loc, scale)
    ...
end

The problem I see here is telling obs that it should use the hierarchy rather than the default priors it gets from the various submodels. As you are saying, doing that without new shadow structs seems hard unless we make Hierarchy take non-instantiated structs and instantiate them within the generate_observations call (and then pass these down to StackObservationModels etc.).

For simple problems this seems fine...

obs = Hierarchy(NegativeBinomial, ...)

@model function generate_observations(h::Hierarchy, ...)
    # Non-centred hyperprior for the dispersion
    disp_mean ~ ...
    disp_sd ~ ...
    disp_raw ~ ...

    # Instantiate the observation model with the implied dispersion and stack it
    NB = h.model(disp = disp_mean + disp_sd * disp_raw)
    Stacked = StackObservationModels(NB...)
    @submodel generate_observations(Stacked, ...)
end
but obviously it isn't ideal to be passing these different kinds of structs around. Even here it would be hard to do the non-centred parameterisation, and what happens when you have a complicated obs model (i.e. with latent delays that differ)?
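
A self-contained analogue of the sketch above (hypothetical names, not the EpiAware API), just to show that building the observation struct inside the model body does sample; the cost is that the struct is reconstructed on every log-density and gradient evaluation, and it is unclear how well this scales to complicated observation models:

using Turing, DynamicPPL, Distributions

# Stand-in for an observation model struct holding an instantiated dispersion
struct NegBinObs{T}
    dispersion::T
end

@model function obs_likelihood(obs::NegBinObs, y, log_mean)
    phi = obs.dispersion
    for t in eachindex(y)
        mu = exp(log_mean[t])
        y[t] ~ NegativeBinomial(phi, phi / (phi + mu))
    end
end

@model function hierarchy_obs(y, log_mean)
    # Non-centred hyperprior on the log dispersion
    disp_mean ~ Normal(2, 1)
    disp_sd ~ truncated(Normal(0, 0.5); lower = 0)
    disp_raw ~ Normal()

    # The struct is rebuilt here at every model (and gradient) evaluation
    obs = NegBinObs(exp(disp_mean + disp_sd * disp_raw))
    @submodel prefix="obs" lik = obs_likelihood(obs, y, log_mean)
end

# chain = sample(hierarchy_obs(rand(1:50, 10), log.(1:10)), NUTS(), 500)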

SamuelBrand1 commented 1 month ago

That pattern would work generatively, but you'd be asking the grad call to go through struct construction (a lot)... I'm not sure how that would work out.

Edit: effectively we do that a lot with Distributions.jl, so this could be fine.
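
This is the sense in which any ordinary Turing model already routes struct construction through the gradient call; a trivial illustration (not EpiAware code):

using Turing, Distributions

@model function constructs_every_call(y)
    mu ~ Normal(0, 1)
    sigma ~ truncated(Normal(0, 1); lower = 0)
    # A fresh Normal struct is constructed here on every log-density/gradient evaluation
    y .~ Normal(mu, sigma)
end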

SamuelBrand1 commented 1 month ago

My thinking was that the only place a shadow version of the struct and the self-contained version differ is in the instantiated value of the prior; but the type information of the prior is useful and we can access that.

E.g. for an observation struct with a Normal-typed prior, the shadow implementation knows from the type that it's a Normal.
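
One way to read this, sketched with hypothetical names (none of this is existing EpiAware code): if the prior's type is a parameter of the observation struct, a pooled code path can recover the prior family from the type alone and rebuild it with hyper-parameters supplied from the outer level.

using Distributions

struct MyObs{D <: UnivariateDistribution}
    prior::D
end

# Recover the distribution family (e.g. Normal) from the struct's type parameter
prior_family(::MyObs{D}) where {D} = Base.typename(D).wrapper

# Rebuild the same family with hyper-parameters supplied by an outer model
hierarchical_prior(obs::MyObs, loc, scale) = prior_family(obs)(loc, scale)

obs = MyObs(Normal(0.0, 1.0))
hierarchical_prior(obs, 2.0, 0.5)   # Normal{Float64}(μ=2.0, σ=0.5)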

SamuelBrand1 commented 1 month ago

Ideally the more general case, but the less general case would obviously be good to solve, and if the general case requires a difficult interface etc. it could make sense to solve it differently.

If we centre on the question of "how do we create a prior for sampling one chunk of the model, given variables passed from another level?" then I think we can generalise.

seabbs commented 4 weeks ago

One option is to use Accessors.jl (https://juliaobjects.github.io/Accessors.jl/stable/) to update deeply nested structs.

The upside of this would be no shadow structs and support for nesting. The downside is that you would need to define priors that would then not be used.
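
A small sketch of what that could look like with Accessors.jl, using made-up nested structs rather than EpiAware types:

using Accessors, Distributions

# Made-up nested structs standing in for an observation model with a default
# prior buried one level down
struct DispersionPrior{D}
    prior::D
end

struct ObsModel{P}
    dispersion::P
end

obs = ObsModel(DispersionPrior(Normal(0.0, 1.0)))

# @set returns a copy with the nested field replaced, so a default prior can be
# swapped for a hierarchically generated one without defining any shadow structs
pooled_obs = @set obs.dispersion.prior = Normal(2.0, 0.5)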

Thinking about this problem more generally, it seems like if we work out what we want I.D.D models to look like (i.e. how we compose them together) and then tackle this, it will be easier to make good choices.

This means I think we should deprioritise this in favour of new issues targeting metadata, constructing models based on metadata, composing independent infection processes, etc.