CDCgov / cfa-epinow2-pipeline

https://cdcgov.github.io/cfa-epinow2-pipeline/
Apache License 2.0

Extensibility case study: add random walk #37

Open athowes opened 3 weeks ago

athowes commented 3 weeks ago

Based on F2F conversation with @seabbs and @zsusswein:

I think it's good to discuss how to extend the infrastructure we have here to include other features. I think it's likely that we will want to fit more than one specific type of EpiNow2 model, so this is a good thing to think about in advance.

As an example, suppose we would like to run a version of EpiNow2 with a latent random walk rather than a latent GP, how might we do that?

An option we discussed would be to extend fit_model to be an S3 generic, with methods fit_model.gp and fit_model.rw. A concern I have about this design is that we may run into challenges when there are multiple ways we might want to alter the model. For example, say we have a few choices like:

  1. GP or RW
  2. Option A or Option B

Then how would we extend this S3 framework to deal with the 2x2 possible set-ups?
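For reference, a rough sketch of the single-dispatch version (the spec argument and the class/method names here are purely illustrative, not the current fit_model()) might be:

# Illustrative sketch: dispatch on the class of a model-spec object whose
# class encodes the latent process ("gp" or "rw"). Not the current fit_model().
fit_model <- function(spec, data, ...) {
  UseMethod("fit_model")
}

fit_model.gp <- function(spec, data, ...) {
  # ...call EpiNow2 with the latent Gaussian process here...
  message("Fitting with latent GP")
}

fit_model.rw <- function(spec, data, ...) {
  # ...call EpiNow2 with the latent random walk here...
  message("Fitting with latent random walk")
}

# e.g.
# spec <- structure(list(), class = "rw")
# fit_model(spec, data)

With a second axis of variation (Option A vs Option B) the number of classes or methods starts to multiply, which is the concern above.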

In epidist we have been designing to allow users to pass options through to brms as much as possible to maintain flexibility. This has been useful in situations like "oh, I want to try using Pathfinder; all I need to do is set algorithm = "pathfinder" and it just works!". I think epidist and cfa-epinow2-pipeline are intended to do quite different things / sit in quite different places, so it makes sense to me that they wouldn't take the same approach, but I raise it as an example to highlight what we might be missing out on.
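To illustrate the pass-through pattern (a minimal sketch, not epidist's actual code):

# Minimal sketch of the pass-through pattern: anything extra the user
# supplies is forwarded untouched to the underlying engine.
fit_with_passthrough <- function(formula, data, ...) {
  brms::brm(formula = formula, data = data, ...)
}

# e.g., switching sampler without any package changes (assuming the
# installed brms/cmdstanr versions support it):
# fit_with_passthrough(y ~ 1, data = df, algorithm = "pathfinder", backend = "cmdstanr")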

I think a lot of this stems from reliance on the configuration file. In part this is a design choice, but I think it's worth highlighting some of its downsides.

In general I think it could be helpful to write out the steps to "add a new feature" to the model (like this RW one) so that we have an idea of how much future work we are creating for ourselves with this design. I do think it's inevitable that we will want different specifications for the model, so it's important we don't make those changes prohibitively hard IMO.

(This issue was typed out relatively quickly, so let me know if I can tidy it up a bit / clarify.)

CC: @kgostic @natemcintosh

athowes commented 3 weeks ago

I was going to open another issue titled "Extensibility case study 2: change any of the _opts that we don't currently touch", but I'll add it here instead.

The epinow function has modular options like:

  1. delay_opts()
  2. rt_opts()
  3. gp_opts()
  4. stan_opts()

This is starting to be more of a design suggestion than a case study, but if we were to move specification of these options out of fit_model and into their own functions, then pass those into fit_model, I think we could retain the config-driven workflow while gaining much more flexibility in specifying the EpiNow2 model.

In that case we could have helpers like:

  1. delay_opts_cfa(config)
  2. rt_opts_cfa(config)
  3. gp_opts_cfa(config)
  4. stan_opts_cfa(config)

The results of these get passed into fit_model. We can put our opinionated preferences into these helpers, while retaining more of the ability to change things by passing different options. This also means we avoid the current problem of being unable to change things within modules (which is what the 2x2 example above is meant to illustrate).
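For instance, one of these helpers could look roughly like this (illustrative only; the config fields are hypothetical and the exact EpiNow2 argument names may differ by version):

# Illustrative sketch of a config-backed helper with opinionated defaults.
# The "stan" config fields here are hypothetical.
stan_opts_cfa <- function(config) {
  EpiNow2::stan_opts(
    chains = 4,                               # opinionated default
    samples = config[["stan"]][["samples"]]   # driven by the config
  )
}

# These then get passed into fit_model, e.g.
# fit_model(data, stan = stan_opts_cfa(config), gp = gp_opts_cfa(config), ...)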

I can flesh this out more if it's helpful.

seabbs commented 3 weeks ago

There is also a choice here between two approaches:

  1. Specific args set via config

    some_opts(
      arg1 = config[["some_opts"]][["arg1"]],
      arg2 = config[["some_opts"]][["arg2"]]
    )
  2. Just set what you want that is different to the defaults.

    do.call(
      some_opts, config[["some_opts"]]
    )

The advantage of the former is that it is very declarative and requires all options to always be specified in the top-level config. The advantage of the latter is that it lets you extend your config settings without changing the underlying hard coding, and without experiments needing to change the production fit_model configs (whereas in the former, all new options require global-level config changes). Safety-wise, I think the latter will error if it is given an arg that isn't used at all, but not if the target has ... support. If you wanted, you could write some safety around this to check (1) that config opts are in the args of the target (using formals()) and (2) that any options that must be there for a given opts are present.

This could look like

opts_dispatch <- function(config, fn = some_opts, required = c("arg1", "arg2")) {
  # Filter the config down to the section named after the target opts function
  fn_name <- deparse(substitute(fn))
  fn_config <- config[[fn_name]]
  # Arg check: the config has the hard-coded required settings
  stopifnot(all(required %in% names(fn_config)))
  # Arg check: all config settings are in formals() of the target opts function
  stopifnot(all(names(fn_config) %in% names(formals(fn))))
  do.call(fn, fn_config)
}
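
As a hypothetical usage of the sketch above, with a stand-in opts function and a toy config:

# Stand-in opts function and toy config, purely for illustration.
some_opts <- function(arg1, arg2, scale = 1) {
  list(arg1 = arg1, arg2 = arg2, scale = scale)
}
config <- list(some_opts = list(arg1 = 0.5, arg2 = 10))

opts_dispatch(config)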

Also, one of the design choices EpiNow2 makes that interacts with your current implementation to make things hard is that you can only turn off the GP by setting the gp arg to NULL, not by passing anything to gp_opts(). There is an issue to address this if you want to ping it and get it updated for the next release.

zsusswein commented 3 weeks ago

I think it's likely that we will want to fit more than one specific type of EpiNow2 model, so this is a good thing to think about in advance.

As an example, suppose we would like to run a version of EpiNow2 with a latent random walk rather than a latent GP, how might we do that?

Conveniently @kgostic and I were just talking about implementing a random walk yesterday! Excellent case study.

Before diving into details, I want to preface that I agree with you both that the biggest drawback to this design is extensibility. It takes away the ease of mixing/matching combinations that vanilla EpiNow2's design offers. I do think it offers other benefits though (namely, careful monitoring of performance and use with Azure Batch) and I want to focus on weighing that tradeoff.

An option we discussed would be to extend fit_model to be an S3 generic, with methods for fit_model.gp and fit_model.rw

Yes, that would be my preferred approach. There's a demo of how I think we should do that (and it's perhaps Sam Approved™) here.

Then how would we extend this S3 framework to deal with the 2x2 possible set-ups?

The general answer to this question is double dispatch (which is fiddly and I don't want to do). But I think that's an incomplete answer. Because I agree that

delay_opts_cfa(config) ... We can put our opinionated preferences into these helpers, while retaining the ability to change things by giving different options to a greater extent.

is the right answer for the specific situations you laid out and would allow us to mix and match through combinations of single dispatch. So, I think this is hard to answer in the abstract but in general I think we're in agreement.

The tradeoff is that we lose the ability to point people to a clean wrapper around an EpiNow2 call. That was particularly a desired feature from @kgostic; pointing interested parties to a legible EpiNow2 spec. With this kind of change, it'll be tricky to keep that legibility I think (but perhaps I'm lacking imagination!).

The random walk example is helpful here because I think the call would end up looking something like our existing EpiNow2 call:

https://github.com/CDCgov/cfa-epinow2-pipeline/blob/c04c8f857e49530ffad946ea582f404a42098735/R/fit_model.R#L18-L24

But it would require a fair amount of copy/paste to deal with basically this one section:

https://github.com/CDCgov/cfa-epinow2-pipeline/blob/c04c8f857e49530ffad946ea582f404a42098735/R/fit_model.R#L25-L34

That's... not ideal. Alternatively, we could give up the legibility of the EpiNow2 call. I'm willing to be convinced that's a good idea in order to reduce code duplication, but I think we should get to a nice vignette (#38), see what the necessary components are, and then make the code more complex from there.


The bigger question is how we pass our desired spec into the pipeline. I think an answer like this is useful and a pretty small tweak, but we should weigh the pros/cons:

do.call(some_opts_func, config[["some_opts"]])

It would make testing and one-off edits dramatically easier. Perhaps that's worth it on its face! (I think it might be)

The downside is that it would allow us to run basically any combination of EpiNow2 features, whether or not we've tested/validated the combination. I'd feel more comfortable with that choice if/when we had multiple models running and could compare, but at the moment we don't have that safety net. Perhaps we should design for the future anyway?

In either case, I think these are pretty small tweaks and I'm vaguely in favor of them. My impression from earlier in the call was that there were proposals for much larger changes in mind...


The thing I want to emphasize is that I don't think we want to be doing lots of prototyping with EpiNow2 here. I want this pipeline to have a clean, robust spec and then leave it alone. I'd like to run it daily on a schedule and go work on other models. Because that's my goal, my design bias is toward simplicity rather than easy prototyping for this particular pipeline.