CDCgov / cfa-epinow2-pipeline

https://cdcgov.github.io/cfa-epinow2-pipeline/
Apache License 2.0

Create `cfa-rtgam-pipeline` repo #88

Open athowes opened 1 month ago

athowes commented 1 month ago

I think it would be informative to consider creating a `cfa-rtgam-pipeline` repository, based on `cfa-epinow2-pipeline`, for the in-development RtGam package.

This could help with thinking about the design of the pipeline repositories (including `cfa-epinow2-pipeline` as it currently stands).

athowes commented 1 month ago

For example, looking at the files in `cfa-epinow2-pipeline`:

Files that don't seem to be EpiNow2-specific:

Files that seem to be EpiNow2-specific:

I'd say this consideration points towards the non-EpiNow2-specific code not living in `cfa-epinow2-pipeline`.

zsusswein commented 1 month ago

Yes, this is the goal! The workflow in my mind is:

athowes commented 1 month ago

Alternatively, the functions in `extract_diagnostics`, `fit_model`, and `write_output` could be rewritten to dispatch on class, and this repo could be called `cfa-rt-pipeline`.

So there are two options:

  1. `cfa-pipeline-tools` for all shared code, plus model-specific R packages
  2. `cfa-rt-pipeline` containing both the shared tools and the model-specific methods

I think I favour 2.; 1. seems a bit much.

Interested to hear others' POVs?
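To make the dispatch-on-class idea concrete, here is a minimal sketch in base R S3. All class and function bodies are hypothetical placeholders (the real `fit_model` etc. would wrap the actual model packages); only the generic names come from the thread.

```r
# Shared generics that each model package would implement
fit_model <- function(model, data, ...) UseMethod("fit_model")
extract_diagnostics <- function(fit, ...) UseMethod("extract_diagnostics")
write_output <- function(fit, path, ...) UseMethod("write_output")

# Hypothetical per-model methods; the real ones would call EpiNow2 / RtGam
fit_model.epinow2 <- function(model, data, ...) {
  structure(list(data = data), class = c("epinow2_fit", "rt_fit"))
}
fit_model.rtgam <- function(model, data, ...) {
  structure(list(data = data), class = c("rtgam_fit", "rt_fit"))
}

# Shared behaviour can live on the common "rt_fit" parent class
extract_diagnostics.rt_fit <- function(fit, ...) {
  list(n_obs = length(fit$data))
}
```

Usage would then look the same regardless of model, e.g. `fit_model(structure(list(), class = "epinow2"), cases)`, which is the interoperability argument for option 2.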

athowes commented 1 month ago

One possible benefit of 2. is that, if we wanted to implement some kind of model stacking or averaging functionality, it might be simpler to do from within one package. It also feels to me like having methods for each model enforces a shared interface, which is harder to maintain at a distance when split across packages. We could also then test all of the models and methods using the same approaches.
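As a sketch of the stacking point: once every fit object shares a common class and a predict-style generic, averaging can be written once against that interface. Everything here is hypothetical (class names, the `predict_rt` generic, the placeholder Rt paths), just to illustrate the shape.

```r
# Hypothetical shared generic over a common "rt_fit" class hierarchy
predict_rt <- function(fit, ...) UseMethod("predict_rt")
predict_rt.epinow2_fit <- function(fit, ...) rep(1.1, 5)  # placeholder Rt path
predict_rt.rtgam_fit   <- function(fit, ...) rep(0.9, 5)  # placeholder Rt path

# Weighted ensemble average, written once for any model implementing predict_rt
average_rt <- function(fits, weights = NULL) {
  if (is.null(weights)) weights <- rep(1 / length(fits), length(fits))
  preds <- lapply(fits, predict_rt)
  Reduce(`+`, Map(`*`, preds, weights))
}
```

With separate packages the same thing is possible, but the shared interface then has to be enforced by convention rather than by a common parent class and test suite.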

I think this is quite a big decision and something we might want to use some kind of decision record for.

I suppose the counterpoint, and perhaps where CFA is going, is that in future we want to use models which are not written in R, and a single `cfa-rt-pipeline` package limits us in some ways there?

zsusswein commented 1 month ago

I pretty strongly disagree. We tried putting a bunch of stuff in one Docker image in our first go-round at this and it worked poorly. It led to long build times and huge images.

I think we're seeing a lot of practical infra benefit from keeping things separate and tightly scoped. We also want to do this setup for non-R language models -- we'll benefit from a single, language-agnostic process.

My vision for this is that we have:

  1. Model dev repos, owned or not owned by CFA
  2. Shared model deployment code
  3. Model-specific deployment code, which uses 1 and 2 and outputs into a uniform spec
  4. Generic postprocessing and viz code, which reads the outputs of 3
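The four layers above could map onto repos something like the following. Apart from `RtGam`, `cfa-epinow2-pipeline`, and the proposed `cfa-pipeline-tools` and `cfa-rtgam-pipeline`, the names here are purely illustrative assumptions, not agreed repo names.

```text
RtGam/                   # 1. model dev repo (one per model)
cfa-pipeline-tools/      # 2. shared, model-agnostic deployment code
cfa-epinow2-pipeline/    # 3. EpiNow2-specific deployment -> uniform output spec
cfa-rtgam-pipeline/      # 3. RtGam-specific deployment  -> same uniform spec
cfa-rt-postprocessing/   # 4. generic postprocessing + viz, reads the spec
```

Because layer 3 writes a uniform spec, layer 4 stays language-agnostic, which is the argument for keeping things separate and tightly scoped even for non-R models.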