epiverse-trace / epidemics

A library of published compartmental epidemic models, and classes to represent demographic structure, non-pharmaceutical interventions, and vaccination regimes, to compose epidemic scenarios.
https://epiverse-trace.github.io/epidemics/
Other
8 stars 2 forks source link

Add an `<epidemic>` class #159

Closed pratikunterwegs closed 6 months ago

pratikunterwegs commented 6 months ago

This draft PR fixes #156 by adding an <epidemic> class. This is scheduled to be merged after #158.

<epidemic> is a list-based class with four elements: the function name, the model data, the model inputs (parameters and composable elements), and the version hash for reproducibility or comparability.

Changes:

  1. All model functions now return <epidemic> as output;
  2. All helper functions previously operating on data.frame model outputs now operate on <epidemic> (e.g. epidemic_size());
  3. get_parameter() supports <epidemic> and allows accessing the model data ("data") and the parameters and composable inputs ("parameters");
  4. All vignettes, examples, and tests now return, expect, and use <epidemic> outputs.
pratikunterwegs commented 6 months ago

Thanks - this is still a draft and there will be an oppotunity to discuss this among other things in the upcoming P3 development meeting. I'll update this PR to reflect any decisions we make.

pratikunterwegs commented 6 months ago

Thanks for the feedback @Bisaloo - this was discussed in the (just concluded) meeting with @rozeggo, @BlackEdder, @TimTaylor and @bahadzie.

In summary we have decided against the structure of <epidemic> shown in this draft, but an <epidemic> class is likely to be implemented in the foreseeable future.

A discussion of, and decisions about, the new structure of <epidemic> and {epidemics} are available as a link on Slack - any feedback there is welcome. I will open issues based on meeting decisions by mid-day tomorrow, leaving time for any feedback (and I can edit them based on any feedback given later too).

pratikunterwegs commented 6 months ago

I don't think it's the right approach to solve the reproducibility issue. This is not something that should be addressed at the package-level, but rather in trainings, such as the one Andree produced on research compendia. This gives users a false sense of security and could incentivize them to compare outputs in situations where they should not be compared. If the time frame between two simulations is such that the package has been updated, it is much safer to encourage users to re-run all simulations, as other elements in the whole stack may have changed.

We have decided against including any versioning and agree that environment management is a user task.

Additionally, this leads to a more complex return object. This PR moves away from the nice simple data.frame, which is immediately clear to e.g., excel users, to have a more complex list. The change in the output type is also is step back in terms of interoperability IMO since many generic data science packages work directly with data.frames.

I agree and think users should be responsible for parameter management. Keeping in mind that not all users might be able to do this successfully, we have decided to structure model outputs as a class with encapsulated parameters, but hopefully one that can be handled by methods for data.frames - possibly a nested data.frame, data.table, or tibble. Feedback on this is welcome.

As stated on slack, there could be a case for including some metadata in the output if this greatly facilitates the downstream analysis via / the integration with scenarios. But this is a very slippery slope that comes at the cost of increased complexity and a steeper learning curve for the package. Doing it for reproducibility reasons is not correct IMO. As such:

I agree - but see above that we have decided that we should better facilitate parameter management and scenario comparability for users. Again any feedback on having/the eventual structure of <epidemic> are welcome.