epiforecasts / ringbp

Simulate infectious disease transmission with contact tracing
http://epiforecasts.io/ringbp/

Generalise package structure for a different underlying BP model #68

Open sophiemeakin opened 7 months ago

sophiemeakin commented 7 months ago

The current version of the package is for a very specific branching process model. The package could be generalised to work for any (reasonable) branching process - but this needs some thought about how to define outbreak_setup() and outbreak_step() for a new process, which properties of the process to track over time, and how to pass parameters of the new process in a more general way (already discussed in https://github.com/epiforecasts/ringbp/issues/65).

sophiemeakin commented 7 months ago

For example, the pepbp package.

pearsonca commented 7 months ago

Hmm - so is the idea here to have

res <- scenario_sim(
  n = 10, # n is the standard name for number of samples
  parms = ringbp::parameters(), # the deSolve misspelling is widely used for inexplicable reasons
  control = ringbp::control_opts() # gives defaults when empty
)

...become

res <- scenario_sim(
  n = 10, # n is the standard name for number of samples
  parms = ringbp::parameters(), # the deSolve misspelling is widely used for inexplicable reasons
  control = ringbp::control_opts(), # gives defaults when empty
  model = ringbp::model() # gives standard model by default, but allows inserting a new step / draw function
)

where the problems you highlight (e.g. with defining outbreak_step) live inside that model function somewhere.
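
For concreteness, a minimal sketch of what such a model() constructor could look like - the field names (setup, step) and the way scenario_sim() would consume them are assumptions for illustration, not the current ringbp API:

model <- function(setup = outbreak_setup, step = outbreak_step) {
  # bundle the model-specific functions so scenario_sim() can call them
  # without knowing which branching process it is running
  structure(list(setup = setup, step = step), class = "ringbp_model")
}

# inside scenario_sim(), very roughly (stop_condition() is hypothetical):
# state <- model$setup(parms)
# while (!stop_condition(state, control)) {
#   state <- model$step(state, parms, control)
# }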

That seems potentially workable to me - I suspect the main other challenges here will be:

On balance, seems like a good idea. Should be lots of shared infrastructure among BP-based models.

sophiemeakin commented 7 months ago

That's what I was thinking, yes.

Re your last two points (returning new measures associated with the model, and adapting other tools to tolerate the extra data), there could be two outputs from each scenario/iteration: one core output that is shared across all models (generation, infector, number of secondary cases), and a second with the model-specific outputs (e.g. for pepbp number of high- and low-risk contacts, number of PEP doses).
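
To illustrate that split (the element and column names below are made up, not an existing ringbp structure), each iteration could return something like:

list(
  core = data.frame(            # shared across all models
    generation  = c(1L, 2L),
    infector    = c(0L, 1L),
    n_secondary = c(3L, 0L)
  ),
  model_specific = data.frame(  # e.g. for pepbp
    n_high_risk_contacts = c(4L, 1L),
    n_low_risk_contacts  = c(7L, 2L),
    n_pep_doses          = c(5L, 1L)
  )
)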

pearsonca commented 7 months ago

I'd guess the most-likely-to-make-this-happen next step is to provide (here?) focused snippets showing where pepbp had to diverge from ringbp to do what you needed. I imagine that development went roughly "copy-paste-modify" - highlight the modifications?

Then folks can suggest options for implementing a generic approach that would accommodate this specific example.


n.b. if you elect to put some snippets here, there should be useful formatting syntax à la https://stackoverflow.com/questions/36634252/how-to-diff-in-a-comment-on-github - doesn't seem to support side-by-side, alas.

somefunction <- function() {
...
same-same
- subtracted line
+ added line
same-same
...
}

adamkucharski commented 7 months ago

Agree it would be nice to think about generalisations, especially if there are modular components that could be reused. In case it's helpful to inform plans, a couple of related efforts:

sbfnk commented 7 months ago

I think supporting custom models here is a really good idea - although it wouldn't add anything in terms of making wider functionality (say, PEP or network models) available to others if those models still live in separate repos. Potentially, providing a custom interface here would even discourage people from sharing their models, compared with forking something that is already a package and can be re-badged as another one (see {pepbp} and {covidhm}) that is then ready for others to use.

The alternative option would be to provide some documentation on "how to spin off your own model by using a fork", bringing in any lessons learned in previous attempts / streamlining the package to make that as easy as possible.

Bisaloo commented 7 months ago

It's a tough problem.

The alternative option would be to provide some documentation on "how to spin off your own model by using a fork", bringing in any lessons learned in previous attempts / streamlining the package to make that as easy as possible.

In general and from an open-source ecosystem point of view, this is not great. Any general improvements or bug fixes that could be made in a specific fork may (= will likely) not be shared with the upstream repo, and will not benefit others. We end up with an extremely fragmented ecosystem where anyone who wants to build upon existing work has to browse through a potentially large number of repos to maybe eventually find the info they need.

On the other hand, creating a general framework that can accommodate arbitrary models is very hard, and I feel we may end up with either something unwieldy or something inflexible that doesn't address the need reported here.

:thinking:

pearsonca commented 7 months ago

The alternative option would be to provide some documentation on "how to spin off your own model by using a fork", bringing in any lessons learned in previous attempts / streamlining the package to make that as easy as possible.

In general and from an open-source ecosystem point of view, this is not great. Any general improvements or bug fixes that could be made in a specific fork may (= will likely) not be shared with the upstream repo, and will not benefit others. We end up with an extremely fragmented ecosystem where anyone who wants to build upon existing work has to browse through a potentially large number of repos to maybe eventually find the info they need.

Agree - one solution is to make it easy to add new model plugin implementations to the repo, with a clear process for adding contributor credit. That's an update to the contributor guide, a naming plan, and probably an approach that sends things to inst/. The upside being that people are then encouraged to promote citation of their paper + the package (which now contains their model).
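
To make the inst/ idea concrete, a rough sketch of how contributed models could be discovered at run time - the directory layout, file convention and helper functions here are hypothetical, nothing like this exists in the package yet:

# hypothetical: each contributed model lives in inst/models/<name>.R and
# defines an object called `model` (a setup/step bundle)
available_models <- function() {
  dir <- system.file("models", package = "ringbp")
  tools::file_path_sans_ext(list.files(dir, pattern = "\\.R$"))
}

load_model <- function(name) {
  path <- system.file("models", paste0(name, ".R"), package = "ringbp")
  env <- new.env()
  sys.source(path, envir = env)
  get("model", envir = env)
}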

On the other hand, creating a general framework that can accommodate arbitrary models is very hard, and I feel we may end up with either something unwieldy or something inflexible that doesn't address the need reported here.

Possibly, but: seems worth trying and y'all have a concrete example to develop accommodations against.

adamkucharski commented 7 months ago

A version of this problem came up in a recent discussion with simulist (https://github.com/epiverse-trace/simulist/issues/58), when we wanted to incorporate simulation of contacts as well as secondary cases into epichains. This is a relatively simple change to make to a branching process model (i.e. add a line of code to generate contacts, then another to draw the secondary cases from the secondary attack rate, plus some bookkeeping for contacts and the SAR as an input). So it is a relatively small change that would generate an output (contacts) potentially useful to several people, and it could easily be turned off without slowing down the code or disrupting existing implementations.
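
As a rough sketch of that kind of addition (a standalone helper, not the actual epichains or ringbp code; the negative binomial contact draw and argument names are assumptions for illustration):

# for each current case: draw contacts, then draw secondary cases from the
# secondary attack rate (SAR); contacts are kept so they can be reported
draw_offspring <- function(n_cases, contact_mean, contact_disp, sar) {
  n_contacts  <- rnbinom(n_cases, mu = contact_mean, size = contact_disp)
  n_secondary <- rbinom(n_cases, size = n_contacts, prob = sar)
  data.frame(n_contacts = n_contacts, n_secondary = n_secondary)
}

# e.g. draw_offspring(5, contact_mean = 10, contact_disp = 0.5, sar = 0.1)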

So perhaps a rough calculation for whether we add more complexity to the base models in the package vs encourage separate adaptations for specific use cases (and provide documentation on how this could be done, with bookkeeping etc.) could be something like:

Decision weighting = (Wider usefulness of new functionality) / (Disruption to existing code/arguments to include this functionality)

pearsonca commented 7 months ago

A version of this problem came up in a recent discussion with simulist (epiverse-trace/simulist#58), when we wanted to incorporate simulation of contacts as well as secondary cases into epichains. This is a relatively simple change to make to a branching process model (i.e. add a line of code to generate contacts, then another to draw the secondary cases from the secondary attack rate, plus some bookkeeping for contacts and the SAR as an input). So it is a relatively small change that would generate an output (contacts) potentially useful to several people, and it could easily be turned off without slowing down the code or disrupting existing implementations.

So two examples to work against. Probably three: https://github.com/SACEMA/COVID10k/ is bpmodels + offspring events - could have just written the contingent event generation as a model extension.

Decision weighting = (Wider usefulness of new functionality) / (Disruption to existing code/arguments to include this functionality)

Dunno about the denominator there (though my impression is that y'all view this package as needing a more general interface overall, so it seems mostly like "disruption to code" only), but an easy-to-use-and-extend package for a standard modeling concept seems obviously widely useful. BPs are (or should be) standard introductory models, which remain useful for low-resolution estimates, but it feels like people mostly re-invent the wheel every time.

jamesmbaazam commented 1 month ago

Josh and I had a chat with Dillon (@dcadam) today after his seminar, based on his paper on time-varying superspreading, which used ringbp to simulate a contact network for COVID-19 and SARS. It'd be nice to hear his perspective on this ringbp customisation issue: how easy it was to use out of the box, what would have made it more easily customisable for his use case, etc.

dcadam commented 1 month ago

Hi James. For the network simulation in my paper I actually used bcgov/epi.branch.sim, which itself references ringbp. It wasn't so easy to customise, though it may fit others' use cases. I can look more into the underlying code structure of ringbp compared to epi.branch.sim, and where improvements could be found to make it more customisable, as you ask.