epiforecasts / ringbp

Simulate infectious disease transmission with contact tracing
http://epiforecasts.io/ringbp/

Generalise package structure for a different underlying BP model #68

Open sophiemeakin opened 7 months ago

sophiemeakin commented 7 months ago

The current version of the package is for a very specific branching process model. The package could be generalised to work for any (reasonable) branching process - but this needs some thought about how to define outbreak_setup() and outbreak_step() for a new process, which properties of the process to track over time, and how to pass parameters of the new process in a more general way (already discussed in https://github.com/epiforecasts/ringbp/issues/65).

sophiemeakin commented 7 months ago

For example, the pepbp package.

pearsonca commented 7 months ago

Hmm - so is the idea here to have

res <- scenario_sim(
  n = 10, # n is the standard name for number of samples
  parms = ringbp::parameters(), # the deSolve misspelling is widely used for inexplicable reasons
  control = ringbp::control_opts() # gives defaults when empty
)

...become

res <- scenario_sim(
  n = 10, # n is the standard name for number of samples
  parms = ringbp::parameters(), # the deSolve misspelling is widely used for inexplicable reasons
  control = ringbp::control_opts(), # gives defaults when empty
  model = ringbp::model() # gives standard model by default, but allows inserting a new step / draw function
)

where the problems you highlight (e.g. with defining outbreak_step) live inside that model function somewhere.
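
For concreteness, a minimal sketch of what such a model() constructor could look like - the field names (setup, step) and the way scenario_sim() would consume them are assumptions for illustration, not the current ringbp API:

model <- function(setup = outbreak_setup, step = outbreak_step) {
  # bundle the model-specific functions so scenario_sim() can call them
  # without knowing which branching process it is running
  structure(list(setup = setup, step = step), class = "ringbp_model")
}

# inside scenario_sim(), very roughly (stop_condition() is hypothetical):
# state <- model$setup(parms)
# while (!stop_condition(state, control)) {
#   state <- model$step(state, parms, control)
# }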

That seems potentially workable to me - I suspect the main other challenges here will be:

On balance, seems like a good idea. Should be lots of shared infrastructure among BP-based models.

sophiemeakin commented 7 months ago

That's what I was thinking, yes.

Re your last two points (returning new measures associated with the model, and adapting other tools to tolerate the extra data), there could be two outputs from each scenario/iteration: one core output that is shared across all models (generation, infector, number of secondary cases), and a second with the model-specific outputs (e.g. for pepbp number of high- and low-risk contacts, number of PEP doses).
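
To illustrate that split (the element and column names below are made up, not an existing ringbp structure), each iteration could return something like:

list(
  core = data.frame(            # shared across all models
    generation  = c(1L, 2L),
    infector    = c(0L, 1L),
    n_secondary = c(3L, 0L)
  ),
  model_specific = data.frame(  # e.g. for pepbp
    n_high_risk_contacts = c(4L, 1L),
    n_low_risk_contacts  = c(7L, 2L),
    n_pep_doses          = c(5L, 1L)
  )
)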

pearsonca commented 7 months ago

I'd guess the most-likely-to-make-this-happen next step is to provide (here?) focused snippets showing where pepbp had to diverge from ringbp to do what you needed. I imagine that development went roughly "copy-paste-modify" - highlight the modifications?

Then folks can suggest options for implementing a generic approach that would accommodate this specific example.


n.b. if you elect to put some snippets here, there should be useful formatting syntax à la https://stackoverflow.com/questions/36634252/how-to-diff-in-a-comment-on-github - doesn't seem to support side-by-side, alas.

somefunction <- function() {
...
same-same
- subtracted line
+ added line
same-same
...
}

adamkucharski commented 7 months ago

Agree it would be nice to think about generalisations, especially if there are modular components that could be reused. In case it's helpful to inform plans, a couple of related efforts:

sbfnk commented 7 months ago

I think supporting custom models here is a really good idea - although it wouldn't add anything in terms of making wider functionality (say, PEP or network models) available to others if those models still live in separate repos. Potentially, providing a custom interface here would even discourage people from sharing their models, compared with forking something that is already a package and can be re-badged as another one (see {pepbp} and {covidhm}) that is then ready for others to use.

The alternative option would be to provide some documentation on "how to spin off your own model by using a fork", bringing in any lessons learned in previous attempts / streamlining the package to make that as easy as possible.

Bisaloo commented 7 months ago

It's a tough problem.

The alternative option would be to provide some documentation on "how to spin off your own model by using a fork", bringing in any lessons learned in previous attempts / streamlining the package to make that as easy as possible.

In general and from an open-source ecosystem point of view, this is not great. Any general improvements or bug fixes that could be made in a specific fork may (= will likely) not be shared with the upstream repo, and will not benefit others. We end up with an extremely fragmented ecosystem where anyone who wants to build upon existing work has to browse through a potentially large number of repos to maybe eventually find the info they need.

On the other hand, creating a general framework that can accommodate arbitrary models is very hard, and I feel we may end up with either something unwieldy or something inflexible that doesn't address the need reported here.

:thinking:

pearsonca commented 7 months ago

The alternative option would be to provide some documentation on "how to spin off your own model by using a fork", bringing in any lessons learned in previous attempts / streamlining the package to make that as easy as possible.

In general and from an open-source ecosystem point of view, this is not great. Any general improvements or bug fixes that could be made in a specific fork may (= will likely) not be shared with the upstream repo, and will not benefit others. We end up with an extremely fragmented ecosystem where anyone who wants to build upon existing work has to browse through a potentially large number of repos to maybe eventually find the info they need.

Agree - one solution is to make it easy to add new model plugin implementations to the repo, with a clear process for adding contributor credit. That's an update to the contributor guide, a naming plan, and probably an approach that sends things to inst/. The upside being that people are then encouraged to promote citation of their paper + the package (which now contains their model).
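
To make the inst/ idea concrete, a rough sketch of how contributed models could be discovered at run time - the directory layout, file convention and helper functions here are hypothetical, nothing like this exists in the package yet:

# hypothetical: each contributed model lives in inst/models/<name>.R and
# defines an object called `model` (a setup/step bundle)
available_models <- function() {
  dir <- system.file("models", package = "ringbp")
  tools::file_path_sans_ext(list.files(dir, pattern = "\\.R$"))
}

load_model <- function(name) {
  path <- system.file("models", paste0(name, ".R"), package = "ringbp")
  env <- new.env()
  sys.source(path, envir = env)
  get("model", envir = env)
}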

On the other hand, creating a general framework that can accommodate arbitrary models is very hard, and I feel we may end up with either something unwieldy or something inflexible that doesn't address the need reported here.

Possibly, but: seems worth trying and y'all have a concrete example to develop accommodations against.

adamkucharski commented 7 months ago

A version of this problem came up in a recent discussion with simulist (https://github.com/epiverse-trace/simulist/issues/58), when we wanted to incorporate simulation of contacts as well as secondary cases into epichains. This is a relatively simple change to make to a branching process model (i.e. add a line of code to generate contacts, then another to draw the secondary cases from the secondary attack rate, plus some bookkeeping for contacts and the SAR as an input). So it is a relatively small change that would generate an output (contacts) potentially useful to several people, and it could easily be turned off without slowing down the code or disrupting existing implementations.
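
As a rough sketch of that kind of addition (a standalone helper, not the actual epichains or ringbp code; the negative binomial contact draw and argument names are assumptions for illustration):

# for each current case: draw contacts, then draw secondary cases from the
# secondary attack rate (SAR); contacts are kept so they can be reported
draw_offspring <- function(n_cases, contact_mean, contact_disp, sar) {
  n_contacts  <- rnbinom(n_cases, mu = contact_mean, size = contact_disp)
  n_secondary <- rbinom(n_cases, size = n_contacts, prob = sar)
  data.frame(n_contacts = n_contacts, n_secondary = n_secondary)
}

# e.g. draw_offspring(5, contact_mean = 10, contact_disp = 0.5, sar = 0.1)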

So perhaps a rough calculation for whether we add more complexity to the base models in the package vs encourage separate adaptations for specific use cases (and provide documentation on how this could be done, with bookkeeping etc.) could be something like:

Decision weighting = (Wider usefulness of new functionality) / (Disruption to existing code/arguments to include this functionality)

pearsonca commented 7 months ago

A version of this problem came up in a recent discussion with simulist (epiverse-trace/simulist#58), when we wanted to incorporate simulation of contacts as well as secondary cases into epichains. This is a relatively simple change to make to a branching process model (i.e. add a line of code to generate contacts, then another to draw the secondary cases from the secondary attack rate, plus some bookkeeping for contacts and the SAR as an input). So it is a relatively small change that would generate an output (contacts) potentially useful to several people, and it could easily be turned off without slowing down the code or disrupting existing implementations.

So two examples to work against. Probably three: https://github.com/SACEMA/COVID10k/ is bpmodels + offspring events - could have just written the contingent event generation as a model extension.

Decision weighting = (Wider usefulness of new functionality) / (Disruption to existing code/arguments to include this functionality)

Dunno about the denominator there (though my impression is that y'all view this package as needing a more general interface overall, so it seems mostly like "disruption to code" only), but an easy-to-use-and-extend package for a standard modeling concept seems obviously widely useful. BPs are (or should be) standard introductory models, which remain useful for low-resolution estimates, but it feels like people mostly re-invent the wheel every time.

jamesmbaazam commented 1 month ago

Josh and I had a chat with Dillon (@dcadam) today after his seminar, based on his paper on time-varying superspreading, which used ringbp to simulate a contact network for COVID-19 and SARS. It'd be nice to hear his perspective on this ringbp customisation issue: how easy it was to use out of the box, what would have made it more easily customisable for his use case, etc.

dcadam commented 1 month ago

Hi James. For the network simulation in my paper I actually used bcgov/epi.branch.sim, which itself references ringbp. It wasn't so easy to customise, though it may fit others' use cases. I can look more into the underlying code structure of ringbp compared to epi.branch.sim, and where improvements could be found to make it more customisable, as you ask.