AlgebraicJulia / AlgebraicABMs.jl

Stochastic graph rewriting for agent based modeling
https://algebraicjulia.github.io/AlgebraicABMs.jl/
MIT License
3 stars 2 forks source link

Contact rate semantics #39

Open kris-brown opened 2 months ago

kris-brown commented 2 months ago

"Contact rate" is a concept which is familiar to epidemiologists but doesn't neatly fit into the present "pattern match" based timing of AlgebraicABMs. The phenomenon we want to capture is that, at some frequency for each infected person, they interact with an arbitrary other person. That other person gets infected iff they are susceptible (only under that circumstance is there an infection event).

If we naively try to make the pattern be "S + I", then the rate of (attempted) events is proportional to the number of susceptible people (imagine a simulation with one infected person and a trillion susceptibles).

Thus the idea of a "basis" $b: B \rightarrowtail L$ was introduced to separate the fragment of the pattern which is relevant for timing. However, this fails to have the intended semantics because, although there is an attempted event proportional to infected people, it will succeed if there exists a susceptible person anywhere in the world state, $X$ (as there exists an extension of the match $m: B \rightarrow X$ which factors through $b$).

One way to further refine this idea of a basis is to split it into two components: $B \rightarrowtail B' \rightarrowtail L$. When it comes time to fire the rule (proportional to matches out of $B$), we first arbitrarily extend the match to be from $B'$, and then we demand that the extension to $L$ is a unique one. This has the intended semantics, but complicates the framework by adding more objects and morphisms with subtle semantics to keep track of.

Another way to address this is to 'bite the bullet' and implement https://github.com/AlgebraicJulia/AlgebraicABMs.jl/issues/19. A simple wiring diagram program with one Query box and one Rewrite box would be sufficient to express this scenario.

Screenshot 2024-08-28 at 11 36 01 AM

(note this is using Query in a common-but-idiosyncratic way: the standard use is looping through all the possible matches of some particular way of extending the current 'agent' (AKA 'distinguished focus'); however, if we pass through it once and never loop back through it, it has the effect of extending the current agent with an arbitrary choice)

(also, in this scenario, the "input agent" of the rule is I+P (it will only rewrite this particular infected + person pair), but the actual $L$ pattern is I+S. It's important "S" is in the pattern rather than a PAC because we need to delete the 'susceptible token' as part of the rule)

ndo885 commented 2 months ago

Thanks for this nice writeup. I'm hoping that the description of the first option could be clarified. I have 2 questions and one comment: 1) Is B' serving to delineate the more generic pattern for which a selected match is chosen with uniform probability, and then, if one has been selected, then L would specify the traditional "L" in the rewriting rule that would be attempted?

2) Could you clarify the logical need for the morphism B' -> L to not only exist, but to also be unique?

Beyond the above, I would caution about the reference to finding an "arbitrarily" in " we first arbitrarily extend the match to be from B′ ". Perhaps it is a matter of different definitions of the word "arbitrary", I don't see this choice as arbitrary. I think of "arbitrary" as meaning "it doesn't matter which one we pick", but here it does matter -- after all, we wouldn't want this choice to systematically privilege certain matches (which would distort the mixing dynamics). Rather, we want something better defined -- we wish to pick one of the possible matches to B' with uniform probability.

kris-brown commented 2 months ago

Agree that "uniformly selected" connotes what we need better than "arbitrary".

  1. Correct!
  2. That's a good question: I think it's related to "arbitrary" vs "uniform". If we didn't require the second extension to be unique (and require just that it merely exists), then it would have to be an arbitrary extension. Perhaps that's desirable. The fact that there are so many subtle variations makes me think the wiring diagram solution is preferred, since we'll constantly be desiring new variants if we add them one at a time as special cases.
wwaites commented 2 months ago

I think that contact rate fits if we don't also mix in the disease state. As in the reason that epidemiologists think of contact rates as they do is because, in simpler bilinear map well mixed ODE models, you just shove a matrix in there with some numbers and it works.

But in an agent-based world, we can do much better. I'll write in a kappa-like notation, I'm certain you can encode the same in CSets.

Contact is an independent process from disease transmission. Individuals that are not in contact can become in contact and vice versa:

P(), P() <-> P()-P() @ k1, k2

So, "in contact" means that a new edge appears between these two people whose disease state we do not care about, and this happens at rate k1, and they wander away from each other at a different rate, k2. The "-" is meant to represent an edge. A stratified contact is just the usual pullback trick.

Now, if this disease is the kind where contact is necessary for transmission, as it probably is otherwise we wouldn't be worried about contact rates, then we need not only a bond but a person each in the S and the I state,

P(i)-P(s) -> P(i)-P(i) @ beta

This is something that we wanted to do with the work with the Orthodox Jewish community in London except we didn't really have the tools: in regular kappa we can only conveniently have pair-wise interactions because of the port graph business.

So really, we might like to have something that says,

P(child)-Home <-> P(child)-School @ k1, k2
P(child,i)-School-P(child,s) -> P(child,i)-School-P(child's) @ b

which tells a story. A child goes from home to school, and while they are at school, there is some infection rate characteristic of that place (and whatever contacts in it, which could be broken out and explicitly modelled if desired).

The usual idea of contact rate used by epidemiologists is more an artefact of the mathematical tools epidemiologists are used to working with. Do we need special treatment of this concept in the underlying formalism? I don't think so. We can capture it by expressing it in the model.

The main downside of this approach is computational expense. There will be a hell of a lot of contact events that lead to nothing at all and there may well be ways of transforming the model to get a simpler, cheaper model with equivalent observables (moment bisimulation again). Then we are in territory akin to optimising compilers for models. That's good because things like "observable preserving schema migrations" might be a nice little vein of technical results to mine.

Here is a kappa model of the simple version of this, without bells and whistles and with the limitations of port-graphs, just demonstrating the mechanism.

// agent with a "contact" site and a disease state
%agent: P(c, d{s i r})

%var: k1 5      // form, on average, five contacts a day
%var: k2 48     // a contact lasts, on average, 30 minutes
%var: beta 1    // expect a small number contacts to result in infection
%var: gamma 1/5 // recover after about five days

// forming a contact bond/edge and destroying that bond/edge
'contacting' P(c[.]), P(c[.]) <-> P(c[1]), P(c[1]) @ k1, k2

// infection only happens when a contact bond/edge is present
'infecting'  P(c[1], d{i}), P(c[1], d{s}) -> P(c[1], d{i}), P(c[1], d{i}) @ beta

// recovery has nothing to do with contact edges
'recovering' P(d{i}) -> P(d{r}) @ gamma

// standard observables
%obs: S |P(d{s})|
%obs: I |P(d{i})|
%obs: R |P(d{r})|

// initial conditions
%init: 990 P(d{s})
%init: 10  P(d{i})
wwaites commented 2 months ago

My point is: there are actually two processes happening here, epidemiologists are in the habit of lumping them into one which is, I think, the source of the difficulty. If model them as the two processes that they are, there should be no problem. There might then be a principled way of lumping but that should be an optimisation, not a modelling thing.

kris-brown commented 2 months ago

Yes that's a very good point - there's always freedom to put complexity into the 'timer' or into explicit features of the state. Given the flexibility of layering in relationships into our schemas, it's certainly an available option for an epidemiologist to model contact+infection as two processes, as in your Kappa model.

Pragmatically though it would be nice to reduce onboarding friction by having something off the shelf that will work with their parameter values, while at the same time encouraging the explicit version.