InstituteforDiseaseModeling / covasim

COVID-19 Agent-based Simulator (Covasim): a model for exploring coronavirus dynamics and interventions
https://covasim.org
MIT License

Is it possible to eliminate all non-determinism apart from generating the age of each agent? #368

Closed AndrewC19 closed 2 years ago

AndrewC19 commented 3 years ago

Hi,

I am trying to run Covasim many times to observe the effect of changing the age distribution on cumulative infections.

I would like to set the age distribution and then run the model 30 times to get 30 different runs with different samples from the same age distribution.

Since I want to isolate the effect of age, I need to block other sources of non-determinism by setting a fixed random seed. However, this means that all 30 runs draw exactly the same sample of ages.

I would like to be able to hold everything constant apart from the process of sampling the age of each agent in the population. Is there a simple way for me to achieve this?

Thanks!

cliffckerr commented 3 years ago

Hi @AndrewC19 , thanks for the question!

Technically this is possible -- you can create different simulations and then mix-and-match between them, e.g.

import covasim as cv

# Donor sims, used only for the ages they draw
base_sim1 = cv.Sim(rand_seed=1)
base_sim2 = cv.Sim(rand_seed=2)
base_sim1.initialize() # Create age distribution with seed 1
base_sim2.initialize() # Create age distribution with seed 2

# The sims that are actually run share a single seed
sim1 = cv.Sim(rand_seed=1)
sim2 = cv.Sim(rand_seed=1) # Same seed
sim1.initialize()
sim2.initialize()

# Swap in the ages drawn by the donor sims
sim1.people.age = base_sim1.people.age
sim2.people.age = base_sim2.people.age
sim1.people.initialize() # Re-initialize based on the changed ages
sim2.people.initialize()

sim1.run()
sim2.run()

However, I don't know if this will give you the results you expect, due to a limitation of dynamical systems. Let's say you have two identical simulations except that in simulation 1 (S1), one agent is 80, while in S2, the same agent is 50. On timestep 5, this agent gets infected. In S1, this agent becomes critically ill and is hospitalized before they infect anyone else. In S2, they are not hospitalized, so they then pass it on to their household. From their household, their kids pass it on to their school. And so on. At this point, the simulations have completely diverged -- all because a single agent on a single timestep was or was not hospitalized. In other words, it won't actually matter whether all the other quantities are initialized with the same seed, since these chaotic effects will dominate regardless.

AndrewC19 commented 3 years ago

Hi @cliffckerr, thanks for the quick response - I will implement this and see how it goes.

Also, I am not expecting this to completely isolate the effect of age. I just want to be sure that the difference between the executions can be traced back to the change in age and not some other difference that was caused by another random process (i.e. running with different seeds). It doesn't matter what the sequence of events following the change in age is, only that this sequence starts with the change in age, if that makes any sense?
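To make the idea concrete, here is a minimal sketch of the property described above: the only thing that differs between runs is the age sample. It is a toy model, not Covasim. The function `toy_run`, the uniform age range, and the risk curve `age / 200` are all hypothetical, and it assumes ages and outcomes can be given separate random number generators, which is not how the Covasim snippet earlier in the thread works.

```python
import random

def toy_run(age_seed, dynamics_seed=1, n_agents=1000):
    """Toy model (not Covasim): ages use one random stream, outcomes another."""
    age_rng = random.Random(age_seed)            # varies from run to run
    dynamics_rng = random.Random(dynamics_seed)  # held fixed across runs

    # Hypothetical age distribution: uniform over 0-89 years
    ages = [age_rng.randint(0, 89) for _ in range(n_agents)]

    # One uniform draw per agent, taken in a fixed order, so these draws
    # are identical in every run regardless of the ages
    draws = [dynamics_rng.random() for _ in range(n_agents)]

    # Hypothetical risk curve: chance of a severe outcome rises with age
    severe = sum(u < age / 200 for age, u in zip(ages, draws))
    return ages, draws, severe

# 30 runs: a different age sample each time, the same outcome draws each time
results = [toy_run(age_seed=s) for s in range(30)]
severe_counts = [r[2] for r in results]
print(severe_counts)
```

Because every agent consumes exactly one outcome draw here, any difference in `severe_counts` traces back to the ages alone. A real epidemic model does not have this property once the number or order of draws starts to depend on who is infected or hospitalized.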

cliffckerr commented 3 years ago

Got it. I do think that in practice it won't actually make an observable difference due to these chaotic effects, but would be curious to see if that's not the case. For example, the seed shouldn't actually matter since as soon as the first difference in simulations occurs (e.g. a person being hospitalized vs. not), the random number streams will diverge and it will be the same as using different seeds.
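The stream divergence can be sketched with a toy loop using Python's standard `random` module. This is not Covasim code: the function `run` and its `extra_draw_at` parameter are hypothetical, and the single extra draw stands in for any branch (such as a hospitalization check) that only one of the two simulations takes.

```python
import random

def run(extra_draw_at=None, n_steps=10, seed=1):
    """Toy stand-in for a simulation: one random draw per step, plus one
    extra draw at step `extra_draw_at` if it is set."""
    rng = random.Random(seed)
    draws = []
    for t in range(n_steps):
        if t == extra_draw_at:
            rng.random()  # extra draw consumed only by this run's branch
        draws.append(rng.random())
    return draws

a = run()                 # baseline
b = run(extra_draw_at=5)  # same seed, one extra draw at step 5

print(a[:5] == b[:5])  # the two runs agree before the branch
print(a[5:] == b[5:])  # and no longer agree after it
```

In this toy the second stream is only shifted by one position. In a model that draws whole arrays of random numbers per timestep, such a shift would presumably change which agent receives which number, which is why a shared seed stops mattering after the first difference.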

AndrewC19 commented 3 years ago

I will let you know!

Out of curiosity, what would happen if I were to keep the seed fixed and change the age distribution in an insignificant way? For example, running the model once with the standard UK age distribution and then moving one person from the 40-49 bin to the 50-59 bin before running again. Would this lead to a different random number stream or cause Covasim to draw a different set of ages for people?

cliffckerr commented 3 years ago

Let's find out!

import covasim as cv
s1 = cv.Sim()
s1.initialize()
s2 = s1.copy()
s2.people.age[0:10] = (s2.people.age[0:10] + 50) % 100 # Rotate their ages by 50 years
s2.people.initialize()

msim = cv.MultiSim([s1, s2]).run()
msim.plot()

[Image: output of msim.plot() for the two simulations]

So changing the ages of the first 10 agents had minimal effect on the infections but a large impact on deaths. If you ran with a different initial seed (this is 1 by default) or a different population size you'd probably get different results.

cliffckerr commented 2 years ago

Closing since resolved