InstituteforDiseaseModeling / covasim

COVID-19 Agent-based Simulator (Covasim): a model for exploring coronavirus dynamics and interventions
https://covasim.org
MIT License
254 stars 223 forks source link

Vaccination with subtarget is vaccinating more than the total number of agents in the model. #370

Closed AndrewC19 closed 2 years ago

AndrewC19 commented 3 years ago

Short Description When setting a vaccination subtarget (e.g. prioritise the elderly), the cumulative number of vaccinated agents (cum_vaccinated) exceeds the total number of agents in the model (pop_size) by a significant margin.

Describe the bug After running a simple simulation with a population size of 10,000 and the pfizer vaccine, I end up with cum_vaccinations = 30000.0, cum_vaccinated = 10000.0, and n_vaccinated = 10000.0. This is the expected behaviour.

However, when I change my simulation to use a pfizer vaccine with subtarget=vaccinate_by_age where vaccinate_by_age is the method found in tutorial 5, I end up with cum_vaccinations = 122534.0, cum_vaccinated = 102754.0, and n_vaccinated=9866.0.

In both cases, people.vaccinations contains 10000 values, between 0 and 2, and therefore no agent is vaccinated more than twice.

How is it possible that in the second simulation the cumulative statistics exceed 20000?

For both vaccines, I set days=list(range(35)). If I'm not mistaken, this means that vaccines should only be given on days 0 to 35 of the simulation. However, when I look at people.date_vaccinated, it appears that everyone is vaccinated on day 21 in the first simulation and I have dates which are greater than 35 in the second simulation.

Should this be possible? The documentation says days (int/arr): the day or array of days to apply the interventions.

As a side note, could you also explain the difference between cum_vaccinated and cum_vaccinations and why, in the first example, the latter is three times greater.

To reproduce

import covasim as cv
import numpy as np

def vaccinate_by_age(sim):
    young = cv.true(sim.people.age < 50)
    middle = cv.true((sim.people.age >= 50) * (sim.people.age < 75))
    old = cv.true(sim.people.age > 75)
    inds = sim.people.uid
    vals = np.ones(len(sim.people))
    vals[young] = 0.1
    vals[middle] = 0.5
    vals[old] = 0.9
    output = dict(inds=inds, vals=vals)
    return output

vaccine_a = cv.vaccinate_prob(vaccine="pfizer", label="pfizer", days=list(range(35)))
vaccine_b = cv.vaccinate_prob(vaccine="pfizer", label="pfize", days=list(range(35)), subtarget=vaccinate_by_age)

pars_a = {"location": "UK", "pop_size": 10000, "use_waning": True, "interventions": vaccine_a}
pars_b = {"location": "UK", "pop_size": 10000, "use_waning": True, "interventions": vaccine_b}

sim_a = cv.Sim(pars=pars_a)
sim_a.run()

sim_b = cv.Sim(pars=pars_b)
sim_b.run()

print(sim_a.people.vaccinated, sim_b.people.vaccinated)
print(sim_a.people.date_vaccinated, sim_b.people.date_vaccinated)
print(sim_a.summary[["cum_vaccinations", "cum_vaccinated", "n_vaccinated"]], sim_b.summary[["cum_vaccinations", "cum_vaccinated", "n_vaccinated"]])

Thanks in advance!

jdclarke5 commented 3 years ago

I have observed a similar phenomenon. My guess is that it has something to do with this line, which incorporates dynamic rescaling into the sum over the vaccinated state. Perhaps this could be investigated?

https://github.com/InstituteforDiseaseModeling/covasim/blob/253920e7cb9ef5dac040cd76fb68972efa2475ea/covasim/analysis.py#L313

Another thing I have noticed in the BaseVaccination class is that the self.vaccinations and sim.people.vaccinations arrays get out-of-sync when dynamic rescaling takes effect (sim.rescale_vec[sim.t] > 1). This makes the following line filter out the wrong agents leading to mismatch of vaccinations, too:

https://github.com/InstituteforDiseaseModeling/covasim/blob/253920e7cb9ef5dac040cd76fb68972efa2475ea/covasim/interventions.py#L1379

I cannot find where/why this occurs but suspect that during dynamic rescaling the order of sim.people changes somehow. This is my only theory. For anyone coming across this I suggest to not use any internal arrays and use the attribute assigned to the agent as the source of truth instead. For example:

vacc_inds = vacc_inds[~sim.people.dead[vacc_inds]]
vacc_inds = vacc_inds[sim.people.vaccinations[vacc_inds] < self.p['doses']]
cliffckerr commented 2 years ago

This has been fixed in 3.1 (to be released in the next couple days). The new output (updating cum_vaccinations to cum_doses) is:

[ True  True  True ...  True  True  True] [ True  True  True ...  True  True  True]
[21. 21. 21. ... 21. 21. 21.] [ 1. 29.  1. ... 12.  2. 16.]
[20000.0, 10000.0, 10000.0] [19745.0, 9842.0, 9842.0]