dlchao / corvid

Corvid, an agent-based SARS-CoV-2 (COVID-19) transmission model

Instructions for Corvid, a stochastic coronavirus epidemic simulation model by Dennis Chao March 2020

Corvid is an individual-based model that simulates the spread of SARS-CoV-2 in synthetic populations that represent communities in the United States. It is described in "Modeling layered non-pharmaceutical interventions against SARS-CoV-2 in the United States with Corvid", available on medRxiv (https://doi.org/10.1101/2020.04.08.20058487).

Corvid is based on the influenza model FluTE (https://github.com/dlchao/FluTE). Several parameters were adjusted to adapt the FluTE flu transmission to SARS-CoV-2 transmission, such as incubation period, symptomatic fraction, and viral load trajectories.

This model runs on a single core only (unlike FluTE, which had a parallel computing version), so populations of more than 10,000,000 might not run on a laptop.

We include several scripts that we use for producing reports and figures in the "scripts" directory. There are Makefile targets for setting up directories and running these scripts. When these runs are completed, you should be able to compile the R Markdown files (*.Rmd) to generate reports or figures based on these results.

The pdf "corvid-guide.pdf" is a short guide on how to run Corvid.


  1. Compilation

The source code and Makefile are in the "code" directory. The Makefile can be used on Linux systems to compile corvid (standard simulator) and R0corvid (version of corvid in which only a single index case is infectious). You can also make "guide", which will compile corvid and run it in a new directory "corvid-guide" using sample configuration files for a Seattle-like population. You can then use the R Markdown file ("corvid-guide.Rmd") that parses the outputs from these runs and produces a short document on how to use Corvid, corvid-guide.pdf.

The source files are:

epimodel.cpp - model implementation

corvid.cpp - contains main()

params.cpp - simulation constants

epimodelparameters.cpp - code to parse config files

R0model.cpp - class derived from epimodel.cpp for R0 calculations. Contains main().

dSFMT.c - SIMD oriented Fast Mersenne Twister (dSFMT) (from http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT)

bnldev.c - Numerical Recipes in C code for drawing random numbers from the binomial distribution


  2. Data files

A set of 3 data files needs to be in the same directory as the executable to specify a population for Corvid. The file names consist of a prefix (e.g., "seattle") followed by a suffix.

*-tracts.dat - Tract populations and locations, from http://www.census.gov/geo/www/cenpop/cntpop2k.html. The columns in this comma-delimited file are: state FIPS code, county FIPS code, tract FIPS code, tract population, tract longitude, and tract latitude.

*-wf.dat - Tract-to-tract workerflow, extracted from stp64.us from Census 2000 Special Tabulation Product 64: Census tract of work by census tract of residence 2000. For the US-level data, commutes over 100 miles were eliminated. The columns in this space-delimited file are: home state FIPS code, home county FIPS code, home tract FIPS code, work state FIPS code, work county FIPS code, work tract FIPS code, and number of workers.

*-employment.dat - The number of employed working-age adults and the total number of working-age adults, from the Census Summary File 3 (SF3, Table PCT35). The columns in this space-delimited file are: state FIPS code, county FIPS code, tract FIPS code, number of employed 20-64 year olds, and the total number of working-age individuals (19-64 year olds).
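
The column layout of the tracts file described above can be read with a few lines of Python. This is a minimal sketch, not part of the Corvid distribution: the sample rows are invented for illustration, and the helper name read_tracts is hypothetical. FIPS codes are kept as strings so that any leading zeros survive.

```python
import csv
import io

# Hypothetical sample rows in the *-tracts.dat column order described above:
# state FIPS, county FIPS, tract FIPS, population, longitude, latitude.
# These values are made up for illustration; they are not real census data.
SAMPLE_TRACTS = """53,33,100,4500,-122.30,47.70
53,33,200,3200,-122.35,47.65
"""


def read_tracts(stream):
    """Return one dict per tract from a comma-delimited stream."""
    tracts = []
    for row in csv.reader(stream):
        if not row:  # skip blank lines
            continue
        state, county, tract, population, longitude, latitude = (
            field.strip() for field in row[:6]
        )
        tracts.append({
            "state": state,    # FIPS codes kept as strings
            "county": county,
            "tract": tract,
            "population": int(population),
            "longitude": float(longitude),
            "latitude": float(latitude),
        })
    return tracts


tracts = read_tracts(io.StringIO(SAMPLE_TRACTS))
total_population = sum(t["population"] for t in tracts)
print(len(tracts), "tracts,", total_population, "people")
```

To read a real file, pass an open file handle (for example, open("seattle-tracts.dat", newline="")) instead of the in-memory sample.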

The sample populations, in the corviddata directory, are:

one - a single community of 2000 people.

seattle - metropolitan Seattle, based on the 2000 US Census.

kingcounty - King County (Washington State), based on the 2000 US Census.

kingsnohomishpierce - King, Snohomish, and Pierce Counties (Washington State), based on the 2000 US Census.

la - Los Angeles County. la-tracts.dat is not from the 2000 Census; it is based on 2007 PEPS estimates from Walter R. McDonald & Associates, Inc. (WRMA) and padded with an additional 776,000 individuals to represent the estimated undocumented population [Flaming et al. Hopeful Workers, Marginal Jobs. 2005]. The other "la" data files are based on the 2000 US Census.


  3. Configuration

A text file with one parameter per line is supplied as the command-line argument to Corvid to configure a simulation run (e.g., "./corvid config-file"). Each line should start with a parameter name followed by the parameter value(s), separated by a single space. Parameter values can be strings (which cannot include white space), integers, reals, or binary (0 or 1). In the list below, a number in square brackets after a type indicates how many values the parameter requires. Lines beginning with "#" are ignored. The defaults are generally fine, but most configuration files should set label, datafile, and R0 (or beta). The parameter names are as follows:

label : string A name that is output in the summary file for the user's convenience (e.g., "simulation1"). With this value, the user can tag output files from specific simulation runs.

datafile : (required) string The prefix for the input data file names (e.g., one, seattle, la, usa).

logfile : integer Number that indicates how often, in days, output is written to the Log file. If 0, no Log file is generated. Default is 1 for daily output.

individualfile : binary Specifies if an "Individuals" file should be generated. The file will contain one row of simulation-specific data for each individual and can be very large. If 1, the file is generated. The default is 0, no file generated even if you specify "individualfilename".

summaryfilename : string Specify a name for the "Summary" file. The default name is "Summary(n)", where (n) is the number in the run-number file.

logfilename : string Specify a name for the "Log" file. The default name is "Log(n)", where (n) is the number in the run-number file.

tractfilename : string Specify a name for the "Tracts" file. The default name is "Tracts(n)", where (n) is the number in the run-number file.

individualfilename : string Specify a name for the "Individuals" file. The default name is "Individuals(n)", where (n) is the number in the run-number file.

beta : real Transmission parameter (sometimes known as Ptrans). This value is multiplied by the inter-personal contact probability to determine the transmission probability between infected and susceptible persons. Default is 0.1.

R0 : real Basic reproductive number, which is internally converted to beta.

symptomaticfraction : real Fraction of cases who will become symptomatic. This parameter affects the relationship between beta and R0, which is hard-coded into epimodelparameters.cpp assuming the default of 0.5. If you use this option, use beta to set transmissibility, not R0. You should probably not change this parameter.

randomnumberseed : integer Random number generator seed.

runlength : integer Number of days the simulation should run. Default is 180.

preexistingimmunityprotection : real Those with pre-existing immunity have susceptibility reduced by preexistingimmunityprotection.

preexistingimmunitybyage : real[5] Vector of 5 values representing the fraction of individuals in each age group with pre-existing immunity. Those with pre-existing immunity have susceptibility reduced by preexistingimmunityprotection.

defaultVESbyage : real[5] Vector of 5 values representing the "VE_S" of individuals in each age group. This value reduces the susceptibility of the entire age group. A value of 0 indicates that the age group is fully susceptible and 1 indicates that it is completely immune. The default value is 0.

prestrategy : string Vaccination strategy before the epidemic occurs. Can take one of the following four values: none, prevaccinate, primeboostrandom, or primeboostsame. prevaccinate vaccinates a fraction of the population (with 2 doses if required). The primeboostrandom option will give a percentage of the population one shot before the epidemic, then may or may not boost those same persons after the epidemic has started. Persons vaccinated after the epidemic has started will be selected at random. In contrast, primeboostsame will boost the same persons after the epidemic that were primed before the epidemic. Default is none.

reactivestrategy : string Vaccination strategy for when the epidemic is detected. Can take one of the following three values, none, tract, or mass. The tract option will vaccinate persons in a specific tract once the threshold (i.e., responsethreshold variable) is met. The mass option will vaccinate all persons in all tracts once the threshold is met. Default is none.

vaccinationfraction : real, scalar Fraction of people to vaccinate if a vaccination strategy is selected. Default is 0.7.

vaccinepriorities : integer[13] A vector of 13 numbers, representing the vaccine priority for the 13 categories of individuals. A value of 1 indicates highest priority, 2 is the next-highest priority, etc. 0 indicates that this category is not prioritized to get vaccine. Default is "1 1 1 1 1 1 1 1 1 1 1 1 1" (everyone has the same priority). Individuals of lower priority only get vaccine when those with higher priority do not need any. If a high-priority individual is primed, lower-priority persons can get vaccine until the high-priority person needs the boost. The categories, in order, are: essential workforce, pregnant women, family members of infants, high risk preschoolers, high risk school-age children, high risk young adults, high risk older adults, high risk elderly, all preschoolers, all school-age children, all young adults, all older adults, all elderly. For example, to have individuals of all ages get the same priority, we set all age-specific priorities to the same value: 0 0 0 0 0 0 0 0 1 1 1 1 1. To give adults lower priority, we can use: 0 0 0 0 0 0 0 0 1 1 2 2 2.

vaccinepriorities2, prioritychangetime - parameters for changing vaccinepriorities in the middle of an epidemic. These parameters might not be supported in future releases.

antiviraldoses : integer, scalar Number of antiviral courses available at the beginning of the simulation. A single course is defined as 10 pills per person. Default is infinite.

vaccinedoses : integer[2] The vaccine ID followed by the number of vaccine doses available at the beginning of the simulation. A dose is defined as one shot per person. If the vaccine is a split-dose, (i.e., prime and boost), this variable must be twice the total number of doses. To specify 2,000,000 doses of vaccine 0, one would enter 0 2000000. Default is infinite.

vaccineproduction : integer[runlength+1] Vaccine ID followed by the number of vaccine doses that become available each day during the response. For example, a vector of 1 100 300 500 600 ... would indicate 100 doses of vaccine 1 to become available on day 1 of the response, 300 on day 2, etc., for up to the number of days specified in runlength. The default is 0 (additional vaccine is not produced during the simulation).

antiviraldosesdaily : integer Number of antiviral courses that can be delivered daily by available resources. Default is no restrictions, or unlimited ability to deliver.

vaccinedosesdaily : integer Number of vaccinations that can be administered daily by available resources. Default is no restrictions (all available vaccine is administered each day).

vaccinedata : integer, real[3], real[6], bool Total of 11 values indicating the vaccine ID, vaccine efficacy, and administration policies for each vaccine. The vaccine efficacy parameters (VE) should be between 0 (no efficacy) and 1 (complete efficacy). The administration policy values are the fraction of the age groups (infant, pre-school age, school-age, young adult, older adult, elderly) and pregnant women who are restricted from getting the vaccine. The simulation randomly assigns this fraction of the individuals in these groups to be restricted. The default is 0 (none restricted), while 1 indicates that 100% of individuals in this class are restricted. Values must be entered in the following order:

  integer, Numeric ID for the vaccine, starting from 0.
  real, VE_S (the vaccine efficacy for susceptibility).
  real, VE_I (the vaccine efficacy for infectiousness).
  real, VE_P (the vaccine efficacy for illness given infection).
  real, Fraction of infants (age <6 months) restricted from getting the vaccine.
  real, Fraction of pre-school age children (ages 0-4) restricted from getting the vaccine.
  real, Fraction of school age children (ages 5-18) restricted from getting the vaccine.
  real, Fraction of young adults (ages 19-29) restricted from getting the vaccine.
  real, Fraction of older adults (ages 30-64) restricted from getting the vaccine.
  real, Fraction of elderly (ages 65+) restricted from getting the vaccine.
  bool, Pregnant adults restricted from getting the vaccine.

An example of specifying multiple vaccines:

  vaccinedata 0 0.4 0.5 0.83 1 0.2 0.2 1 1 1 1
  vaccinedata 1 0.4 0.4 0.67 0 0 0 0 0 0 0

Here vaccine 0 could be a live vaccine for children and vaccine 1 could be administered to anyone. Note that vaccine 0 has 0.2 for the child restrictions, which means that 20% of children (chosen at random) are not eligible.

vaccinebuildup : integer, integer, real[29] Total of 31 values. The first value is the vaccine numeric ID, the second is the day that the boost should be given. The remaining 29 values are the vaccine efficacy over the 29 days after the vaccine is given.

  integer, Numeric ID for the vaccine. First vaccine must start at 0.
  integer, Minimum number of days between the prime and boost, ranging from 0 to 28. Default is 0, no boost.
  reals, 29 values describing the vaccine efficacy buildup, ranging from 0 to 1. The value on the last day should be 1.

The default is a one-dose vaccine that reaches maximum efficacy in two weeks.

vaccineefficacybyage : real[5] Vector of 5 values representing the relative vaccine efficacy for each age group. Age groups are, in order: pre-school (0-4 years), school age children (5-18 years), young adults (19-29 years), older adults (30-64 years), and elderly (65+ years). Values can range from 0 (no efficacy) to 1 (full efficacy). The default is 1, full efficacy, for all age groups. For example, "1 1 1 1 0.6" would make vaccines only 60% as effective in the elderly as in everyone else. The same settings are used for all vaccines.

AVEs : real Antiviral vaccine efficacy for susceptibility (VE_S). Default is 0.3.

AVEi : real Antiviral vaccine efficacy for infectiousness (VE_I). Default is 0.62.

AVEp : real Antiviral vaccine efficacy for illness given infection (VE_P). Default is 0.6.

responsethreshold : real Fraction of the population ascertained that results in initiating reactive strategies. Reactive strategies include vaccinations, deploying antivirals, and non-pharmaceutical interventions like liberal leave. For example, 0.01 would set this trigger at 1% of the population. The default is 0.0 which initiates reactive strategies after the first person is ascertained.

responsedelay : integer Number of days to wait before initiating reactive strategies. A value of -1 would deploy reactive strategies on day 0, the first day of the simulation. This differs from the pre-strategy option because pre-vaccination assumes that you can vaccinate people early enough that they have full protection on day 0. With a responsedelay of -1, they might get vaccine on day 0. Default is 1.

responseday : integer Sets the reactive responses to begin on specified day instead of waiting for ascertained cases (instead of using responsethreshold and responsedelay).

ascertainmentdelay : integer Number of days it takes medical personnel to ascertain a symptomatic individual. Default is 1 day.

ascertainmentfraction : real Fraction of all symptomatic individuals who will be ascertained. Default is 0.8.

essentialfraction : real Fraction of working-age adults that belong to the "essential workforce" and are prioritized for receiving vaccine. The essential workforce is 6.9% of the employed population when there is a vaccine shortage and 10.8% if there is not a shortage. Default is 0, no essential workers prioritized to get vaccine.

pregnantfraction : real[5] Fraction of people in each of the 5 age groups who are pregnant. Default is "0 0 0.02771 0.02069 0".

highriskfraction : real[5] Fraction of individuals in each of the 5 age groups who are at high risk of complications from coronavirus. For example, 0.089 0.089 0.212 0.212 0.0 would make 8.9% of children and 21.2% of adults under 65 years high risk. Default is 0, no high risk, for all age groups.

seedtract : integer[4] Seeds a single census tract with infected people. State, county, and tract FIPS followed by the number of people to infect.

seedinfected : integer Number of people to infect across the whole population. Default is 0, infect no people.

seedinfecteddaily : binary Indicates if "seedinfected" infected people should be introduced into the population every day or just on the first day. A value of 1 will seed infected people every day, while 0 seeds infected only on day 0. Default is 0.

seedairports : integer Value between 0 and 10000 to indicate the number of passengers per 10000 per day to infect in airport counties. Default is 0, no passengers to be infected.

travel : binary Enables short-term long-distance travel. A value of 1 enables travel, 0 indicates no travel. Default is 0, but it should be set to 1 if simulating the continental US.

antiviralpolicy : string Indicates which persons get antivirals. Possible values include none, treatmentonly, and HHTAP (household members all get drugs if one member is ascertained). Default is none.

schoolclosurepolicy : string Identifies which schools to close. Can take one of three possible values: none, all, or bytractandage. After reactive responses have started, either no schools close (none), all schools close (all), or the schools in a single tract can be closed (bytractandage). Default is none.

schoolclosuredays : integer Number of days to close schools ranging from 0 to the value of runlength. The default is 0, no school closure. A value larger than runlength would close the schools permanently.

schoolopening : integer[56] School opening day for each state after the start of the simulation. Values of 0 or -1 indicate that the state's schools are open when the simulation starts, while other values indicate that the state's schools start after n days. The parameter takes 56 arguments, which correspond to the FIPS codes of the states. The values are for the states in the following order ("-" indicates that the value is not used): Alabama, Alaska, -, Arizona, Arkansas, California, -, Colorado, Connecticut, Delaware, District of Columbia, Florida, Georgia, -, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, -, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, -, Washington, West Virginia, Wisconsin, Wyoming

voluntaryisolation : real Voluntary isolation compliance probability. This is the probability that a sick person stays home voluntarily one day after symptoms. Default is 0, no isolation.

ascertainedisolation : real Ascertained isolation compliance probability. This is the probability that a sick person stays home after being ascertained. Default is 0, no isolation.

quarantine : real Voluntary household quarantine compliance probability. This is the probability that the members of a household stay home if one of the family members becomes ascertained. Default is 0, no quarantine.

quarantinelength : integer Length of household quarantine in days. Default is 14.

liberalleave : real Liberal leave compliance probability. The probability that people will take off from work if they get sick (in addition to the usual probability of staying home when sick). Default is 0, no liberal leave.

liberalleavedays : integer Duration of liberal leave policy in days. Default is 10000.

workfromhome : real Working from home compliance probability. The probability that a person who normally goes to work will work from home instead. Default is 0, no work from home.

workfromhomedays : integer Duration of work from home policy in days. Default is 10000.

communitycontactreduction : real Reduction in community transmissibility. Default is 0, no reduction.

communitycontactreductiondays : integer Duration of reduction in community contacts policy in days. Default is 10000.
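
The file format described in this section (one "name value" pair per line, with "#" comment lines) is simple enough to generate and check from a script. The following Python sketch is an illustration only and is not part of Corvid: the helper names format_config and parse_config and the label value "example-run" are hypothetical, and the particular combination of parameters is an assumption chosen from the list above rather than a copy of any distributed configuration file.

```python
def format_config(params):
    """Render a dict of parameters as Corvid-style 'name value' lines.

    List or tuple values are written as space-separated vectors.
    """
    lines = []
    for name, value in params.items():
        if isinstance(value, (list, tuple)):
            value_text = " ".join(str(v) for v in value)
        else:
            value_text = str(value)
        lines.append(f"{name} {value_text}")
    return "\n".join(lines) + "\n"


def parse_config(text):
    """Parse 'name value...' lines, skipping blank lines and '#' comments.

    Returns a dict mapping each name to its list of string values.
    Note: a parameter that appears on several lines (such as vaccinedata)
    would keep only its last occurrence in this simple sketch.
    """
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, *values = line.split()
        params[name] = values
    return params


# Illustrative settings using parameter names documented above.
example = {
    "label": "example-run",        # hypothetical label
    "datafile": "seattle",
    "R0": 2.6,
    "runlength": 180,
    "randomnumberseed": 1,
    "seedinfected": 5,
    "highriskfraction": [0.089, 0.089, 0.212, 0.212, 0.0],
}

config_text = format_config(example)
print(config_text)
```

Writing config_text to a file and passing that file name to corvid would then follow the usage described at the start of this section.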


  4. Running corvid

To run Corvid:

  ./corvid config

where config is the name of your configuration file. The program will print out its version number and any configuration errors to stdout.

Unless the names of all output files are specified in the configuration file, Corvid reads a file called "run-number" and appends the number it contains to the output file names. The file contains a single number (e.g., 0) that is incremented each time Corvid is run. For example, the summary and log files will be "Summary0" and "Log0" by default. The run-number file is not included with the Corvid distribution and must be created by the user. This file, and the desired data files, should be in the same directory as the executable.
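
Because the run-number file has to be created by hand, a small wrapper script can take care of it before launching a run. The Python sketch below is one possible approach and is not part of Corvid: the helper names ensure_run_number and run_corvid are hypothetical, and writing the number followed by a newline is an assumption about what the program accepts. The run is skipped if no corvid executable is found in the working directory.

```python
import os
import subprocess
import tempfile


def ensure_run_number(directory, start=0):
    """Create 'run-number' in directory if it is missing; return its value."""
    path = os.path.join(directory, "run-number")
    if not os.path.exists(path):
        with open(path, "w") as handle:
            handle.write(f"{start}\n")  # assumed format: a single number
    with open(path) as handle:
        return int(handle.read().strip())


def run_corvid(directory, config_name):
    """Run corvid on a config file located in directory.

    Returns the CompletedProcess, or None if the executable is absent.
    """
    executable = os.path.abspath(os.path.join(directory, "corvid"))
    if not os.path.isfile(executable):
        return None
    return subprocess.run(
        [executable, config_name],
        cwd=directory,          # data files and run-number live here
        capture_output=True,
        text=True,
    )


# Demonstration in an empty temporary directory (no executable present).
workdir = tempfile.mkdtemp()
run_number = ensure_run_number(workdir)
result = run_corvid(workdir, "config-minimal")
print("run number:", run_number)
print("corvid was run:", result is not None)
```

In a real working directory containing the executable and data files, result.stdout would hold the version number and any configuration errors that the program prints.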


  5. Output files

A simulation run will produce 1 to 4 output files, depending on configuration parameters. The filenames are a prefix (e.g., "Summary") followed by a number, which is taken from the file run-number, unless the filenames are specified in the configuration file. All output files are in ASCII text format.

Summary(n) - outputs the settings used for the simulation, final attack rates, vaccines used, daily totals of symptomatic individuals. Note that the "Number symptomatic" is the symptomatic prevalence, or the number of people who are symptomatic on a given day - it is not the number who become symptomatic that day.

Log(n) - The number of symptomatics and infected people per tract each day (prevalence and cumulative). Each line has: the day, a unique numeric tract id, the number of people currently symptomatic in the 5 age groups, the number of people who have ever been symptomatic in the 5 age groups, the number of people currently infected in the 5 age groups, the number of people who have ever been infected in the 5 age groups. Note that it does not report incident symptomatic or infected people - that can be derived from the cumulative counts. This can be an enormous file.

Tracts(n) - a unique numeric tract id, the FIPS codes, and the populations of the tracts by age. Also contains the number of people who work in this tract (including residents). Comma-separated value files with a header row. This is only output when Log files are output.

Individuals(n) - Lists info on all people, with one line per person. Comma-separated value files with a header row. Can be an enormous file.
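
As noted for the Log file, incident cases are not reported directly but can be derived from the cumulative counts by differencing consecutive days. The Python sketch below shows that arithmetic on made-up numbers. It assumes daily output (logfile set to 1) and that the cumulative counts for one tract and age group have already been extracted into a list indexed by day; the function name daily_incidence is hypothetical, and parsing of the Log file itself is not shown.

```python
def daily_incidence(cumulative):
    """Convert cumulative counts (one per day) into per-day new cases.

    The first day's incidence is taken to be its cumulative count,
    which is an assumption about how day 0 should be treated.
    """
    incidence = []
    previous = 0
    for total in cumulative:
        incidence.append(total - previous)
        previous = total
    return incidence


# Invented cumulative "ever symptomatic" counts for days 0 through 4.
ever_symptomatic = [2, 2, 5, 9, 9]
new_symptomatic = daily_incidence(ever_symptomatic)
print(new_symptomatic)  # [2, 0, 3, 4, 0]
```

If the Log file is written less often than daily, the same differences give the number of new cases per output interval rather than per day.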


  6. Sample configuration files

config-minimal - A basic configuration file. Simulates an epidemic with R_0=1.3 in a single community of 2000 people (using data from "one-*") with 10 initial infecteds. Uses a random number seed of "1".

config-seattle26 - Simulates an epidemic with R_0=2.6 in metropolitan Seattle (using data from "seattle-*") with 2 initial infecteds. Uses a random number seed of "1".

config-seattle26-closeallschools - Simulates an epidemic with R_0=2.6 in metropolitan Seattle (using data from "seattle-*") with 5 initial infecteds. All schools close on day 60 for 14 days before reopening. Uses a random number seed of "1".

config-seattle26-isolation - Simulates an epidemic with R_0=2.6 in metropolitan Seattle (using data from "seattle-*") with 5 initial infecteds. Voluntary home isolation starts on day 60 with a compliance of 60%.

config-seattle26-liberalleave - Simulates an epidemic with R_0=2.6 in metropolitan Seattle (using data from "seattle-*") with 5 initial infecteds. Workplace liberal leave starts on day 60 with a compliance of 60%.

config-seattle26-quarantine - Simulates an epidemic with R_0=2.6 in metropolitan Seattle (using data from "seattle-*") with 5 initial infecteds. Home quarantine starts on day 60 with a compliance of 60%.

config-seattle26-workfromhome - Simulates an epidemic with R_0=2.6 in metropolitan Seattle (using data from "seattle-*") with 5 initial infecteds. 60% of people start working from home instead of going to work starting on day 60.

config-laiv-vs-tiv - Vaccination of 50% of children enrolled in school with two different vaccines in Los Angeles County. Vaccine "0" represents live, attenuated vaccine, which has VE_S=40%, VE_I=50%, and VE_P=83% but is not administered to pre-school age children and people with asthma (see Basta et al 2008, American Journal of Epidemiology). Vaccine "1" is a trivalent inactivated vaccine, which has VE_S=40%, VE_I=40%, and VE_P=67%, and is given to the remainder of the school children. The program distributes the vaccines in order, so 50% of school-age children are selected to be vaccinated, the non-asthmatic ones are pre-vaccinated with Vaccine 0, and then the remaining selected children are pre-vaccinated with Vaccine 1. If Vaccine 0 had no restrictions on use, then all selected children would receive it, and there would be no one left to receive Vaccine 1. The epidemic with R_0=1.8 is seeded with 9 new infected people each day.

config-twodose - Reactive mass vaccination of 70% of people in Seattle. Vaccine requires two doses. The first confers 50% of the final efficacy after a two-week exponential buildup. The second, which must be given at least 21 days later, confers the final efficacy after a 7-day buildup.


  7. Scripts

There are several scripts and configuration files in the "scripts" directory. The Makefile in the "code" directory can be used to copy these scripts to a new directory, make links to the appropriate executables and data files in this directory, and run the scripts. The user will need to use the supplied "Rmd" to generate figures and documents. Note that these scripts can take hours to run, since they run all the simulations to produce results. The Makefile targets that run scripts are: