This repository contains data and code used in the research for "Estimating infectiousness throughout SARS-CoV-2 infection course" published in Science on May 25, 2021.
The following files are available in the data directory.
A zip-compressed TAB-separated values file containing 879,624 RT-PCR results. The same data is also available in bzip2-compressed format, in viral-load-with-negatives.tsv.bz2.
Each row has 13 fields, as follows:
log10Load: The log10 viral load. A value of -1 represents a negative test.
Ct: The RT-PCR cycle threshold.
PCR: Either "LC480" or "T2", according to the RT-PCR system used.
Date: The "YYYY-MM-DD" date the sample arrived at the diagnostic facility.
Age: The subject's age on the day the RT-PCR was done, rounded to one decimal place (to maintain anonymity).
TestCentre: A secure one-way hash value for the test centre.
TestCentreCategory: The test centre category where the sample was obtained (see below).
Gender: Either "F" (female), "M" (male), or "U" (unknown).
Onset: Either null or a "YYYY-MM-DD" date of symptom onset.
personHash: A secure one-way hash value for the individual.
PAMS1: True or False, according to whether the first-positive RT-PCR of the person was done in a walk-in center.
Hospitalized: True or false, according to whether the subject was ever hospitalised when a positive RT-PCR was obtained.
B117: True or false, indicating whether an infection was from lineage B.1.1.7.
Note that a person may have a series of leading negative tests. These are included in the file for the purposes of making Table S1 in the paper, which gives the detection rates for the various test centre categories.
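The field layout above can be read with the Python standard library. The following is a minimal sketch, not the code used for the paper: the function names and the `has_header` flag are assumptions made for illustration, and whether the file starts with a header row should be checked against the actual data.

```python
import bz2
import csv

# Field names in the order the README lists them.
FIELDS = [
    "log10Load", "Ct", "PCR", "Date", "Age", "TestCentre",
    "TestCentreCategory", "Gender", "Onset", "personHash",
    "PAMS1", "Hospitalized", "B117",
]

NEGATIVE_LOAD = -1.0  # log10Load value the README gives for negative tests


def read_results(path, has_header=True):
    """Yield one dict per RT-PCR result from the bzip2-compressed TSV.

    has_header is an assumption: pass False if the file turns out to
    have no header row.
    """
    with bz2.open(path, mode="rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        if has_header:
            next(reader, None)  # skip the header row
        for row in reader:
            yield dict(zip(FIELDS, row))


def is_negative(result):
    """True when the row encodes a negative test (log10Load of -1)."""
    return float(result["log10Load"]) == NEGATIVE_LOAD


def count_by_outcome(results):
    """Return (positives, negatives) for an iterable of result dicts."""
    positives = negatives = 0
    for result in results:
        if is_negative(result):
            negatives += 1
        else:
            positives += 1
    return positives, negatives
```

Calling `count_by_outcome(read_results("data/viral-load-with-negatives.tsv.bz2"))` would then give the number of positive and negative results, assuming the file sits in the data directory under that name. All values are returned as strings, so numeric fields such as Age need converting before use.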
Contains a JSON object with the data from 4344 subjects who had RT-PCR tests on at least three different days (with at least two tests being positive). The main data is "people", a list of 4344 objects, each containing the following attributes:
personHash: A secure one-way hash value for the individual.
gender: Either "F" (female), "M" (male), or "U" (unknown).
PAMS1: True or False, according to whether the first-positive RT-PCR of the person was done in a walk-in center.
hospitalized: True or false, according to whether the subject was ever hospitalised when a positive RT-PCR was obtained.
onset: Either null or a "YYYY-MM-DD" date of symptom onset.
B117: True or false, indicating whether an infection was from lineage B.1.1.7.
The following attributes are all lists, containing a value for each RT-PCR result for the person:
viralLoad: A floating point log10 viral load (copies / swab). A value of 0 indicates a negative test.
testName: Either "LC480" or "T2", according to the RT-PCR system used.
date: The "YYYY-MM-DD" date the sample arrived at the diagnostic facility.
testCentre: A secure one-way hash value for the test centre.
testCentreCategory: The test centre category where the sample was obtained (see below).
age: The age of the subject on the day the RT-PCR was done. These are rounded to one decimal place to help ensure anonymity. In the paper we used the full-precision ages.
As in Table S1 in the paper, test centre category abbreviations are as follows:
?: Unknown
AIR: Airport
C19: COVID-19 testing centre
CP: Company physician
ED: Emergency department
FM: Forensic medicine
H: Hospital
ICU: Intensive care unit
IDW: Infectious diseases ward
L: Labor
LW: Labour ward
OD: Outpatient department
PHD: Public health department
PRI: Prison
RES: Aged residence
SM: Sports medicine
WD: Ward
A CSV file with data from the cell culturing trials. The three columns should be self-explanatory.
A CSV file with culture isolation data from the Ranawaka et al. paper "SARS-CoV-2 Virus Culture and Subgenomic RNA for Respiratory Specimens from Patients with Mild Coronavirus Disease" and the van Kampen et al. paper "Duration and key determinants of infectious virus shedding in hospitalized patients with coronavirus disease-2019 (COVID-19)". Columns should be self-explanatory.
A CSV file with culture isolation data from the Wölfel et al. paper "Virological assessment of hospitalized patients with COVID-2019". Columns should be self-explanatory.
The following files are available.
Contains R Markdown with a description (and the R code) for the statistical analysis.
The R Markdown has been processed into HTML (in ExtendedMethods.html).
Contains various R utility functions.
A Stan model to estimate parameters for a mixture of two normal distributions. This is used to generate "post-processed" posterior predictions for the first-positive RT-PCR test model.
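For readers unfamiliar with the term, the density of a mixture of two normal distributions can be sketched as follows. This is the generic textbook form, not a transcription of the Stan file: the parameter names (`theta` for the mixing weight, `mu1`, `sigma1`, `mu2`, `sigma2` for the components) are assumptions, and the model's actual parameterisation and priors may differ.

```python
import math


def normal_logpdf(y, mu, sigma):
    """Log density of a Normal(mu, sigma) distribution at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def mixture_logpdf(y, theta, mu1, sigma1, mu2, sigma2):
    """Log density of theta * N(mu1, sigma1) + (1 - theta) * N(mu2, sigma2).

    theta must lie strictly between 0 and 1. The two weighted component
    log densities are combined with the log-sum-exp trick so that very
    small densities do not underflow.
    """
    a = math.log(theta) + normal_logpdf(y, mu1, sigma1)
    b = math.log1p(-theta) + normal_logpdf(y, mu2, sigma2)
    top = max(a, b)
    return top + math.log(math.exp(a - top) + math.exp(b - top))
```

Summing `mixture_logpdf` over all observations gives the log likelihood that a sampler would explore over the five parameters.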
A Stan model to estimate viral load time courses.