VirologyCharite / SARS-CoV-2-VL-paper

Data and files from the May 25, 2021 SARS-CoV-2 viral load and infectiousness paper

Estimating infectiousness throughout SARS-CoV-2 infection course

This repository contains data and code used in the research for "Estimating infectiousness throughout SARS-CoV-2 infection course" published in Science on May 25, 2021.

Data

The following files are available in the data directory.

viral-load-with-negatives.tsv.zip

A zipped TAB-separated values file containing 879,624 RT-PCR results. The same data is also available compressed with bzip2, in viral-load-with-negatives.tsv.bz2.

Each row has 13 fields, as follows:

  1. log10Load: log10 viral load. -1 represents negative tests.
  2. Ct: The RT-PCR cycle threshold
  3. PCR: Either "LC480" or "T2", according to the RT-PCR system used.
  4. Date: The "YYYY-MM-DD" date the sample arrived at the diagnostic facility.
  5. Age: The subject's age on the day the RT-PCR was done, rounded to one decimal place (to maintain anonymity)
  6. TestCentre: A secure one-way hash value for the test centre.
  7. TestCentreCategory: The test centre category where the sample was obtained (see "Test centre categories" below).
  8. Gender: Either "F" (female), "M" (male), or "U" (unknown)
  9. Onset: Either null or a "YYYY-MM-DD" date of symptom onset.
  10. personHash: A secure one-way hash value for the individual.
  11. PAMS1: True or false, according to whether the first-positive RT-PCR of the person was done in a walk-in centre.
  12. Hospitalized: True or false, according to whether the subject was ever hospitalised when a positive RT-PCR was obtained.
  13. B117: True or false, indicating whether an infection was from lineage B.1.1.7
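
The field list above can be read with the Python standard library alone. The following is a minimal sketch, not code from this repository. It assumes the archive holds a single UTF-8 TSV member whose first row is a header with the field names listed above; the function name `read_viral_load_rows` is hypothetical.

```python
import csv
import io
import zipfile


def read_viral_load_rows(zip_file, member=None):
    """Yield one dict per RT-PCR result from a zipped TSV file.

    Assumption: the TSV starts with a header row naming the 13 fields.
    """
    with zipfile.ZipFile(zip_file) as archive:
        # Assumption: the archive contains a single TSV member.
        name = member or archive.namelist()[0]
        with archive.open(name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.DictReader(text, delimiter="\t"):
                load = float(row["log10Load"])
                # -1 marks a negative test, not a measured viral load.
                row["log10Load"] = None if load == -1 else load
                yield row
```

Called as `read_viral_load_rows("data/viral-load-with-negatives.tsv.zip")`, this streams the rows one at a time, so the full file never has to be held in memory. All fields other than log10Load are left as strings.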

Note that a person may have a series of leading negative tests. These are included in the file for the purposes of making Table S1 in the paper, which gives the detection rates for the various test centre categories.
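
As an illustration of how the negative tests feed into a detection rate, here is a hedged sketch that counts positive results per test centre category. It assumes the rate is computed per test (the paper's Table S1 may define it differently, for example per person), and that rows are dicts keyed by field name, such as those produced by `csv.DictReader`. The function name `detection_rates` is hypothetical.

```python
from collections import Counter


def detection_rates(rows):
    """Return {category: fraction of positive tests} for an iterable of row dicts."""
    total = Counter()
    positive = Counter()
    for row in rows:
        category = row["TestCentreCategory"]
        total[category] += 1
        load = row["log10Load"]
        # A log10Load of -1 (or a missing value) is treated as a negative test.
        if load is not None and float(load) != -1:
            positive[category] += 1
    return {category: positive[category] / total[category] for category in total}
```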

min-3-timeseries.json

Contains a JSON object with the data from 4344 subjects who had RT-PCR tests on at least three different days (with at least two tests being positive). The main data is "people", a list of 4344 objects, each containing the following attributes:

The following attributes are all lists, containing a value for each RT-PCR result for the person:
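
The attribute names can also be discovered from the file itself. The sketch below relies only on the top-level "people" key described above and makes no assumption about the attribute names; the function names are hypothetical.

```python
import json


def load_people(handle):
    """Return the "people" list from an open min-3-timeseries.json file."""
    data = json.load(handle)
    people = data["people"]
    if not isinstance(people, list):
        raise ValueError('"people" is expected to be a list')
    return people


def attribute_names(people):
    """Collect the attribute names that occur across all person objects."""
    names = set()
    for person in people:
        names.update(person)
    return sorted(names)
```

For example, `with open("data/min-3-timeseries.json", encoding="utf-8") as handle: people = load_people(handle)` should give a list whose length matches the 4344 subjects stated above.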

Test centre categories

As in Table S1 in the paper, test centre category abbreviations are as follows:

?: Unknown
AIR: Airport
C19: COVID-19 testing centre
CP: Company physician
ED: Emergency department
FM: Forensic medicine
H: Hospital
ICU: Intensive care unit
IDW: Infectious diseases ward
L: Laboratory
LW: Labour ward
OD: Outpatient department
PHD: Public health department
PRI: Prison
RES: Aged residence
SM: Sports medicine
WD: Ward

Culture_probability_data_B.1.1.7.csv

A CSV file with data from the cell culturing trials. The three columns should be self-explanatory.

Culture_probability_data_wild_type.csv

A CSV file with culture isolation data from the Ranawaka et al. paper "SARS-CoV-2 Virus Culture and Subgenomic RNA for Respiratory Specimens from Patients with Mild Coronavirus Disease" and the van Kampen et al. paper "Duration and key determinants of infectious virus shedding in hospitalized patients with coronavirus disease-2019 (COVID-19)". Columns should be self-explanatory.

Culture_probability_data_wild_type_woelfel.csv

A CSV file with culture isolation data from the Wölfel et al. paper "Virological assessment of hospitalized patients with COVID-2019". Columns should be self-explanatory.
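
Since the column names of the culture files are not listed here, a quick way to see them is to read the header row. This is a small sketch that assumes each CSV file starts with a header row; the function name `describe_csv` is hypothetical.

```python
import csv


def describe_csv(lines):
    """Return (column names, number of data rows) for CSV text.

    `lines` may be an open text file or any iterable of CSV lines.
    """
    reader = csv.DictReader(lines)
    rows = list(reader)
    return reader.fieldnames, len(rows)
```

For example, `with open("data/Culture_probability_data_wild_type.csv", newline="") as handle: print(describe_csv(handle))`.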

Code

The following files are available.

ExtendedMethods.Rmd

Contains R Markdown with a description (and the R code) for the statistical analysis.

The R Markdown has been processed into HTML (in ExtendedMethods.html), which can be opened in a browser.

utils.R

Contains various R utility functions.

FPT/mix_s_all.stan

A Stan model to estimate parameters for a mixture of two normal distributions. This is used to generate "post-processed" posterior predictions for the first-positive RT-PCR test model.
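
For readers unfamiliar with the model class, the log density of a mixture of two normal distributions can be sketched as follows. This is a generic illustration, not a transcription of mix_s_all.stan: the parameter names (`theta` for the weight of the first component, `mu1`, `sigma1`, `mu2`, `sigma2`) are assumptions, and the Stan file may parameterise the mixture differently.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z


def mixture_logpdf(x, theta, mu1, sigma1, mu2, sigma2):
    """Log density of theta * N(mu1, sigma1) + (1 - theta) * N(mu2, sigma2).

    Requires 0 < theta < 1. The two terms are combined with the
    log-sum-exp trick to stay numerically stable in the tails.
    """
    a = math.log(theta) + normal_logpdf(x, mu1, sigma1)
    b = math.log1p(-theta) + normal_logpdf(x, mu2, sigma2)
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))
```

Working on the log scale is the usual choice for this kind of model, because the individual component densities can underflow to zero far from their means.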

CP/stan/TC_APGHB117_simple_shiftw25o.stan

A Stan model to estimate viral load time courses.