ecomore2 / pacs

Cleaning and reshaping PACS data from Institut Pasteur du Laos
https://ecomore2.github.io/pacs

PACS data

PACS (Pathogen Asset Control System) is the data system used at Institut Pasteur du Laos. In the context of the Ecomore2 project, PACS is the source of epidemiological data. It contains one line per sample (here, in most cases, a sample corresponds to a case) with age, gender, time, space, confirmation test and serotype information.

Because the PACS system was adopted after surveillance had already started, and because not all of the data collected prior to PACS adoption have been entered into PACS yet, the data are spread over several files in the raw_data/IPL PACS folder of the DropBox Ecomore2 folder.

In the cleaning pipeline, these files are loaded into the prepacs (pre-PACS.xlsx) and postpacs (PACS.xls and pacs ID 7464-ID8292_2018-10-02.xlsx) data frames and bound together into the pacs data frame. The data are then patched with corrections.
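The binding step can be sketched as follows. This is a minimal illustration in base R, not the pipeline's actual code: the helper `bind_fill()`, the toy data frames and their column names are all hypothetical, and it assumes that the pre-PACS and post-PACS files do not necessarily share the same columns. In practice the Excel files would first have to be read, for example with `readxl::read_excel()`, and `dplyr::bind_rows()` offers the same fill-with-NA behaviour.

```r
# Hypothetical helper: row-bind data frames whose columns may differ,
# filling the columns that are absent from a data frame with NA.
bind_fill <- function(...) {
  dfs <- list(...)
  all_cols <- unique(unlist(lapply(dfs, names)))
  aligned <- lapply(dfs, function(df) {
    # add each missing column as NA, then put columns in a common order
    for (col in setdiff(all_cols, names(df))) df[[col]] <- NA
    df[all_cols]
  })
  do.call(rbind, aligned)
}

# Toy stand-ins for the prepacs and postpacs data frames (made-up values)
prepacs  <- data.frame(id = 1:2, age = c(34, 5), stringsAsFactors = FALSE)
postpacs <- data.frame(id = 3:4, age = c(61, 12),
                       serotype = c("DENV-1", "DENV-2"),
                       stringsAsFactors = FALSE)

pacs <- bind_fill(prepacs, postpacs)
```

Rows coming from `prepacs` get `NA` in the `serotype` column, since that column only exists in the toy `postpacs`.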

The cleaned and reshaped data are saved to the data/pacs.csv CSV file, which can be copied and pasted into a text file on your computer or downloaded directly from R into a data frame:

# Install readr if it is not already available
if (!requireNamespace("readr", quietly = TRUE)) install.packages("readr")
# Column types: i = integer, c = character, f = factor, n = number, D = date
pacs <- readr::read_csv("https://raw.githubusercontent.com/ecomore2/pacs/master/data/pacs.csv",
                        col_types = paste(c("icfnD", rep("c", 5), rep("D", 4), rep("f", 3)), collapse = ""))
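To see what the `col_types` argument in the call above amounts to, the string can be built on its own. This small sketch needs neither readr nor a network connection; the variable name `spec` is introduced here for illustration only.

```r
# Build the compact column specification used in the read_csv() call
spec <- paste(c("icfnD", rep("c", 5), rep("D", 4), rep("f", 3)), collapse = "")

# One character per column of the CSV file
n_columns <- nchar(spec)
```

Here `spec` is `"icfnDcccccDDDDfff"`, that is 17 columns: an integer, a character, a factor, a number and a date, followed by five character columns, four date columns and three factor columns.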

The variable names are self-explanatory. A case is considered confirmed if at least one of the pcr and ns1 variables is positive.
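The confirmation rule can be written as a one-liner. The sketch below uses a toy data frame: the column names `pcr` and `ns1` come from the text above, but the coding of their values as `"positive"` / `"negative"` is an assumption, as is the name of the derived `confirmed` column. Check the actual levels in the downloaded data before reusing it.

```r
# Toy data; the value coding ("positive"/"negative"/NA) is assumed
toy <- data.frame(
  pcr = c("positive", "negative", NA, "negative"),
  ns1 = c("negative", "negative", "positive", NA),
  stringsAsFactors = FALSE
)

# %in% returns FALSE for NA, so a missing test never counts as positive
toy$confirmed <- toy$pcr %in% "positive" | toy$ns1 %in% "positive"
```

Using `%in%` rather than `==` avoids `NA` in the result when one of the two tests is missing; it also works unchanged when the columns are factors, as they would be if they are among the columns read with the `f` type.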

This summary provides a real-time overview of the current state of the PACS data set, highlighting problems that remain to be fixed.