ecohealthalliance / eidith-cleaning-reports

0 stars 2 forks source link

CircleCI Build Status

PREDICT data cleaning reports

This repository contains R code to generate reports to support cleaning EIDITH data.

Currently, reports for sites, site modules, animals, specimens, humans and human modules are implemented.

To use, clone this repository and set it as your working directory. (You may open the eidith-cleaning-reports.Rproj file if you use RStudio.)

Run devtools::install_deps(upgrade = "always") in R to get all packages required to run this code and ensure that they are up to date. These packages are listed in the DESCRIPTION file. You should in general keep up to date with the latest version of the eidith package, which tracks changes made upstream in the EIDITH database.

Run the 00-get-eidith-data.R script to download EIDITH data into the (currently empty) raw-eidith-data/ folder. Data for all countries that you have access to will downloaded. This requires that you set EIDITH_USERNAME and EIDITH_PASSWORD environment variables. See ?eidith::ed_auth in the eidith R package for details. You can limit the countries' data you download data for by setting the "country" argument in that script.

Modify the make.R script to specify for which countries to generate reports. Then run script to generate the reports.

For each country, there will be an HTML report that summarizes unique values, empty values, and ranges for variables. For each country there will also be a Microsoft Excel workbook with cells flagged being unique, empty, or otherwise requiring inspection to see if they are correct. More details on these outputs are in the first section of the HTML reports themselves, or you can find them in the report-template.Rmd file.

Please address questions related to data cleaning to technology@eidith.org. For questions specific to this code, please file an issue in the GitHub repository.

*.encrypted, dropbox_upload.R, .gitlab-ci.yml, and files under the .circleci/ directory are specific to EcoHealth Alliance's automated pipeline infrastructure and require EHA encryption keys to use.