CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
https://systems.jhu.edu/research/public-health/ncov/

Automated COVID-19 "Daily Reports" Data Quality Reports for US Data #2585

Open troymartinhughes opened 4 years ago

troymartinhughes commented 4 years ago

Longtime listener, first-time caller... I've been processing the raw Daily Reports (https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports) for a month now and wanted to share a couple of data quality reports (i.e., exception reports) that I generate automatically in Python. They identify several of the missing/duplicated data issues that I've seen posted here. Both reports are attached and linked in this post.

The first report (https://www.linkedin.com/posts/troy-hughes-27a998a8_covid-19-jhu-daily-reports-data-quality-activity-6671173160479547392-qUWD) evaluates the structure and content of the CSV files, including:

The second report (https://www.linkedin.com/posts/troy-hughes-27a998a8_jhu-daily-reports-covid-19-longitudinal-activity-6672149643671035904-fKAM) evaluates both state-level and county-level data (including cases and deaths) longitudinally (i.e., a between-rows comparison), including:

I'd appreciate any and all feedback. Since these reports are 100% automated, please let me know if anyone would like an updated version.

JHU_COVID-19_Daily_Reports_US_Data_Quality_Report_20200528.pdf

JHU_COVID-19_Daily_Reports_US_Longitudinal_Data_Quality_Report_20200528.pdf
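For anyone who wants to try a between-rows check of this kind, here is a minimal sketch. It is not the code behind the attached reports. It only illustrates one possible longitudinal test: flagging a location whose cumulative count drops from one daily report to the next. The function names `read_report` and `find_decreases` are hypothetical, the column names (`Province_State`, `Admin2`, `Confirmed`, `Deaths`) are assumed to match the daily report CSVs, and the sample rows are made up.

```python
import csv
import io


def read_report(text):
    """Parse one daily report (CSV text) into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def find_decreases(reports, key_cols=("Province_State", "Admin2"),
                   count_cols=("Confirmed", "Deaths")):
    """Flag locations whose cumulative counts drop between consecutive reports.

    `reports` maps a report date to that day's rows. Dates are assumed to be
    ISO strings (YYYY-MM-DD) so that sorting them gives chronological order.
    Returns tuples of (date, location key, column, previous value, new value).
    """
    previous = {}  # location key -> {column: last seen count}
    issues = []
    for date in sorted(reports):
        for row in reports[date]:
            key = tuple(row[c] for c in key_cols)
            # Assumption: a blank cell is treated as 0 for this sketch.
            counts = {c: int(float(row[c] or 0)) for c in count_cols}
            last = previous.get(key)
            if last is not None:
                for col in count_cols:
                    if counts[col] < last[col]:
                        issues.append((date, key, col, last[col], counts[col]))
            previous[key] = counts
    return issues


# Made-up sample data: CountyA's confirmed count drops from 10 to 8.
day1 = ("Province_State,Admin2,Confirmed,Deaths\n"
        "StateX,CountyA,10,1\n"
        "StateX,CountyB,5,0\n")
day2 = ("Province_State,Admin2,Confirmed,Deaths\n"
        "StateX,CountyA,8,1\n"
        "StateX,CountyB,7,0\n")
reports = {"2020-05-27": read_report(day1), "2020-05-28": read_report(day2)}
issues = find_decreases(reports)
for issue in issues:
    print(issue)
```

The real daily report files are named by date in a different format, so the dates would need converting to a sortable form before being used as keys here.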

Kwagala256 commented 4 years ago

True update