Lucas-Czarnecki / COVID-19-CLEANED-JHUCSSE

Cleaned daily reports and time series data from the 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins University for Systems Science and Engineering (JHU CSSE).

NOTICE: Updates and (non-structural) Changes are Coming #2

Closed: Lucas-Czarnecki closed this issue 4 years ago

Lucas-Czarnecki commented 4 years ago

Edit: These changes are now in effect.

Starting next week (~April 20th), I will be introducing some changes to the daily report CSVs and the cleaned data (i.e., CSSE_DailyReports). Most of these changes address frequently reported issues with CSSEGISandData's COVID-19 data. The changes WILL NOT affect variable names and SHOULD NOT break anyone's code. The guiding philosophy is to fix obvious issues while changing the data structure as little as possible. The incoming changes are documented below as a heads-up.

Daily Reports (CSVs):

Cleaned Data:

Lucas-Czarnecki commented 4 years ago

I have addressed inconsistencies in JHU's older daily reports, which put both states and counties in Province_State (e.g., "Province_State: Los Angeles, CA"). The cleaned data splits these values into Admin2 and Province_State (e.g., "Admin2: Los Angeles" and "Province_State: California"). As a result, the older daily reports are now consistent with JHU's most recent uploads :)
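For anyone doing a similar split on their own copy of the raw reports, here is a minimal sketch of the idea. It is not the code used in this repository: `split_province_state` and `STATE_ABBREV` are hypothetical names, and the abbreviation table is only an illustrative subset. Values without a recognized ", ST" suffix are passed through untouched.

```python
# Hypothetical sketch: split a combined "County, ST" value from an older
# JHU daily report into separate Admin2 and Province_State fields.
# STATE_ABBREV is an illustrative subset, not the full mapping used
# in the cleaned repository.
STATE_ABBREV = {
    "CA": "California",
    "WA": "Washington",
    "MA": "Massachusetts",
    "IL": "Illinois",
}


def split_province_state(value):
    """Return (admin2, province_state) for a raw Province_State value.

    Values such as "Los Angeles, CA" are split into county and state;
    values without a recognized ", ST" suffix are returned unchanged
    with an empty Admin2.
    """
    if "," in value:
        # Split on the last comma so county names keep any internal text.
        county, _, abbrev = value.rpartition(",")
        state = STATE_ABBREV.get(abbrev.strip())
        if state is not None:
            return county.strip(), state
    return "", value


print(split_province_state("Los Angeles, CA"))  # ('Los Angeles', 'California')
print(split_province_state("California"))       # ('', 'California')
```

Note that a municipality such as "Boston, MA" would also be split this way, giving Admin2 "Boston", which is exactly the city-versus-county issue described below.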

However, JHU used to report on various municipalities before committing to reporting by FIPS code. As a result, some of the older daily reports still refer to municipalities (e.g., Boston, Seattle, Chicago) instead of their counties, so those rows have no FIPS code or other values such as Latitude and Longitude. While some of these cases seem to have an easy fix, I will not make such changes until I am certain that they will not cause any unintended consequences. Keeping the data in its present form may also help find and address serious gaps in JHU's reporting (e.g., see "Suffolk" versus "Suffolk County").
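One way to locate these municipality-level rows is to look for US entries with a blank FIPS field. The sketch below assumes the cleaned CSV uses JHU's current headers (`Admin2`, `Province_State`, `Country_Region`, `FIPS`, `Confirmed`); the helper name `rows_missing_fips` is hypothetical and the `Confirmed` counts in `SAMPLE` are made-up placeholders, not real data.

```python
import csv
import io

# Illustrative rows in the daily-report layout. Column names follow JHU's
# current headers; the Confirmed counts are placeholders, not real figures.
SAMPLE = """Admin2,Province_State,Country_Region,FIPS,Confirmed
Los Angeles,California,US,6037,10
Boston,Massachusetts,US,,5
Suffolk,Massachusetts,US,25025,3
"""


def rows_missing_fips(text):
    """Return (Admin2, Province_State) for US rows whose FIPS is blank."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (row["Admin2"], row["Province_State"])
        for row in reader
        if row["Country_Region"] == "US" and not row["FIPS"].strip()
    ]


print(rows_missing_fips(SAMPLE))  # [('Boston', 'Massachusetts')]
```

To run it on a real file, read the file's text and pass it in place of `SAMPLE`.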

Other countries present similar problems. With Canada, for example, JHU used to report data on municipalities (e.g., Calgary, AB and Edmonton, AB) before committing to provinces (e.g., Alberta). As with the US data, I am keeping the data in a format that records JHU's original intentions. If you want to aggregate data at the provincial level, you must combine the daily cases from cities such as Calgary and Edmonton.
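A minimal sketch of that provincial roll-up for a single report date follows. It assumes the city rows carry a Province_State value like "Calgary, AB"; `CITY_TO_PROVINCE` and `aggregate_by_province` are hypothetical names, the mapping must be extended for whatever city entries actually appear in the data, and the counts are placeholders.

```python
from collections import defaultdict

# Hypothetical mapping from older city-level Province_State values to the
# province JHU later reported on; extend it for the entries you find.
CITY_TO_PROVINCE = {
    "Calgary, AB": "Alberta",
    "Edmonton, AB": "Alberta",
}


def aggregate_by_province(rows):
    """Sum Confirmed counts per province, folding city rows into their province.

    rows: iterable of (province_state, confirmed) pairs for one report date.
    """
    totals = defaultdict(int)
    for province_state, confirmed in rows:
        # Unmapped values are treated as already being province names.
        province = CITY_TO_PROVINCE.get(province_state, province_state)
        totals[province] += confirmed
    return dict(totals)


rows = [("Calgary, AB", 3), ("Edmonton, AB", 2), ("British Columbia", 7)]
print(aggregate_by_province(rows))  # {'Alberta': 5, 'British Columbia': 7}
```

Keeping the mapping explicit, rather than guessing provinces from the ", AB" suffix, makes every folded row visible and easy to audit.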