Closed Lucas-Czarnecki closed 4 years ago
I have addressed inconsistencies in JHU's older daily reports that contained both states and counties in Province_State (e.g., "Province_State: Los Angeles, CA"). The cleaned data splits these values into Admin2 and Province_State (e.g., "Admin2: Los Angeles" and "Province_State: California"). These changes effectively mean that older daily reports are now consistent with JHU's most recent uploads :)
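To illustrate the split described above, here is a minimal Python sketch. It is not the code used to produce the cleaned data; the function name `split_province_state` and the partial `STATE_NAMES` lookup are hypothetical and exist only for this example.

```python
# Hypothetical, partial lookup; a real one would cover every state and territory.
STATE_NAMES = {"CA": "California", "WA": "Washington", "IL": "Illinois"}

def split_province_state(value):
    """Return (Admin2, Province_State) for an old-style combined value."""
    county, sep, abbrev = value.rpartition(", ")
    state = STATE_NAMES.get(abbrev.strip())
    if sep and state is not None:
        return county.strip(), state
    # Not a recognised "County, ST" pattern: leave the value untouched.
    return None, value

admin2, province_state = split_province_state("Los Angeles, CA")
```

Values that do not match the pattern (e.g., "California" on its own) are returned unchanged, with no Admin2.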
However, JHU used to report on various municipalities before committing to reporting according to FIPS. Therefore, some of the older daily reports still refer to municipalities (e.g., Boston, Seattle, Chicago) instead of their counties and consequently lack a FIPS code and other values such as Latitude and Longitude. While some of these cases seem to have an easy fix, I will not make such changes until I am certain that they will not cause any unintended consequences. Keeping the data in its present form may also help find and address serious gaps in JHU's reporting (e.g., see "Suffolk" versus "Suffolk County").
Note that other countries present similar problems. With Canada, for example, JHU used to report data on municipalities (e.g., Calgary, AB and Edmonton, AB) before committing to provinces (e.g., Alberta). As with US data, I am keeping the data in a format that records JHU's original intentions. Note that if you want to aggregate data at the provincial level for those older reports, you must combine daily cases from cities like Calgary and Edmonton.
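For anyone who needs provincial totals from those older reports, the aggregation could look roughly like the sketch below. The `provincial_totals` helper, the partial `PROVINCE_NAMES` lookup, and the case counts are all made up for illustration; only the column names Province_State and Confirmed come from the data.

```python
from collections import defaultdict

# Hypothetical, partial lookup from abbreviation to province name.
PROVINCE_NAMES = {"AB": "Alberta", "ON": "Ontario"}

def provincial_totals(rows):
    """Sum Confirmed per province for the rows of a single daily report."""
    totals = defaultdict(int)
    for row in rows:
        name = row["Province_State"]
        _, sep, abbrev = name.rpartition(", ")
        # "Calgary, AB" is counted under "Alberta"; plain names pass through.
        province = PROVINCE_NAMES.get(abbrev, name) if sep else name
        totals[province] += row["Confirmed"]
    return dict(totals)

# Illustrative rows only; these are not real case counts.
rows = [
    {"Province_State": "Calgary, AB", "Confirmed": 7},
    {"Province_State": "Edmonton, AB", "Confirmed": 3},
    {"Province_State": "Ontario", "Confirmed": 20},
]
totals = provincial_totals(rows)
```

The rows passed in should all come from the same daily report, since the reports are cumulative snapshots rather than daily increments.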
Edit: These changes are now in effect.
Starting next week (~April 20th) I will be introducing some changes to the daily report CSVs and cleaned data (i.e., CSSE_DailyReports). Most of these changes are intended to address frequently mentioned issues pertaining to CSSEGISandData's COVID-19 data. The changes WILL NOT affect variable names and SHOULD NOT break anyone's code. The guiding philosophy here is to provide an update that addresses obvious issues while ensuring a minimal amount of change to data structure. Incoming changes are documented below as a heads up.
Daily Reports (CSVs):
- Active cases will be recalculated (i.e., Active = Confirmed - Deaths - Recovered) to correct for errors and to replace missing values in older daily reports. A sanity check will also ensure that active cases are never below zero; cases where JHU reports negative active cases will be reported as missing values.
- Names in Country_Region and Province_State will be standardized such that each location has a unique name. For example, "Korea, South" and "Republic of Korea" will become "South Korea" across all CSVs.
- Inconsistencies in Province_State will be addressed, such as values referring to provinces and states alongside cities and counties (e.g., "Los Angeles, CA"). For US data these values will be split into Admin2 (e.g., "Los Angeles") and Province_State (e.g., "California").
- A Combined_Key will be provided that addresses various inconsistencies (e.g., "France" and ",,France").
- Latitude and Longitude will be matched to regions, replacing missing values for older daily reports and ensuring that coordinates are consistent for each region (addressing known issues with countries having conflicting coordinates).
- FIPS codes in JHU's Lookup Table will be fixed (to address known issues pertaining to leading zeros) and then mapped to daily reports.
Cleaned Data:
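Two of the Daily Reports changes listed above, the Active recalculation and the FIPS leading-zero fix, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual cleaning code: the function names are hypothetical, missing Deaths or Recovered values are assumed to count as zero, and FIPS codes are assumed to be five-digit county codes.

```python
def recalculate_active(confirmed, deaths, recovered):
    """Active = Confirmed - Deaths - Recovered; None marks a missing value."""
    if confirmed is None:
        return None
    # Assumption: a missing Deaths or Recovered value is treated as zero.
    active = confirmed - (deaths or 0) - (recovered or 0)
    # Sanity check: negative results are reported as missing.
    return active if active >= 0 else None

def pad_fips(fips):
    """Restore leading zeros, assuming a five-digit county-level FIPS code."""
    return str(int(float(fips))).zfill(5)

active = recalculate_active(100, 5, 20)
fips = pad_fips(6037)
```

Going through `float` first lets `pad_fips` also accept values such as "6037.0", which can appear when a CSV column has been read as floating point.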