CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
https://systems.jhu.edu/research/public-health/ncov/

US total cases data from 3/10 are internally inconsistent. #496

Open SteveForsythe opened 4 years ago

SteveForsythe commented 4 years ago

I noticed that, for Washington State, the detailed location data don't sum to the statewide count. Further, the statewide row is formatted like the detail rows (one column for each date), but it doesn't tally with the local counts by day: it has only one non-zero entry (the last day) rather than a daily sum of the detail locations.

FrankSchiro commented 4 years ago

I'm having the same issue with total cases in NY. I see 186 confirmed on the dashboard, and I know for a fact the dashboard was in the 170s for confirmed last night. (https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6)

But summing up everything with NY in Province.State gives 150, using the data from time_series_19-covid-Confirmed.cs

This summation method works for previous dates, though; only the 10th shows this issue.
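One way to see whether the two levels agree for a given date, instead of summing them together, is to compare each state-level row against the sum of its county rows. The sketch below is only an illustration: the `compare_rollups` helper, the `STATE_NAMES` mapping, and the sample numbers are all made up, and it assumes county rows are named like "Name, XX" with a two-letter state code.

```python
import re
from collections import defaultdict

# Hypothetical mapping from postal code to the state-level row name; extend as needed.
STATE_NAMES = {"NY": "New York", "WA": "Washington"}


def compare_rollups(rows):
    """Return {state: (state_row_count, sum_of_county_rows)} for one date.

    rows is an iterable of (province_state, count) pairs.
    Assumption: county/city rows end in ", XX" where XX is a state code.
    """
    county_sum = defaultdict(int)
    state_row = {}
    for name, count in rows:
        match = re.search(r", ([A-Z]{2})$", name)
        if match and match.group(1) in STATE_NAMES:
            county_sum[STATE_NAMES[match.group(1)]] += count
        elif name in STATE_NAMES.values():
            state_row[name] = count
    return {
        state: (state_row.get(state), county_sum.get(state, 0))
        for state in STATE_NAMES.values()
    }


# Made-up numbers, for illustration only (not taken from the CSSE files).
sample = [
    ("Alpha County, NY", 3),
    ("Beta County, NY", 4),
    ("New York", 10),
    ("Washington", 5),
]
result = compare_rollups(sample)
for state, (state_count, county_count) in result.items():
    print(state, state_count, county_count)
```

A state whose two numbers differ is one where the state-level row and the county rows were not updated together for that date.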

TravSplk commented 4 years ago

I wondered about the sudden jump in cases. This is problematic, considering the roll-ups don't match the county totals for the same state. My guess is that the state-level roll-ups are the most accurate at the moment.

JeffLoucksPersonal commented 4 years ago

I see the other comments here. There is a challenge with inserting duplicate data into a stream that never contained it before. Reporting both at the state level and the city level in the same column is bound to cause problems. It has for me.

mbevand commented 4 years ago

CSSE appears to double-count cases and deaths. For example, in https://github.com/CSSEGISandData/COVID-19/blob/e69d4ce27fd320b1b8aaadab7a74717adb755f45/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv, here is an excerpt showing only 3 columns (Province/State, Country/Region, deaths as of 3/10):

Clark County, WA        US      0
Jefferson County, WA    US      0
Kitsap, WA              US      0
Kittitas County, WA     US      0
Pierce County, WA       US      0
Washington County, OR   US      0
Washington, D.C.        US      0
Grant County, WA        US      1
Snohomish County, WA    US      1
King County, WA         US      21
Washington              US      23

If you add up all the deaths, you get 46 instead of 23, because the county/city rows repeat deaths that are already included in the "Washington" row. So in my scripts I ended up ignoring all US rows where Province/State appears to be a city or county.
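A minimal sketch of that kind of filter, run against the excerpt above. This is not my actual script: the `is_sub_state` helper and its rule (treat any Province/State ending in ", XX" with a two-letter code as a city or county) are assumptions made for illustration, and the three-column CSV is just the excerpt retyped, not the real file layout.

```python
import csv
import io
import re

# The excerpt from this comment, retyped as CSV (deaths as of 3/10).
EXCERPT = """\
Province/State,Country/Region,Deaths
"Clark County, WA",US,0
"Jefferson County, WA",US,0
"Kitsap, WA",US,0
"Kittitas County, WA",US,0
"Pierce County, WA",US,0
"Washington County, OR",US,0
"Washington, D.C.",US,0
"Grant County, WA",US,1
"Snohomish County, WA",US,1
"King County, WA",US,21
Washington,US,23
"""


def is_sub_state(province_state):
    """Heuristic (an assumption): city/county rows end in ', XX'.

    'Washington, D.C.' does not match, so it is kept as a state-level row.
    """
    return re.search(r", [A-Z]{2}$", province_state) is not None


rows = list(csv.DictReader(io.StringIO(EXCERPT)))

# Summing every row counts the Washington deaths twice.
naive_total = sum(int(r["Deaths"]) for r in rows)

# Dropping city/county rows leaves only the state-level figures.
state_level_total = sum(
    int(r["Deaths"]) for r in rows if not is_sub_state(r["Province/State"])
)

print(naive_total)        # 46
print(state_level_total)  # 23
```

The heuristic only holds while city and county rows follow the "Name, XX" pattern; any row named differently would need its own rule.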