ExpDev07 / coronavirus-tracker-api

🦠 A simple and fast (< 200ms) API for tracking the global coronavirus (COVID-19, SARS-CoV-2) outbreak. It's written in python using the 🔥 FastAPI framework. Supports multiple sources!
https://coronavirus-tracker-api.herokuapp.com
GNU General Public License v3.0
1.59k stars, 320 forks

Incorrect US Data #25

Closed. mikebarton23 closed this issue 4 years ago

mikebarton23 commented 4 years ago

This is probably on Johns Hopkins' side, but I'm seeing inconsistencies in the day-to-day data for the U.S. It looks like they've decided to aggregate by state in addition to by county, and some of the original county data was left in. For example, "Washington" had 0 cases on 3/9/2020 but has 267 listed on 3/10/2020, yet data from counties within Washington is still being counted. That leads to double counting in some scenarios: the 3/10/2020 count of confirmed cases in the US comes out to 1,670 using these new numbers, which is off by quite a bit.

Doubt there's anything you can do here but thought I'd bring it to your attention.

Attaching a sheet I made that shows the largest discrepancies.

JHU Data Errors.xlsx
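
For anyone trying to reproduce the double counting described above, here is a minimal sketch of one way to flag it. It assumes the data has been flattened into `(region, date, confirmed)` rows, which is an illustrative shape rather than the upstream CSV layout. The Washington state values come from the report above. The county values, the `STATE_CODES` table and the function names are hypothetical.

```python
# Toy rows in the shape (region, date, confirmed). The Washington state values
# come from the report above; the county values are made up for illustration.
ROWS = [
    ("Washington", "3/9/20", 0),
    ("Washington", "3/10/20", 267),
    ("King County, WA", "3/9/20", 83),   # hypothetical value
    ("King County, WA", "3/10/20", 83),  # hypothetical value (county row left in)
]

# Minimal lookup; a real table would cover every state.
STATE_CODES = {"WA": "Washington"}


def is_sub_state(region):
    """True for 'Place, ST' style rows, False for plain state rows."""
    return ", " in region


def state_of(region):
    """Return the state name for a 'Place, ST' row or a plain state row."""
    if is_sub_state(region):
        code = region.rsplit(", ", 1)[1].strip()
        return STATE_CODES.get(code)
    return region if region in STATE_CODES.values() else None


def double_counted(rows):
    """Return {(state, date): county_total} where both levels report cases."""
    state_totals = {}
    county_totals = {}
    for region, day, confirmed in rows:
        state = state_of(region)
        if state is None:
            continue  # not a U.S. row we know how to map
        bucket = county_totals if is_sub_state(region) else state_totals
        bucket[(state, day)] = bucket.get((state, day), 0) + confirmed
    # An overlap exists only when the state row AND its county rows are nonzero.
    return {
        key: total
        for key, total in county_totals.items()
        if total > 0 and state_totals.get(key, 0) > 0
    }


overlaps = double_counted(ROWS)
```

With these toy rows, only 3/10 is flagged for Washington, because the state-level row is still 0 on 3/9.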

ralyodio commented 4 years ago

https://lionbridge.ai/datasets/coronavirus-datasets-from-every-country/

Maybe we need a new data set since JHU is unreliable now.

ExpDev07 commented 4 years ago

I wouldn’t exactly call them unreliable. Many major news outlets (including local and national ones in my country) are still quoting their data. There are probably issues open right now that address this.

mikebarton23 commented 4 years ago

I checked out the JHU GitHub page and found an announcement about data formats moving forward: Issue #504

Essentially, they're aggregating by state now but still kept some of the city data in there, which led to lots of double counting. I got around this on the historical side by taking the last two letters of the city name (e.g. Columbia, SC) and joining that to a table I created with state codes and names. Then, I aggregated all of the states' data from 3/9 moving backward in time. That helped give some continuity to the time series data. A huge pain, but thought I'd share my solution.
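
A rough sketch of that workaround in Python, for illustration only. It is not the spreadsheet logic actually used, and it assumes rows shaped as `(region, date, confirmed)`. The partial `STATE_CODES` table, the `CUTOVER` constant, the function names and all counts are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Partial, illustrative lookup of state codes to state names.
STATE_CODES = {"SC": "South Carolina", "WA": "Washington"}

# Last day rebuilt from city-level rows (3/9, per the comment above).
CUTOVER = date(2020, 3, 9)


def to_state(region):
    """Map 'City, ST' to a state name using the last two letters, else None."""
    code = region[-2:]
    if region[-4:-2] == ", " and code in STATE_CODES:
        return STATE_CODES[code]
    return None


def backfill_states(rows):
    """Sum city-level rows into per-state totals for dates up to CUTOVER."""
    totals = defaultdict(int)
    for region, day, confirmed in rows:
        state = to_state(region)
        if state is not None and day <= CUTOVER:
            totals[(state, day)] += confirmed
    return dict(totals)


# Made-up counts, purely to show the shape of the result.
sample = [
    ("Columbia, SC", date(2020, 3, 8), 2),
    ("Charleston, SC", date(2020, 3, 8), 1),
    ("Columbia, SC", date(2020, 3, 10), 5),    # after the cutover: skipped
    ("South Carolina", date(2020, 3, 10), 9),  # already state-level: skipped
]
state_series = backfill_states(sample)
```

Rows dated after the cutover are left out on purpose, since the state-level rows are meant to take over from 3/10 onward.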

ExpDev07 commented 4 years ago

Assuming that they’re still working on it, it’ll get fixed soon and show the correct data.

mikebarton23 commented 4 years ago

It didn't necessarily sound like they were planning on fixing it based on what I read. But your API is working as intended so it's nothing on your end.

Really appreciate you building out this API, by the way. It has been a huge help!