covidatlas / li

Next-generation serverless crawler for COVID-19 data
Apache License 2.0

scraped NY Times data flawed #497

Open hannahklauber opened 4 years ago

hannahklauber commented 4 years ago

US county data differ from those in the New York Times source file (https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv).

E.g. Providence County, Rhode Island:
2020-04-29 - 3431 cases in your data
2020-04-29 - 5967 cases in NY Times data

I don't know if other counties are affected as well.
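A rough way to check whether other counties are affected is to diff the two files key by key. The sketch below is only an illustration: the column names (`date`, `county`, `state`, `cases`) are assumed to match both files, the helper names `load_cases` and `find_mismatches` are made up for this example, and the sample rows contain only the two figures quoted above.

```python
import csv
import io


def load_cases(csv_text):
    """Map (date, county, state) -> case count from CSV text.

    Assumes columns named date, county, state and cases (an assumption
    about both files, not something confirmed in this thread).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(r["date"], r["county"], r["state"]): int(r["cases"]) for r in reader}


def find_mismatches(ours, theirs):
    """Return {key: (our_count, their_count)} for keys in both sources that differ."""
    return {
        key: (ours[key], theirs[key])
        for key in ours.keys() & theirs.keys()
        if ours[key] != theirs[key]
    }


# Sample rows built only from the two figures quoted in this issue.
OURS_CSV = "date,county,state,cases\n2020-04-29,Providence,Rhode Island,3431\n"
NYT_CSV = "date,county,state,cases\n2020-04-29,Providence,Rhode Island,5967\n"

mismatches = find_mismatches(load_cases(OURS_CSV), load_cases(NYT_CSV))
for (date, county, state), (ours, nyt) in sorted(mismatches.items()):
    print(f"{date} {county}, {state}: ours={ours} nyt={nyt} diff={nyt - ours}")
```

Run against the full files instead of the sample strings, the same loop would list every county and date where the two sources disagree.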

jzohrab commented 4 years ago

Thanks for the issue!

We scrape multiple sources and cross check them. It’s possible that another source took precedence over the NYT one.
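To make the precedence idea concrete, here is a minimal sketch of how one source can win over another for the same location and date. It is not the project's actual code: the `Report` type, the `priority` field, the rule "highest priority wins", and the source name `other-source` are all assumptions for illustration. The case counts are the two figures from this issue, and which source really produced 3431 is not known from the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    source: str    # hypothetical source name
    priority: int  # hypothetical: a higher number takes precedence
    cases: int


def pick_report(reports):
    """Return the report from the highest-priority source (assumed rule)."""
    return max(reports, key=lambda r: r.priority)


def spread(reports):
    """Difference between the largest and smallest reported count."""
    return max(r.cases for r in reports) - min(r.cases for r in reports)


reports = [
    Report("nyt", priority=0, cases=5967),
    Report("other-source", priority=1, cases=3431),
]
chosen = pick_report(reports)
print(chosen.source, chosen.cases, spread(reports))
```

Logging the spread next to the chosen value is one way a cross-check could flag cases like this one, where the sources differ by thousands.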

Is this still occurring? Cheers! Jz

On Thu, May 7, 2020 at 10:44 a.m., hannahklauber <notifications@github.com> wrote:

US county data differ from those in the New York Times source file (https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv).

E.g. Providence County, Rhode Island:
2020-04-29 - 3431 cases in your data
2020-04-29 - 5967 cases in NY Times data

I don't know if other counties are affected as well.


hannahklauber commented 4 years ago

Thank you for building up this great database!

The issue is still occurring.

Best, Hannah

lazd commented 4 years ago

Is it possible that RI is reporting current cases rather than cumulative ones? Their own website very clearly says 3,913 for Providence... which is less than yesterday, wtf? https://ri-department-of-health-covid-19-data-rihealth.hub.arcgis.com/
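A cumulative count should never go down from one day to the next, so a day-over-day drop is a quick signal that a series is either not cumulative or was revised. The sketch below shows that check. The function name `find_decreases` is made up for this example, and the numbers are invented for illustration, not real RI figures.

```python
def find_decreases(series):
    """Return (index, previous, current) for each day-over-day drop."""
    return [
        (i, prev, cur)
        for i, (prev, cur) in enumerate(zip(series, series[1:]), start=1)
        if cur < prev
    ]


# Made-up daily totals, for illustration only.
daily_totals = [100, 120, 115, 130]
drops = find_decreases(daily_totals)
print(drops)
```

A drop does not prove the data is non-cumulative, since sources also publish corrections, but it narrows down which dates to look at.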

lazd commented 4 years ago

Reached out to RI; they said:

Good morning,

Thank you for reaching out. The data is updated every day and cumulative.

Best,

Isabella
COVID-19 Joint Information Center

With that, it does seem that NYT and JHU are wrong, or are counting data differently somehow... Maybe it has to do with RI reporting at a city level for some places?

jzohrab commented 4 years ago

This is a common NYT problem ... they're counting higher for other locations too. Perhaps we shouldn't use them.