covidatlas / li

Next-generation serverless crawler for COVID-19 data
Apache License 2.0

Timeseries file does not distinguish New York State and New York City #399

Closed: jzohrab closed this issue 4 years ago

jzohrab commented 4 years ago

Original issue https://github.com/covidatlas/coronadatascraper/issues/469, transferred here on Saturday Mar 28, 2020 at 21:28 GMT


Today's (28 March) version of the timeseries file doesn't distinguish data for New York State from data for New York City. Here are the two rows for yesterday (27 March) that match state == 'NY' and is.na(county):

| city | county | state | country | population | lat | long | url | cases | deaths | recovered | active | tested | growthFactor | date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NA | NA | NY | USA | 19453561 | 42.76081 | -75.84097 | https://coronavirus.health.ny.gov/county-county-breakdown-positive-cases | 44635 | NA | NA | NA | NA | 1.197998 | 2020-03-27 |
| NA | NA | NY | USA | 8398748 | 40.70684 | -73.97834 | https://coronavirus.health.ny.gov/county-county-breakdown-positive-cases | 25398 | NA | NA | NA | NA | 1.187211 | 2020-03-27 |

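Judging by the population values (19453561 vs. 8398748), the top row is for the entire state and the second is for NYC. Both rows have city and county set to NA, so nothing but population, coordinates, and counts tells them apart. Perhaps someone was trying to fix #399 and accidentally set city to NA?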
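For anyone who wants to reproduce the check, here is a minimal sketch in Python. It assumes the timeseries data can be read as CSV with the column names shown above and that missing values appear as the literal string NA, as they do in the printed rows; the real file may encode them differently. The sample is the two rows from this issue, trimmed to a few columns, and `find_ambiguous_rows` is a hypothetical helper name, not part of the scraper.

```python
import csv
import io
from collections import defaultdict

# The two rows quoted in this issue, trimmed to the columns needed here.
# Assumption: missing values are the literal string "NA", as printed above.
SAMPLE = """city,county,state,country,population,cases,date
NA,NA,NY,USA,19453561,44635,2020-03-27
NA,NA,NY,USA,8398748,25398,2020-03-27
"""

# Columns that should identify one location on one date.
KEY_COLUMNS = ("city", "county", "state", "country", "date")


def find_ambiguous_rows(rows):
    """Group rows by location key and date; return groups with more than one row."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in KEY_COLUMNS)
        groups[key].append(row)
    return {key: grp for key, grp in groups.items() if len(grp) > 1}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
ambiguous = find_ambiguous_rows(rows)

for key, grp in ambiguous.items():
    populations = sorted(int(r["population"]) for r in grp)
    print(key, "->", populations)
```

Any key that maps to more than one row is a location the file cannot tell apart. Here the single ambiguous key carries two different populations, which is the only hint that one row is the state and the other is the city.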

jzohrab commented 4 years ago

(Transferred comment)

The snapshot file does distinguish NYC and NYS. NYS is NA for both city and county; NYC is NA for county and has city == 'New York City'.
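The snapshot convention can be written down as a small rule. This is only a sketch: `is_na` and `classify_ny_row` are hypothetical names, rows are assumed to be plain dictionaries, and a missing value is assumed to be `None`, an empty string, or the string NA.

```python
def is_na(value):
    """Treat None, an empty string, or the string 'NA' as missing (an assumption)."""
    return value is None or value in ("", "NA")


def classify_ny_row(row):
    """Label a state == 'NY' row using the snapshot convention described above."""
    city, county = row.get("city"), row.get("county")
    if is_na(county) and is_na(city):
        return "New York State"
    if is_na(county) and city == "New York City":
        return "New York City"
    return "other"


# Keys only; no counts are needed to tell the two apart in the snapshot file.
print(classify_ny_row({"city": "NA", "county": "NA", "state": "NY"}))
print(classify_ny_row({"city": "New York City", "county": "NA", "state": "NY"}))
```

Applied to the timeseries rows from the original report, both rows would come back as "New York State", which is the bug in a nutshell.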

jzohrab commented 4 years ago

(Transferred comment)

Hmmm, I don't think this is an issue any longer. Can you verify?

jzohrab commented 4 years ago

(Transferred comment)

I've switched to a different data source. Based on a quick check, it seems like the NYC rows now have city set to 'New York City'. So this issue seems to have been fixed. The NYC rows seem to be missing population, though.