datasets / covid-19

Novel Coronavirus 2019 time series data on cases
https://datahub.io/core/covid-19

Refactor data in us_* datapackages #85

Open repentsinner opened 4 years ago

repentsinner commented 4 years ago

As a user of the covid-19 datapackage, I want to be able to more easily calculate incidence rates of confirmed cases.

Currently, the us_*.csv files include lat/long info (raised in #1) as well as a variety of other identifiers that never appear to change over the time series.

In addition, the us_deaths file contains a population field, which is helpful for calculating incidence rates rather than absolute counts, but the us_confirmed file lacks this field.
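Until us_confirmed carries its own population field, one workaround is to borrow it from us_deaths by matching rows on UID. The sketch below assumes both files use a long layout with `UID`, `Date` and `Case` columns (plus `Population` in the deaths file); those column names, the helper functions and the sample values are made up for illustration, so check them against the actual datapackage schema before relying on this.

```python
import csv
import io
from typing import Dict, List, Optional, Tuple

# Made-up sample data. The long layout and the column names (UID, Date, Case,
# Population) are assumptions for illustration, not the datapackage's schema.
DEATHS_CSV = """UID,Admin2,Province_State,Date,Case,Population
1001,County A,State X,2020-04-01,1,50000
1003,County B,State X,2020-04-01,2,200000
"""

CONFIRMED_CSV = """UID,Admin2,Province_State,Date,Case
1001,County A,State X,2020-04-01,10
1003,County B,State X,2020-04-01,30
"""


def population_by_uid(deaths_csv: str) -> Dict[str, int]:
    """Build a UID -> population lookup from the (hypothetical) deaths file."""
    lookup: Dict[str, int] = {}
    for row in csv.DictReader(io.StringIO(deaths_csv)):
        # Population is assumed constant per UID, so the last value seen wins.
        lookup[row["UID"]] = int(row["Population"])
    return lookup


def incidence_per_100k(
    confirmed_csv: str, population: Dict[str, int]
) -> List[Tuple[str, str, Optional[float]]]:
    """Return (UID, Date, confirmed cases per 100,000 residents) for each row."""
    rates: List[Tuple[str, str, Optional[float]]] = []
    for row in csv.DictReader(io.StringIO(confirmed_csv)):
        pop = population.get(row["UID"])
        # A missing or zero population cannot be normalized, so report None.
        rate = int(row["Case"]) * 100_000 / pop if pop else None
        rates.append((row["UID"], row["Date"], rate))
    return rates


if __name__ == "__main__":
    for uid, date, rate in incidence_per_100k(CONFIRMED_CSV, population_by_uid(DEATHS_CSV)):
        print(uid, date, rate)
```

This works, but it only works because the two files happen to share a key; every consumer has to repeat the same join.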

It would be great if there were a us_counties file, keyed on the same UID or FIPS/INCITS 38:2009 codes, that provided this static data in a more uniform way for further processing.
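A minimal sketch of that split, assuming a long layout in which each dated row repeats the county's static identifiers: collect those columns once per UID into a us_counties-style table, and keep only UID, Date and Case in the time series. The column names, the `STATIC_COLUMNS` list, the helper functions and the sample rows are all hypothetical. The consistency check also tests the assumption that these identifiers never change over time.

```python
import csv
import io
from typing import Dict, List, Tuple

# Made-up rows in an assumed long layout, where each dated observation repeats
# the county's static identifiers. Column names are assumptions.
RAW_CSV = """UID,Admin2,Province_State,Lat,Long_,Date,Case
1001,County A,State X,32.5,-86.6,2020-04-01,10
1001,County A,State X,32.5,-86.6,2020-04-02,12
1003,County B,State X,30.7,-87.7,2020-04-01,30
"""

# Hypothetical list of columns expected to stay fixed for a given UID.
STATIC_COLUMNS = ["UID", "Admin2", "Province_State", "Lat", "Long_"]


def split_static(
    raw_csv: str,
) -> Tuple[Dict[str, Dict[str, str]], List[Tuple[str, str, int]]]:
    """Split rows into a per-UID counties table and a slim (UID, Date, Case) series."""
    counties: Dict[str, Dict[str, str]] = {}
    series: List[Tuple[str, str, int]] = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        static = {col: row[col] for col in STATIC_COLUMNS}
        uid = static["UID"]
        # Keep the first copy; fail loudly if a "static" value ever changes.
        first = counties.setdefault(uid, static)
        if first != static:
            raise ValueError(f"static columns changed over time for UID {uid}")
        series.append((uid, row["Date"], int(row["Case"])))
    return counties, series


def counties_to_csv(counties: Dict[str, Dict[str, str]]) -> str:
    """Serialize the counties table as CSV, one row per UID."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=STATIC_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(counties.values())
    return out.getvalue()
```

With the static data in its own table, attaching population or coordinates to any time series becomes a single lookup on UID rather than a per-file special case.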

Note: this issue appears to stem from re-packaging the CSSE data directly as a datapackage, rather than taking an opinionated view of how that data might usefully be presented and consumed.

Thanks for considering!