Organize, Document and Identify datasets that could play a role in the analysis.

lossyrob commented 4 years ago

There are a number of datasets that could be helpful with this effort. To keep track of them effectively, we should establish a documentation method. This could be a Google Doc or Sheet, a Markdown document, a GitHub issue: whatever makes entries easy to add and gives a quick, high-level understanding of what each dataset is, what it could possibly be useful for, and whether it has already been reviewed or used.

Also, if some other effort already serves this purpose, this issue could be satisfied by telling the project how best to use that source.

The goal of this issue is to establish a data documentation method and communicate it to the project so that we can stay organized around the slew of data out there that may or may not be critical to the analysis.
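
If the catalog ends up as a Markdown document, one possible shape for an entry is sketched below. This is only an illustration, not a decided format: the `DatasetEntry` class, its field names, the `Status` values, and the summary wording are all assumptions chosen to mirror the three things listed above (what the dataset is, what it could be useful for, and whether it has been reviewed or used). The URLs are ones already posted in this issue.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Hypothetical review states for a catalogued dataset."""
    UNREVIEWED = "unreviewed"
    REVIEWED = "reviewed"
    USED = "used"


@dataclass
class DatasetEntry:
    """One catalog row; the field names are illustrative assumptions."""
    name: str
    url: str
    summary: str
    possible_use: str
    status: Status = Status.UNREVIEWED


def to_markdown(entries):
    """Render the entries as a Markdown table, one row per dataset."""
    header = (
        "| Dataset | Summary | Possible use | Status |\n"
        "| --- | --- | --- | --- |"
    )
    rows = [
        f"| [{e.name}]({e.url}) | {e.summary} | {e.possible_use} | {e.status.value} |"
        for e in entries
    ]
    return "\n".join([header, *rows])


# Example entries; summaries and uses are placeholder wording.
catalog = [
    DatasetEntry(
        name="US County dataset",
        url="https://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_050_00_20m.json",
        summary="County-level JSON file",
        possible_use="County-level mapping",
        status=Status.USED,
    ),
    DatasetEntry(
        name="datasets/covid-19",
        url="https://github.com/datasets/covid-19",
        summary="COVID-19 data repository",
        possible_use="Tracking cases",
    ),
]

print(to_markdown(catalog))
```

Keeping the status as a small fixed set of values makes it easy to filter for datasets that still need review.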

Current data links to be catalogued:

2.1: tracking cases/testing:

2.2: epi modeling:

Healthcare facilities, beds, care utilization, provider data from national, state, county data sources:

NY:

CA:

https://healthdata.gov/dataset/licensed-and-certified-healthcare-facility-bed-types-and-counts

NJ:

https://www.nj.gov/cgi-bin/dhss/healthfacilities/hospitaldisplay.pl?id=10402

MA:

From Gitter:

https://github.com/javieraviles/covidAPI
Does https://qventus.com/blog/predicting-the-effects-of-the-covid-pandemic-on-us-health-system-capacity/ have open data?
https://twitter.com/mlipsitch/status/1239209724836446210 - can't use it as too strong of a signal (for reasons described here: https://twitter.com/cmyeaton/status/1238613507559624709) but may help suggest true initial timing of outbreaks in a region (not the same as when 1st positive test result is made)

Other data cataloging effort: https://coronavirustechhandbook.com/data

lossyrob commented 4 years ago

https://github.com/datasets/covid-19

lossyrob commented 4 years ago

Already used in this project:

US County dataset: https://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_050_00_20m.json

Census data by county: https://www2.census.gov/programs-surveys/popest/datasets/2010-2018/counties/asrh/cc-est2018-alldata.csv
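
When documenting county-level datasets like these, it may help to record the key they can be joined on. A minimal sketch, assuming both files identify a county by a state code plus a county code (this issue does not confirm how the project actually joins them), and using a hypothetical helper name `county_fips`:

```python
def county_fips(state_code, county_code):
    """Build a 5-character county FIPS string from state and county codes.

    Accepts ints or strings, so codes that lost their leading zeros when a
    CSV column was parsed as integers are padded back to full width.
    """
    return f"{int(state_code):02d}{int(county_code):03d}"


# Los Angeles County, California: state 06, county 037.
print(county_fips("06", "037"))  # 06037
print(county_fips(6, 37))        # 06037
```

Normalizing to a zero-padded string on both sides avoids silent mismatches such as "6037" versus "06037".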

lossyrob commented 4 years ago

Data that contribute to generating these numbers (taken from the Flu Surge 2.0 Model) are high priority:

covidcaremap / covid19-healthsystemcapacity

Organize, Document and Identify datasets that could play a role in the analysis. #8