cliffckerr closed this 4 years ago
https://github.com/CSSEGISandData/COVID-19 is the best dataset I know of right now.
Agreed. Can we get that into a form ingestible by https://github.com/InstituteforDiseaseModeling/covasim/blob/develop/covasim/parameters.py#L109 ?
Do we have anything other than code that defines the columns we want and what they should be called?
@gwincr11 The format should look like this (xlsx or csv):
day | date | new_positives | new_negatives | new_tests | new_hosp | new_icu | new_death
--- | --- | --- | --- | --- | --- | --- | ---
0 | 2/26/2020 | 0 | 2 | 2 | 0 | 0 | 0
1 | 2/27/2020 | 1 | 0 | 1 | 0 | 0 | 0
2 | 2/28/2020 | 0 | 1 | 1 | 0 | 0 | 0
3 | 2/29/2020 | 1 | 11 | 12 | 1 | 0 | 0
4 | 3/1/2020 | 1 | 5 | 6 | 0 | 0 | 0
5 | 3/2/2020 | 0 | 16 | 16 | 0 | 0 | 0
6 | 3/3/2020 | 0 | 23 | 23 | 0 | 0 | 0
7 | 3/4/2020 | 0 | 16 | 16 | 0 | 0 | 0
8 | 3/5/2020 | 3 | 39 | 42 | 0 | 0 | 0
9 | 3/6/2020 | 6 | 34 | 40 | 4 | 2 | 0
10 | 3/7/2020 | 4 | 55 | 59 | 2 | 0 | 0
11 | 3/8/2020 | 1 | 41 | 42 | 1 | 1 | 0
12 | 3/9/2020 | 5 | 95 | 100 | 1 | 0 | 0
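A minimal sketch of loading and sanity-checking a CSV in this format, using only the Python standard library (an xlsx file would need a separate reader). Two checks are assumptions inferred from the sample rows rather than stated rules: `day` counts days since the first date, and `new_tests` equals `new_positives + new_negatives`. The function name `load_daily_counts` is hypothetical.

```python
import csv
import io
from datetime import datetime

# Column order from the format proposed above.
COLUMNS = ["day", "date", "new_positives", "new_negatives", "new_tests",
           "new_hosp", "new_icu", "new_death"]
COUNT_COLUMNS = COLUMNS[2:]


def load_daily_counts(text):
    """Parse CSV text in the proposed format and run basic consistency checks."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows, start = [], None
    for rec in reader:
        # Dates are M/D/YYYY as in the sample, e.g. 2/26/2020.
        date = datetime.strptime(rec["date"], "%m/%d/%Y").date()
        if start is None:
            start = date
        row = {"day": int(rec["day"]), "date": date}
        row.update({col: int(rec[col]) for col in COUNT_COLUMNS})
        # Assumption: `day` counts days since the first row's date.
        if row["day"] != (date - start).days:
            raise ValueError(f"day {row['day']} does not match {date}")
        # Assumption: tests = positives + negatives, as in every sample row.
        if row["new_tests"] != row["new_positives"] + row["new_negatives"]:
            raise ValueError(f"new_tests mismatch on {date}")
        rows.append(row)
    return rows


# First five rows of the table above.
SAMPLE = """day,date,new_positives,new_negatives,new_tests,new_hosp,new_icu,new_death
0,2/26/2020,0,2,2,0,0,0
1,2/27/2020,1,0,1,0,0,0
2,2/28/2020,0,1,1,0,0,0
3,2/29/2020,1,11,12,1,0,0
4,3/1/2020,1,5,6,0,0,0
"""

rows = load_daily_counts(SAMPLE)
print(len(rows), rows[-1]["day"])  # 5 4
```

Note that 2020 is a leap year, so 2/29 is day 3 and 3/1 is day 4, which matches the table.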
Let's add a region column to this too, so we can select data per country and per state.
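One possible way to make per-country and per-state selection work, sketched under a hypothetical convention the thread has not settled: a single `region` column holding `Country` for national rows and `Country/State` for state-level rows. The helper names and the placeholder regions and counts below are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical convention: `region` is "Country" for national rows and
# "Country/State" for state-level rows.


def group_by_region(text):
    """Group CSV rows into {region: [row, ...]} using the `region` column."""
    groups = defaultdict(list)
    for rec in csv.DictReader(io.StringIO(text)):
        groups[rec["region"]].append(rec)
    return dict(groups)


def states_of(groups, country):
    """Return the state-level regions recorded under one country, sorted."""
    prefix = country + "/"
    return sorted(region for region in groups if region.startswith(prefix))


# Placeholder names and made-up counts, for illustration only.
SAMPLE = """region,day,date,new_positives
CountryA,0,2/26/2020,4
CountryA/State1,0,2/26/2020,1
CountryA/State1,1,2/27/2020,2
CountryA/State2,0,2/26/2020,3
"""

groups = group_by_region(SAMPLE)
print(len(groups["CountryA/State1"]))  # 2
print(states_of(groups, "CountryA"))   # ['CountryA/State1', 'CountryA/State2']
```

Separate `country` and `state` columns would work equally well; a single delimited column just keeps the proposed format to one extra field.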
To me, the best shared resource looks like https://github.com/covidatlas/coronadatascraper, which collects and validates coronavirus data; any new data sources we find could usefully be routed through them.
Their time series data has the following columns (in addition to geolocation and sourcing data):
I am working on a (pandas-based) script to create `new_x` columns (`new_death`, etc.) and a `day` column.
However, this particular data will not provide `new_negatives`, `new_hosp`, or `new_icu` data.
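The core of such a conversion is differencing cumulative totals per location. The script described here is pandas-based, where this step would be a `groupby(...).diff()` on the cumulative columns; the sketch below uses only the standard library so it runs without dependencies. Assumptions (not confirmed against coronadatascraper's actual schema): a location column called `name`, ISO `YYYY-MM-DD` dates, and cumulative columns `cases`, `tested`, and `deaths`. The first row per location is treated as that day's new count, and an empty cell makes the affected values `None` rather than guessing.

```python
import csv
import io
from datetime import date
from itertools import groupby

# Assumed mapping from cumulative source columns to the proposed new_x columns.
CUMULATIVE_TO_NEW = {"cases": "new_positives", "tested": "new_tests", "deaths": "new_death"}


def to_int(value):
    """Parse a count, treating an empty or absent cell as missing (None)."""
    return int(value) if value not in ("", None) else None


def cumulative_to_daily(text, key="name"):
    """Turn cumulative per-location time series into daily new_x counts."""
    records = list(csv.DictReader(io.StringIO(text)))
    # ISO dates sort correctly as strings; groupby needs rows sorted by key.
    records.sort(key=lambda r: (r[key], r["date"]))
    out = []
    for loc, rows in groupby(records, key=lambda r: r[key]):
        prev = {col: 0 for col in CUMULATIVE_TO_NEW}  # first row: new = cumulative
        start = None
        for r in rows:
            d = date.fromisoformat(r["date"])
            if start is None:
                start = d
            row = {key: loc, "day": (d - start).days, "date": d}
            for cum_col, new_col in CUMULATIVE_TO_NEW.items():
                value = to_int(r.get(cum_col))
                if value is None or prev[cum_col] is None:
                    row[new_col] = None
                else:
                    # May be negative if the source revised its totals downward.
                    row[new_col] = value - prev[cum_col]
                prev[cum_col] = value
            out.append(row)
    return out


# Made-up values for two placeholder locations.
SAMPLE = """name,date,cases,tested,deaths
Place A,2020-04-01,10,100,0
Place A,2020-04-02,15,130,1
Place A,2020-04-03,15,,1
Place B,2020-04-02,3,40,0
"""

daily = cumulative_to_daily(SAMPLE)
print([r["new_positives"] for r in daily])  # [10, 5, 0, 3]
```

Since the source has no negative-test, hospitalization, or ICU counts, those columns would simply be absent (or left empty) in the converted output.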
As of 4/4, it has data from 179 countries, 336 state-level divisions, 3078 county-level divisions, and 39 cities.
I will have the 4/4 data and the conversion script available later today.
Let me know if any of this is problematic.
Closed by @willf (#58)
This is an involved project and may even require its own repo, but I'm creating an issue here to get the conversation started. The task is:
We need the best available auto-updated epidemiological data at as fine a geographical resolution as possible.
Specifically, the data we need is as many of the following as possible, in order of importance:
There are various tools that already collate some of this, e.g. https://neherlab.org/covid19/ and https://coronavirus.jhu.edu/map.html. The task is to find the best available data sources and collate everything into a consistent format. The top priority is Africa and other LMICs, but we want coverage as broad as possible.