covid19-dash / covid-dashboard

Help welcomed if you have expertise in public health web technology, data modeling and munging, or visualization.
https://covid19-dash.github.io/
BSD 3-Clause "New" or "Revised" License
134 stars, 41 forks

Process data at a finer spatial granularity #58

Open GaelVaroquaux opened 4 years ago

GaelVaroquaux commented 4 years ago

The data from Johns Hopkins come at the level of regions / provinces. We should ideally build the map at this level. Forecasting at this level would also be interesting, provided that there are enough cases (forecasting from few cases is unreliable).

This enhancement will require some work, but it seems a worthwhile addition to the site.

GaelVaroquaux commented 4 years ago

I think that the challenge is plotting on the map: we need the shape of each region / province. Maybe there is a GeoJSON file for that?

Here is a discussion on plotting US state boundaries on a world map: https://community.plot.ly/t/state-boundaries-on-a-world-map-projection/11698/4

Based on the following documentation, the only predefined geometries are the countries and the US states: https://plot.ly/python/choropleth-maps/#using-builtin-country-and-state-geometries
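Since plotly's built-in US-state geometry (`locationmode="USA-states"`) is keyed by two-letter state codes, full state names from the data would need converting first. A minimal sketch, assuming the data stores full state names; the `STATE_CODES` table and `to_state_codes` helper are hypothetical, and the table only lists a few states for brevity:

```python
# Hypothetical lookup from full state names to the two-letter codes that
# plotly's built-in "USA-states" geometry expects. Only a few states are
# listed here; a real table would need all 50 states plus DC.
STATE_CODES = {
    "California": "CA",
    "New York": "NY",
    "Washington": "WA",
}


def to_state_codes(rows):
    """Convert (state_name, count) pairs to (state_code, count) pairs.

    Rows whose name is not a known state (e.g. cruise ships) are returned
    separately instead of being silently dropped.
    """
    matched, unmatched = [], []
    for name, count in rows:
        code = STATE_CODES.get(name.strip())
        if code is None:
            unmatched.append(name)
        else:
            matched.append((code, count))
    return matched, unmatched


# Toy counts, for illustration only.
matched, unmatched = to_state_codes(
    [("New York", 10), ("Washington", 20), ("Diamond Princess", 5)]
)
print(matched)    # [('NY', 10), ('WA', 20)]
print(unmatched)  # ['Diamond Princess']
```

The codes and counts could then be passed as `locations` and `z` to `plotly.graph_objects.Choropleth(..., locationmode="USA-states")`, with `fig.update_layout(geo_scope="usa")` to restrict the view to the US.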

emmanuelle commented 4 years ago

I'm currently looking at http://www.naturalearthdata.com/downloads/110m-cultural-vectors/ for this. @jorisvandenbossche, do you think it's a good resource, or would you recommend another one? (Sorry for the ping!)

emmanuelle commented 4 years ago

If we want a quick solution, we could use the Lat / Lon information in the dataset to draw a scatter plot with one marker per (lat, lon) pair. No shape files are needed for that. Of course, having the shapes is nicer, but it also means more data to load for the whole page.
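The quick scatter-plot option could look roughly like this. A sketch assuming the column layout of the JHU time-series CSV (`Province/State`, `Country/Region`, `Lat`, `Long`, then one cumulative-count column per date, latest last); `scatter_points` is a hypothetical helper and the sample rows are toy data:

```python
import csv
import io
import math


def scatter_points(csv_text, max_size=40.0):
    """Turn a JHU-style time-series CSV into (lat, lon, count, size) tuples.

    Assumed columns: Province/State, Country/Region, Lat, Long, then one
    cumulative-count column per date, with the latest date last.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    latest = reader.fieldnames[-1]  # most recent date column
    points = []
    for row in reader:
        count = int(float(row[latest] or 0))  # empty cells count as 0
        if count > 0:  # nothing to draw for locations without cases
            points.append((float(row["Lat"]), float(row["Long"]), count))
    if not points:
        return []
    peak = max(count for _, _, count in points)
    # Plotly marker sizes are diameters by default, so scaling the size with
    # sqrt(count) makes the marker *area* proportional to the count.
    return [
        (lat, lon, count, max_size * math.sqrt(count / peak))
        for lat, lon, count in points
    ]


sample = (
    "Province/State,Country/Region,Lat,Long,3/1/20,3/2/20\n"
    "Region A,Country X,10,20,50,100\n"
    ",Country Y,-5,30,10,25\n"
    "Region B,Country Z,0,0,0,0\n"
)
for point in scatter_points(sample):
    print(point)
```

The resulting values map directly onto `plotly.graph_objects.Scattergeo(lat=..., lon=..., marker=dict(size=...))`, so only the coordinates already in the CSV are sent to the page.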

jorisvandenbossche commented 4 years ago

Natural Earth indeed has states/provinces, but the question is still whether they match the regions as provided in the data. Is there an example of the data?
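One way to check whether the names match is to normalize both sides and list the misses. A small sketch, assuming the shape names have already been extracted from the Natural Earth admin-1 attributes (reading the shapefile, e.g. with geopandas, is left out); `normalize` and `match_regions` are hypothetical helpers and the names below are examples only:

```python
import unicodedata


def normalize(name):
    """Lowercase, drop accents and collapse whitespace for loose name matching."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def match_regions(data_regions, shape_names):
    """Map each region name from the data to a shape name, listing misses."""
    index = {normalize(n): n for n in shape_names}
    matched, missing = {}, []
    for region in data_regions:
        hit = index.get(normalize(region))
        if hit is None:
            missing.append(region)
        else:
            matched[region] = hit
    return matched, missing


matched, missing = match_regions(
    ["Quebec", "British Columbia", "Grand Princess"],
    ["Québec", "British Columbia", "Ontario"],
)
print(matched)  # {'Quebec': 'Québec', 'British Columbia': 'British Columbia'}
print(missing)  # ['Grand Princess']
```

Anything left in `missing` would need a manual mapping or a different display (e.g. a scatter marker).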

emmanuelle commented 4 years ago

Thanks a lot for your input @jorisvandenbossche :-). An example dataset is https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv

jorisvandenbossche commented 4 years ago

Thanks for that link. So https://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-admin-1-states-provinces/ has shapes for states and provinces. I can take a look tomorrow to see whether it is relatively straightforward to match those. But, e.g., the COVID data for the US even come per county, not per state (although it should be easy to aggregate those per state).
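Aggregating county rows per state could be sketched as below. This assumes county rows are labelled like `"King County, WA"` in `Province/State` (an assumption about the format at the time); `aggregate_by_state` is a hypothetical helper and the counts are toy values:

```python
import re
from collections import defaultdict

# Assumed label format for county-level rows: "<county>, XX" with XX a
# two-letter state code, e.g. "King County, WA".
COUNTY_ROW = re.compile(r",\s*([A-Z]{2})$")


def aggregate_by_state(rows):
    """Sum counts of county-level rows per two-letter state code.

    rows: iterable of (province_state, count). Rows that don't look like
    "<county>, XX" are returned separately so they can be inspected.
    """
    totals = defaultdict(int)
    other = []
    for province, count in rows:
        match = COUNTY_ROW.search(province.strip())
        if match:
            totals[match.group(1)] += count
        else:
            other.append((province, count))
    return dict(totals), other


totals, other = aggregate_by_state([
    ("King County, WA", 10),
    ("Snohomish County, WA", 5),
    ("Westchester County, NY", 7),
    ("Washington", 3),
])
print(totals)  # {'WA': 15, 'NY': 7}
print(other)   # [('Washington', 3)]
```

If a state also has its own state-level row, the two probably should not simply be added, since they may describe the same cases; returning `other` separately makes such rows easy to check.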

emmanuelle commented 4 years ago

Thanks for taking a look. The Johns Hopkins dataset (which we are using at the moment) only has province / state information for a handful of countries (this might change in the future).

>>> countries = df['Country/Region']
>>> countries.value_counts()[:50]
US                     247
China                   33
Canada                  12
France                   9
Australia                9
United Kingdom           7
Netherlands              4
Denmark                  3
Japan                    1

So for now this correspondence must be checked for the 8 countries with more than one row. I'll try it with the US states (county-level information is great, but state-level should be fine for now), since plotly's choropleth trace already knows the geometry of US states.
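The countries to check are those appearing on more than one row. The same check in plain Python, as a small sketch (`countries_with_regions` is a hypothetical name):

```python
from collections import Counter


def countries_with_regions(country_column):
    """Countries that appear on more than one row, i.e. are split into regions.

    Returned most frequent first, like pandas' value_counts().
    """
    counts = Counter(country_column)
    return [country for country, n in counts.most_common() if n > 1]


print(countries_with_regions(["US", "US", "China", "China", "China", "Japan"]))
# ['China', 'US']
```

With pandas, the equivalent is `vc = df['Country/Region'].value_counts()` followed by `vc[vc > 1]`.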

emmanuelle commented 4 years ago

Related to this: #79. We can assume that county-based data are incomplete or unreliable, so let's not use them and focus on state-level data for the US.

emmanuelle commented 4 years ago

Also see https://github.com/CSSEGISandData/COVID-19/issues/1250

emmanuelle commented 4 years ago

In fact, region information is only useful for Canada, Australia and China. For the other countries, regions correspond to overseas territories.
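Following that conclusion, the rows could be reduced to map units as sketched below. Keeping Canada, Australia and China at region level comes from the comment above; folding every other country's rows (including overseas territories) into one country total is an assumption, as is the hypothetical `to_map_units` helper:

```python
from collections import defaultdict

# Countries whose Province/State rows stay separate map regions.
REGION_COUNTRIES = {"Canada", "Australia", "China"}


def to_map_units(rows, region_countries=REGION_COUNTRIES):
    """Reduce (province_state, country, count) rows to map units.

    Returns {(country, province_or_None): total}. For countries outside
    region_countries, all rows (assumed: overseas territories included)
    are summed into a single (country, None) entry.
    """
    totals = defaultdict(int)
    for province, country, count in rows:
        if country in region_countries and province:
            key = (country, province)
        else:
            key = (country, None)
        totals[key] += count
    return dict(totals)


units = to_map_units([
    ("Quebec", "Canada", 4),
    ("Ontario", "Canada", 6),
    ("", "France", 10),
    ("French Guiana", "France", 2),
    ("", "Italy", 9),
])
print(units)
```

Passing a different `region_countries` set (e.g. adding `"US"` for the state-level US data) keeps those countries at region level too; dropping the territories instead of folding them in would be an equally valid choice.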