covid19-dash / covid-dashboard

Help welcomed if you have expertise in public health, web technology, data modeling and munging, or visualization.
https://covid19-dash.github.io/
BSD 3-Clause "New" or "Revised" License

Grab data directly from Johns Hopkins #11

Closed GaelVaroquaux closed 4 years ago

GaelVaroquaux commented 4 years ago

We are using pycovid so far. However, pycovid itself pulls from an R package that mirrors a tidied version of the Johns Hopkins data. As a consequence, our data lags behind the upstream source.

The biggest added value of pycovid is the merge with the ISO country codes (which we need for plotting on the map). I believe that this merge is currently broken for China: https://github.com/sudharshan-ashok/pycovid/issues/2

I suggest that we pull directly from Johns Hopkins: they host a version of the data that is formatted close to what we need: https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series

This would entail changing the logic in https://github.com/covid19-dash/covid-dashboard/blob/master/data_input.py

The hardest work would be to extract the ISO country codes from https://raw.githubusercontent.com/lukes/ISO-3166-Countries-with-Regional-Codes/master/slim-3/slim-3.csv
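
A minimal sketch of what that merge could look like, assuming the Johns Hopkins table keeps country names in a "Country/Region" column and the ISO file has "name" and "alpha-3" columns. The helper name `add_iso_codes`, the output column `iso`, and the tiny hand-made frames (with placeholder counts) are only for illustration; they are not code from the dashboard.

```python
import pandas as pd


def add_iso_codes(cases, iso):
    """Attach ISO alpha-3 codes to a JHU-style table by country name.

    Returns the merged table and the list of country names that found
    no ISO match (hypothetical helper, not part of data_input.py).
    """
    # Keep only the columns we need and rename them to the JHU naming.
    iso = iso[["name", "alpha-3"]].rename(
        columns={"name": "Country/Region", "alpha-3": "iso"}
    )
    # A left join keeps every row of the case data; indicator=True adds
    # a "_merge" column that flags rows without an ISO match.
    merged = cases.merge(iso, on="Country/Region", how="left", indicator=True)
    unmatched = sorted(
        merged.loc[merged["_merge"] == "left_only", "Country/Region"].unique()
    )
    return merged.drop(columns="_merge"), unmatched


# Tiny stand-ins for the two CSV files; the counts are placeholders.
cases = pd.DataFrame(
    {"Country/Region": ["France", "Italy", "US"], "3/1/20": [1, 2, 3]}
)
iso = pd.DataFrame(
    {
        "name": ["France", "Italy", "United States of America"],
        "alpha-3": ["FRA", "ITA", "USA"],
    }
)
merged, unmatched = add_iso_codes(cases, iso)
print(unmatched)
```

In this toy example "US" ends up in `unmatched`, because the ISO file spells the country differently. Listing the unmatched names explicitly makes it easier to see which countries would be missing from the map.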

glemaitre commented 4 years ago

I will take a stab at this.

GaelVaroquaux commented 4 years ago

Thanks!

I was looking at it, and this is what I had started writing:

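import pandas as pd

# Raw CSV location; {type} is filled in with one of TYPES below.
URL = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-{type}.csv'

# One time-series file per type (the first entry was a duplicated 'Recovered').
TYPES = 'Confirmed', 'Deaths', 'Recovered'

# Load the first type (confirmed cases) as a starting point.
data = pd.read_csv(URL.format(type=TYPES[0]))

glemaitre commented 4 years ago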

We need some people to work on dirty categories, because I am matching country names that do not coincide between the two sources for some reason :)
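
One possible way to handle such mismatches, sketched with the standard-library `difflib` rather than a dedicated dirty-category tool: try an exact match, then a hand-written override table, then a fuzzy match. The function `match_country`, the `MANUAL` table, the 0.7 cutoff, and the sample names are all assumptions for illustration.

```python
import difflib

# Hand-written overrides for names that string similarity cannot fix.
# Hypothetical table; it would grow as mismatches are found.
MANUAL = {"US": "United States of America"}


def match_country(name, iso_names, cutoff=0.7):
    """Return the best ISO name for `name`, or None if nothing is close."""
    if name in iso_names:
        return name
    if name in MANUAL:
        return MANUAL[name]
    # get_close_matches returns at most n candidates whose similarity
    # ratio is at least `cutoff`, best match first.
    close = difflib.get_close_matches(name, iso_names, n=1, cutoff=cutoff)
    return close[0] if close else None


iso_names = [
    "France",
    "Iran (Islamic Republic of)",
    "United States of America",
    "Viet Nam",
]
for name in ["France", "Vietnam", "US", "Atlantis"]:
    print(name, "->", match_country(name, iso_names))
```

Fuzzy matching can pair the wrong countries when names are similar, so any automatic match is worth reviewing by hand before it is trusted for the map.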