Closed GaelVaroquaux closed 4 years ago
I will take a stab at this
Thanks!
I was looking at it, and this is what I had started writing:
import pandas as pd

# One CSV file per series; {type} is filled in from TYPES
URL = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-{type}.csv'
TYPES = 'Confirmed', 'Deaths', 'Recovered'

data = pd.read_csv(URL.format(type=TYPES[0]))
We need some people to work on dirty categories, because I am matching country names that do not coincide for some reason :)
We are using pycovid so far. However, it is itself pulling from an R package which mirrors a tidied version of the Johns Hopkins data. As a consequence, we are lagging behind.
The biggest added value of pycovid is the merge with the ISO country code (which we need for plotting on the map). I believe that this merge is currently broken for China: https://github.com/sudharshan-ashok/pycovid/issues/2
I suggest that we pull directly from Johns Hopkins: they are hosting a version of the data which is formatted close to what we need: https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series
This would entail changing the logic in https://github.com/covid19-dash/covid-dashboard/blob/master/data_input.py
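For illustration, here is a rough sketch of what the reshaping step could look like once the wide time-series files are loaded. It assumes the files have the columns Province/State, Country/Region, Lat, Long followed by one column per date; the helper name to_long and the inline sample values are made up for this example and are not taken from data_input.py.

```python
import io

import pandas as pd

# Tiny inline sample in the assumed wide layout (one column per date).
# The numbers are invented for illustration only.
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
Hubei,China,30.97,112.27,10,20
Beijing,China,40.18,116.41,1,2
,France,46.22,2.21,0,3
"""


def to_long(wide, value_name):
    """Reshape a wide frame to one row per (country, date).

    Hypothetical helper: provinces are summed into their country.
    """
    id_cols = ["Province/State", "Country/Region", "Lat", "Long"]
    long = wide.melt(id_vars=id_cols, var_name="date", value_name=value_name)
    long["date"] = pd.to_datetime(long["date"], format="%m/%d/%y")
    return long.groupby(["Country/Region", "date"], as_index=False)[value_name].sum()


wide = pd.read_csv(io.StringIO(SAMPLE))
confirmed = to_long(wide, "confirmed")
print(confirmed)
```

In practice the inline sample would be replaced by pd.read_csv on the raw GitHub URL of each series, and the resulting frames could be merged on country and date.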
The hardest work would be to extract the ISO country codes from https://raw.githubusercontent.com/lukes/ISO-3166-Countries-with-Regional-Codes/master/slim-3/slim-3.csv
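A minimal sketch of the country-name matching, using only the standard library. It assumes the ISO file has name and alpha-3 columns; the inline sample rows and the ALIASES table are hypothetical and would need to be extended by hand for every name that does not coincide between the two sources.

```python
import csv
import io

# Inline sample in the assumed layout of the ISO file (name, alpha-3, country-code).
ISO_SAMPLE = """name,alpha-3,country-code
China,CHN,156
France,FRA,250
United States of America,USA,840
"""

# Hypothetical alias table: names as they might appear in the case data,
# mapped to the name used in the ISO file.
ALIASES = {
    "US": "United States of America",
    "Mainland China": "China",
}


def build_iso_lookup(csv_text):
    """Map case-folded country name to its ISO alpha-3 code."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["name"].strip().casefold(): row["alpha-3"] for row in reader}


def iso_code(country, lookup, aliases=ALIASES):
    """Return the alpha-3 code for a country name, or None if unmatched."""
    name = country.strip()
    name = aliases.get(name, name)
    return lookup.get(name.casefold())


lookup = build_iso_lookup(ISO_SAMPLE)
codes = {c: iso_code(c, lookup) for c in ["France", "US", "Mainland China", "Atlantis"]}
print(codes)
```

Returning None for unmatched names makes it easy to list the countries that still need an alias before plotting on the map.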