covid19-dash / covid-dashboard

Help welcomed if you have expertise in public health web technology, data modeling and munging, or visualization.
https://covid19-dash.github.io/
BSD 3-Clause "New" or "Revised" License

ENH add fetcher for Johns Hopkins dataset #13

Closed: glemaitre closed this 4 years ago

glemaitre commented 4 years ago

closes #11

This adds a fetcher to replace pycovid. The idea is to keep the same column names so that the rest of the code needs only minimal changes.
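For reference, here is a minimal stdlib sketch of the reshaping step such a fetcher needs. It assumes the JHU CSSE time-series layout, in which the first four columns are Province/State, Country/Region, Lat and Long and every further column is a date written as month/day/two-digit year. The function name is hypothetical. The sketch sums provinces into one value per country and day and emits the name/date/cases/type columns that appear in the outputs later in this thread. Downloading the CSV is left out so the example runs offline.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def melt_jhu_timeseries(csv_text, case_type):
    """Reshape a wide JHU-style time-series CSV into long records.

    Assumed layout: Province/State, Country/Region, Lat, Long, then one
    column per date such as 1/22/20.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Parse the date columns once (month/day/two-digit year).
    dates = [datetime.strptime(d, "%m/%d/%y").date() for d in header[4:]]
    # Sum provinces/states so there is one value per country and day.
    totals = defaultdict(int)
    for row in reader:
        country = row[1]
        for day, value in zip(dates, row[4:]):
            totals[(country, day)] += int(value) if value else 0
    return [
        {"name": country, "date": day, "cases": cases, "type": case_type}
        for (country, day), cases in sorted(totals.items())
    ]


# Toy input with made-up numbers, only to exercise the function.
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
Province A,China,0,0,10,12
Province B,China,0,0,1,3
,France,0,0,0,2
"""
```

With this toy input, `melt_jhu_timeseries(SAMPLE, "confirmed")` yields China at 11 and 15 cases and France at 0 and 2. Fields such as "Korea, South" are quoted in the CSV, which the `csv` module handles.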

glemaitre commented 4 years ago

Ping @GaelVaroquaux @emmanuelle: I think this should be ready, at least for testing.

GaelVaroquaux commented 4 years ago

OK, let me check out the branch and test it. Thanks!!

GaelVaroquaux commented 4 years ago

I have the impression that we are not retrieving South Korea and China (or I am failing to query them). The absence of these countries is a motivation for not depending on pycovid.

Can you check what is going on with them?
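A country can vanish without any error when its spelling in the source differs from the table used to attach ISO codes. JHU CSSE, at least in its global time-series files, lists South Korea as "Korea, South" and the United States as "US". The sketch below normalizes names before the lookup and reports the ones that still fail. It offers one way to check whether that is what happens here. The alias table, the deliberately truncated ISO lookup and the function names are all hypothetical.

```python
# Hypothetical alias table: source spelling -> name used by the ISO lookup.
ALIASES = {
    "Korea, South": "South Korea",
    "Mainland China": "China",  # spelling in some early releases (assumption)
    "US": "United States",
}

# Hypothetical, truncated ISO 3166-1 alpha-3 lookup keyed by common name.
ISO3 = {
    "China": "CHN",
    "France": "FRA",
    "South Korea": "KOR",
    "United States": "USA",
}


def to_iso3(name):
    """Return the alpha-3 code for a country name, or None if unknown."""
    cleaned = name.strip()
    canonical = ALIASES.get(cleaned, cleaned)
    return ISO3.get(canonical)


def unmatched_countries(names):
    """List the names an ISO join would silently drop."""
    return sorted({n for n in names if to_iso3(n) is None})
```

For example, `unmatched_countries(["China", "Korea, South", "France", "Unknown Territory"])` returns `["Unknown Territory"]`, while "Korea, South" now maps to `KOR`.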

glemaitre commented 4 years ago

> Can you check what is going on with them?

I can get China:

In [10]: df[df["name"] == "China"]
Out[10]:
       name       date  cases       type country_region Alpha-2 code alpha-3  Numeric code   lat   long
4293  China 2020-01-22    548  confirmed          China           CN     CHN         156.0  35.0  105.0
4294  China 2020-01-23    643  confirmed          China           CN     CHN         156.0  35.0  105.0
4295  China 2020-01-24    920  confirmed          China           CN     CHN         156.0  35.0  105.0
4296  China 2020-01-25   1406  confirmed          China           CN     CHN         156.0  35.0  105.0
4297  China 2020-01-26   2075  confirmed          China           CN     CHN         156.0  35.0  105.0
...     ...        ...    ...        ...            ...          ...     ...           ...   ...    ...
4447  China 2020-03-10  60181  recovered          China           CN     CHN         156.0  35.0  105.0
4448  China 2020-03-11  61644  recovered          China           CN     CHN         156.0  35.0  105.0
4449  China 2020-03-12  62901  recovered          China           CN     CHN         156.0  35.0  105.0
4450  China 2020-03-13  64196  recovered          China           CN     CHN         156.0  35.0  105.0
4451  China 2020-03-14  65660  recovered          China           CN     CHN         156.0  35.0  105.0

So China is in the fetched data. I think the problem must be happening in get_data?
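Judging from the outputs in this thread, get_data seems to turn the long table into one table per case type, indexed by date and addressed by ISO alpha-3 code (`confirmed['CHN']`). The stdlib sketch below mimics that reshaping with nested dicts. The function name and the `iso3_of` mapping are hypothetical. Returning the names that could not be mapped makes a missing country visible instead of silent.

```python
from collections import defaultdict


def pivot_by_type_and_country(records, iso3_of):
    """Group long records into {type: {iso3: {date: cases}}}.

    `iso3_of` maps a country name to its alpha-3 code. Names it cannot
    map are collected in `skipped` rather than dropped silently.
    """
    table = defaultdict(lambda: defaultdict(dict))
    skipped = set()
    for rec in records:
        code = iso3_of.get(rec["name"])
        if code is None:
            skipped.add(rec["name"])
            continue
        table[rec["type"]][code][rec["date"]] = rec["cases"]
    return table, skipped
```

Checking that `skipped` is empty after each fetch is a cheap guard against the kind of disappearing country discussed above.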

GaelVaroquaux commented 4 years ago

OK. But China is not displayed on the map, is it? I don't see it.

GaelVaroquaux commented 4 years ago

Also, a simpler test:

import data_input

confirmed = data_input.get_data()['confirmed']
# Each lookup should return the time series for that country.
confirmed['FRA']
confirmed['CHN']

The query to France works, but not to China.

These queries could almost serve as tests for the code (that is, a first step toward unit testing).
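A first unit test along those lines might look like the sketch below. In the project it would be fed `data_input.get_data()['confirmed']`. Here it accepts any mapping so that it runs standalone. Including `KOR` reflects the South Korea concern raised above. The helper names are hypothetical.

```python
REQUIRED = ("FRA", "CHN", "KOR")


def missing_countries(confirmed, required=REQUIRED):
    """Return the required alpha-3 codes that `confirmed` cannot serve."""
    missing = []
    for code in required:
        try:
            series = confirmed[code]
        except KeyError:
            missing.append(code)
            continue
        # An empty series is as bad as a missing one for the dashboard.
        if len(series) == 0:
            missing.append(code)
    return missing


def check_expected_countries(confirmed):
    missing = missing_countries(confirmed)
    assert not missing, f"countries missing from get_data(): {missing}"
```

In a pytest module, a test function could simply call `check_expected_countries(data_input.get_data()['confirmed'])`. A pandas DataFrame raises `KeyError` for an absent column and supports `len`, so the same helper applies to it.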

glemaitre commented 4 years ago

Hmm, it works for me:

In [1]: from data_input import get_data

In [2]: df = get_data()

In [3]: confirmed = df["confirmed"]

In [4]: confirmed["CHN"]
Out[4]:
country_region    China
date                   
2020-01-22          548
2020-01-23         1191
2020-01-24         2111
2020-01-25         3517
2020-01-26         5592
2020-01-27         8469
2020-01-28        13978
2020-01-29        20065
2020-01-30        28206
2020-01-31        38008
2020-02-01        49899
2020-02-02        66529
2020-02-03        86245
2020-02-04       109952
2020-02-05       137392
2020-02-06       167979
2020-02-07       202089
2020-02-08       238903
2020-02-09       278732
2020-02-10       321086
2020-02-11       365472
2020-02-12       410231
2020-02-13       470126
2020-02-14       536484
2020-02-15       604897
2020-02-16       675410
2020-02-17       747844
2020-02-18       822055
2020-02-19       896674
2020-02-20       971751
2020-02-21      1047301
2020-02-22      1124302
2020-02-23      1201324
2020-02-24      1278565
2020-02-25      1356319
2020-02-26      1434485
2020-02-27      1513085
2020-02-28      1592013
2020-02-29      1671369
2020-03-01      1751301
2020-03-02      1831437
2020-03-03      1911698
2020-03-04      1992084
2020-03-05      2072621
2020-03-06      2153311
2020-03-07      2234081
2020-03-08      2314904
2020-03-09      2395764
2020-03-10      2476651
2020-03-11      2557572
2020-03-12      2638504
2020-03-13      2719449
2020-03-14      2800426
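
The raw JHU counts shown earlier for China (548, 643, 920, 1406 and 2075 confirmed for 22 to 26 January) are already cumulative. The first values returned by get_data here (548, 1191, 2111, 3517, 5592) equal their running sums. If get_data applies its own cumulative sum on top, differencing its output once should give back the raw series. The small stdlib helpers below are hypothetical and only sketch that check.

```python
def diff_cumulative(values):
    """Daily increments from a cumulative series (first value kept as is)."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def is_non_decreasing(values):
    """Cumulative counts should never go down."""
    return all(b >= a for a, b in zip(values, values[1:]))


# First five values of confirmed["CHN"] as printed above.
from_get_data = [548, 1191, 2111, 3517, 5592]
print(diff_cumulative(from_get_data))  # [548, 643, 920, 1406, 2075]
```

The printed list matches the raw `confirmed` rows for China from 2020-01-22 to 2020-01-26 in the earlier `df[df["name"] == "China"]` output.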

GaelVaroquaux commented 4 years ago

OK, I had probably forgotten to restart my kernel when switching to your branch. It does work. Sorry!!

GaelVaroquaux commented 4 years ago

I can confirm that the input is now fixed. Merging as this is a net improvement. Thank you!!