geoffroychaussonnet / script_to_monitor_Covid19

Python scripts to monitor Covid-19
BSD 3-Clause "New" or "Revised" License

Use confinement.dat #13

Open jferard opened 4 years ago

jferard commented 4 years ago

As soon as #10 is fixed, we should use the parsed data.

jferard commented 4 years ago

I'm working on it, but it's not easy to choose a confinement date. I have written a function that extracts the "most important date", using the following criteria:

The idea is that the first total confinement date or the last partial date is the most important. What is your opinion?
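A minimal sketch of that rule, assuming each country's measures are available as (date, kind) pairs where kind is "total" or "partial". The function name `most_important_date` and the record layout are hypothetical; the actual format of confinement.dat is not shown in this thread.

```python
from datetime import date


def most_important_date(measures):
    """Return the first total confinement date, else the last partial one.

    `measures` is a list of (date, kind) pairs, kind in {"total", "partial"}
    (hypothetical layout). Returns None when neither kind is present.
    """
    totals = [d for d, kind in measures if kind == "total"]
    if totals:
        return min(totals)
    partials = [d for d, kind in measures if kind == "partial"]
    if partials:
        return max(partials)
    return None


# Made-up measures for illustration only.
example = [
    (date(2020, 3, 12), "partial"),
    (date(2020, 3, 17), "total"),
    (date(2020, 3, 24), "total"),
]
print(most_important_date(example))  # 2020-03-17
```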


The dates hard-coded in the scripts are:

{'Belgium': '3/18/20', # same
 'Canada': '5/22/20', # not in confinement.dat
 'China': '1/22/22', # not in confinement.dat
 'Denmark': '3/13/20', # not in confinement.dat
 'EU': '3/22/21',
 'European continent': '3/22/21',
 'Finland': '3/19/20', # not in confinement.dat
 'France': '3/17/20',  # same
 'Germany': '3/19/20',  # not in confinement.dat
 'Iran': '8/17/20',  # not in confinement.dat
 'Ireland': '3/28/20', # same
 'Italy': '3/9/20', # same
 'Japan': '5/22/20', # not in confinement.dat
 'Korea, South': '5/22/20', # not in confinement.dat
 'Norway': '3/12/20', # same
 'Spain': '3/14/20', # same
 'Sweden': '3/28/20', # not in confinement.dat
 'Switzerland': '5/22/20', # not in confinement.dat
 'US': '3/22/20', # not in confinement.dat
 'United Kingdom': '3/22/20', # 3/24/20 in confinement.dat (listed as 'UK')
 'World': '3/22/21'}

The dates extracted from confinement.dat are:

{'Angola': '3/28/20',
 'Argentina': '3/19/20',
 'Austria': '3/15/20',
 'Bangladesh': '3/26/20',
 'Belgium': '3/18/20',
 'Bolivia': '3/22/20',
 'Botswana': '3/22/20',
 'Columbia': '3/24/20',
 'Cuba': '3/24/20',
 'Czech': '3/13/20',
 'France': '3/17/20',
 'Greece': '3/23/20',
 'India': '3/24/20',
 'Irak': '3/22/20',
 'Ireland': '3/28/20',
 'Italy': '3/9/20',
 'Kenya': '3/23/20',
 'Laos': '3/30/20',
 'Lesotho': '3/28/20',
 'Malaysia': '3/28/20',
 'Maroco': '3/21/20',
 'Mexico': '3/30/20',
 'Nepal': '3/24/20',
 'New Zealand': '3/23/20',
 'Norway': '3/12/20',
 'Ouganfa': '3/30/20',
 'Peru': '3/16/20',
 'Portugal': '3/19/20',
 'Romania': '3/25/20',
 'South Africa': '3/27/20',
 'Spain': '3/14/20',
 'Tunisia': '3/20/20',
 'UK': '3/24/20',
 'Ukraine': '3/17/20',
 'United Arab Emirates': '3/31/20',
 'Venezuela': '3/17/20',
 'Vietnam': '4/1/20',
 'Zimbabwe': '3/30/20'}
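
The "same" / "not in confinement.dat" annotations on the first list can be produced mechanically. A minimal sketch, assuming both mappings are plain dicts of 'm/d/yy' strings; the helper name `compare_dates` and the `aliases` argument (needed because 'United Kingdom' appears as 'UK' in the second list) are hypothetical, and only a few entries are copied from the lists above.

```python
def compare_dates(hard_coded, extracted, aliases=None):
    """Label each hard-coded country against the extracted dates."""
    aliases = aliases or {}
    report = {}
    for country, d in hard_coded.items():
        # Look the country up under its alternative spelling, if any.
        key = aliases.get(country, country)
        if key not in extracted:
            report[country] = "not in confinement.dat"
        elif extracted[key] == d:
            report[country] = "same"
        else:
            report[country] = extracted[key] + " in confinement.dat"
    return report


hard_coded = {"Belgium": "3/18/20", "Germany": "3/19/20", "United Kingdom": "3/22/20"}
extracted = {"Belgium": "3/18/20", "UK": "3/24/20"}
report = compare_dates(hard_coded, extracted, aliases={"United Kingdom": "UK"})
for country, status in report.items():
    print(country, "->", status)
```
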
jferard commented 4 years ago

Another source is the table in: https://en.wikipedia.org/wiki/National_responses_to_the_2019%E2%80%9320_coronavirus_pandemic#In_other_countries. It does not distinguish between total and partial confinement, but it seems up to date.

geoffroychaussonnet commented 4 years ago

The idea is that the first total confinement date or the last partial date is the most important. What is your opinion?

I think it depends on what we want to do with the confinement date:

Concerning the Wikipedia page as a source, it might be more robust and better accepted than the article from Le Monde. Also, as it is a Wikipedia page, it is very likely to be updated in the future. I found out that there are already tools in Python to retrieve data from Wikipedia tables. How wonderful!

https://stackoverflow.com/questions/50355577/scraping-wikipedia-tables-with-python-selectively

jferard commented 4 years ago

A 100% quick and dirty script adapted from the link you provided:

import requests
from bs4 import BeautifulSoup
import re
import csv

URL = "https://en.wikipedia.org/wiki/National_responses_to_the_2019%E2%80%9320_coronavirus_pandemic"

res = requests.get(URL).text
soup = BeautifulSoup(res,'lxml')
table = soup.find_all('table', class_='wikitable')[1]
rows = list(table.find_all('tr'))

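# Width of the grid: the largest number of columns spanned by any row.
width = 0
for row in rows:
    cells = row.find_all(['th', 'td'])
    w = sum(int(c.get('colspan', 1)) for c in cells)
    if w > width:
        width = w

# Grid of cell texts; None marks a cell that has not been filled yet.
L = [[None for _ in range(width)] for _ in rows]

for i, row in enumerate(rows):
    cells = row.find_all(['th', 'td'])
    j = 0
    for c in cells:
        # Skip cells already filled by a rowspan from a previous row.
        while L[i][j] is not None:
            j += 1
        # Strip surrounding whitespace and trailing numeric footnote marks like [12].
        m = re.match(r"^\s*(.*?)(\[\d+\])*\s*$", c.text)
        text = m.group(1)
        cs = int(c.get('colspan', 1))
        rs = int(c.get('rowspan', 1))
        # Copy the text down the rowspan; pad extra colspan columns with ''.
        for r in range(i, i + rs):
            L[r][j] = text
            for k in range(j + 1, j + cs):
                L[r][k] = ''
        j += cs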

with open("quarantines.csv", "w", newline="") as f:
    w = csv.writer(f)
    for row in L[1:-1]:
        w.writerow(row)

Output:


Country,Place,Start date,End date,Level
Albania,,2020-03-13,,National
Algeria,Algiers,2020-03-23,2020-04-19,City
Algeria,Blida,2020-03-23,2020-04-19,City
Argentina,,2020-03-19,2020-04-26,National
Australia,,2020-03-23,,National
Austria,,2020-03-16,2020-04-13,National
Azerbaijan,,2020-03-31,2020-04-20,National
Bangladesh,,2020-03-26,2020-04-25,National
Belgium,,2020-03-18,2020-04-19,National
Bolivia,,2020-03-22,2020-04-15,National
Botswana,,2020-04-02,2020-04-30,National
Brazil,Santa Catarina,2020-03-17,2020-04-07,State
Brazil,São Paulo,2020-03-24,2020-04-08,State
Chile,,2020-03-19,,National
Colombia,,2020-03-25,2020-04-13,National
Republic of the Congo,,2020-03-31,2020-04-20,National
Costa Rica,,2020-03-23,,National
Croatia,Murter,2020-03-25,,Municipality
Croatia,Rest of area,2020-03-18,,National
Cuba,,2020-03-23,2020-04-20,National
Czech Republic,,2020-03-16,2020-04-12,National
Denmark,,2020-03-11,2020-04-13,National
Dominican Republic,,2020-03-19,,National
Ecuador,,2020-03-16,2020-03-31,National
El Salvador,,2020-03-12,2020-04-02,National
Eritrea,,2020-04-02,2020-04-23,National
Fiji,Lautoka,2020-03-20,2020-04-07,City
Fiji,Suva,2020-04-03,2020-04-17,City
Finland,Uusimaa,2020-03-27,2020-04-16,Region
France,,2020-03-17,2020-04-15,National
Germany,Bavaria,2020-03-20,2020-04-19,State
Germany,"Freiburg, BW",2020-03-21,2020-04-03,City
Ghana,Accra,2020-03-30,2020-04-12,Metropolitan Area
Ghana,Kumasi,2020-03-30,2020-04-12,Metropolitan Area
Greece,,2020-03-23,2020-04-27,National
Honduras,Central District,2020-03-16,,Municipality
Honduras,La Ceiba,2020-03-16,,Municipality
Honduras,Choluteca,2020-03-16,,Municipality
Honduras,San Pedro Sula,2020-03-17,,Municipality
Honduras,Rest of area,2020-03-20,2020-04-19,National
Hungary,,2020-03-28,2020-04-10,National
India,,2020-03-25,2020-04-30,National
Indonesia,Jakarta[a],2020-04-10,2020-04-23,Province
Indonesia,Tegal,2020-03-26,2020-07-31,City
Iraq,,2020-03-22,2020-04-11,National
Ireland,,2020-03-12,2020-05-05,National
Israel,Bnei Brak,2020-04-02,,City
Italy,,2020-03-09,2020-05-03,National
Japan,Chiba,2020-04-07,2020-05-06,Prefecture
Japan,Fukuoka,2020-04-07,2020-05-06,Prefecture
Japan,Hyōgo,2020-04-07,2020-05-06,Prefecture
Japan,Kanagawa,2020-04-07,2020-05-06,Prefecture
Japan,Osaka,2020-04-07,2020-05-06,Prefecture
Japan,Saitama,2020-04-07,2020-05-06,Prefecture
Japan,Tokyo[b],2020-04-07,2020-05-06,Prefecture
Jordan,,2020-03-21,,National
Kuwait,,2020-03-14,2020-03-29,National
Lebanon,,2020-03-15,2020-03-28,National
Liberia,Margibi,2020-03-23,2020-04-11,County
Liberia,Montserrado,2020-03-23,2020-04-11,County
Libya,,2020-03-22,,National
Lithuania,,2020-03-16,2020-04-27,National
Luxembourg,,2020-03-18,,National
Madagascar,Antananarivo,2020-03-23,,City
Madagascar,Toamasina,2020-03-23,,City
Malaysia,,2020-03-18,2020-04-28,National
Montenegro,Tuzi,2020-03-24,,Municipality
Morocco,,2020-03-19,2020-04-20,National
Nepal,,2020-03-24,2020-04-15,National
Netherlands,,2020-03-16,2020-04-28,National
New Zealand,,2020-03-26,,National
Nigeria,Abuja,2020-03-30,2020-04-12,City
Nigeria,Lagos,2020-03-30,2020-04-12,City
Nigeria,Ogun,2020-03-30,2020-04-12,State
Northern Cyprus,,2020-03-30,,National
Norway,,2020-03-12,2020-04-13,National
Oman,Muscat,2020-04-10,2020-04-22,Region
Pakistan,Azad Kashmir,2020-03-24,2020-04-14,Administrative
Pakistan,Punjab,2020-03-24,2020-04-14,Province
Pakistan,Sindh,2020-03-24,2020-04-14,Province
Pakistan,Balochistan,2020-03-24,2020-04-21,Province
Pakistan,Gilgit-Baltistan,2020-03-22,2020-04-21,Administrative
Panama,,2020-03-25,2020-04-07,National
Papua New Guinea,,2020-03-24,2020-04-07,National
Paraguay,,2020-03-20,2020-04-12,National
Peru,,2020-03-16,2020-04-26,National
Philippines,Cebu,2020-03-27,2020-04-30,Province
Philippines,Davao Region,2020-03-19,2020-04-30,Region
Philippines,Luzon,2020-03-15,2020-04-30,Island group
Philippines,Soccsksargen,2020-03-23,2020-04-30,Region
Poland,,2020-03-13,2020-04-11,National
Portugal,,2020-03-19,2020-04-02,National
Qatar,Doha Industrial Area,2020-03-11,,Industrial park
Romania,,2020-03-25,2020-05-12,National
Russia,Moscow,2020-03-30,2020-04-14,Metropolitan area
Rwanda,,2020-03-21,2020-04-19,National
Samoa,,2020-03-26,2020-04-08,National
San Marino,,2020-03-14,,National
Saudi Arabia,Jeddah,2020-03-29,,City
Saudi Arabia,Mecca,2020-03-26,,City
Saudi Arabia,Medina,2020-03-26,,City
Saudi Arabia,Qatif,2020-03-09,,Area
Saudi Arabia,Riyadh,2020-03-26,,City
Serbia,,2020-03-15,,National
Singapore,,2020-04-07,2020-05-04,National
South Africa,,2020-03-26,2020-04-15,National
Spain,,2020-03-14,2020-04-25,National
Thailand,,2020-03-25,2020-04-30,National
Tunisia,,2020-03-22,2020-04-19,National
Ukraine,,2020-03-17,2020-04-24,National
United Arab Emirates,,2020-03-26,2020-04-17,National
United Kingdom,,2020-03-24,2020-04-13,National
United States,California,2020-03-19,,State
United States,"Clark County, NV",2020-03-20,,County
United States,Connecticut,2020-03-23,2020-04-22,State
United States,Illinois,2020-03-21,2020-04-07,State
United States,"Kansas City, KS",2020-03-24,2020-04-19,City
United States,Massachusetts,2020-03-24,2020-04-07,State
United States,Michigan,2020-03-24,2020-04-13,State
United States,New York,2020-03-20,2020-04-29,State
United States,Oregon,2020-03-24,,State
United States,Wisconsin,2020-03-24,,State
Venezuela,,2020-03-17,,National
Zimbabwe,,2020-03-30,2020-04-19,National
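
To feed this output back into the scripts, the national rows can be reduced to one start date per country. A rough sketch, assuming the column layout shown above; the function names `national_start_dates` and `to_mdy` are hypothetical, and the sample rows are copied from the output so the block runs without quarantines.csv.

```python
import csv
import io
from datetime import date

SAMPLE = """Country,Place,Start date,End date,Level
Belgium,,2020-03-18,2020-04-19,National
Germany,Bavaria,2020-03-20,2020-04-19,State
Croatia,Murter,2020-03-25,,Municipality
Croatia,Rest of area,2020-03-18,,National
"""


def national_start_dates(f):
    """Map each country to the start date of its national-level row."""
    result = {}
    for row in csv.DictReader(f):
        if row["Level"] == "National":
            result[row["Country"]] = date.fromisoformat(row["Start date"])
    return result


def to_mdy(d):
    """Format a date as 'm/d/yy', the style of the hard-coded dates."""
    return f"{d.month}/{d.day}/{d:%y}"


dates = national_start_dates(io.StringIO(SAMPLE))
print({country: to_mdy(d) for country, d in dates.items()})
```

Countries with only regional rows (Germany in this sample) are simply absent from the result, which is the open question discussed next.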

Still not easy to use. What to do when the quarantine date is not national? A barycenter? Nothing? (Very interesting: I just realized that you can't compute the mean of dates, because the sum of dates has no meaning, but you can compute a barycenter: take the distance from any origin, compute the mean of distances and add the origin again).
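
The barycenter idea can be sketched as follows, with equal weights (so it is a plain mean of offsets). The function name `mean_date` and the default origin are arbitrary choices for illustration; the result is rounded down to a whole day, and the example dates are the five Honduras start dates from the output above.

```python
from datetime import date, timedelta


def mean_date(dates, origin=date(2020, 1, 1)):
    """Average dates through their day offsets from an arbitrary origin."""
    # Dates cannot be summed, but differences between dates can.
    offsets = [(d - origin).days for d in dates]
    # Floor division keeps the result independent of the chosen origin.
    return origin + timedelta(days=sum(offsets) // len(offsets))


honduras = [
    date(2020, 3, 16),
    date(2020, 3, 16),
    date(2020, 3, 16),
    date(2020, 3, 17),
    date(2020, 3, 20),
]
print(mean_date(honduras))  # 2020-03-17
```

A weighted variant (for example by population of each place) would multiply each offset by its weight and divide by the sum of weights; whether that is worth doing here is an open question.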