epidemics / covid

epidemicforecasting.org visualization repository
http://epidemicforecasting.org
GNU Affero General Public License v3.0

Gather data from different sources for NPI model #518

Open hazarane opened 4 years ago

hazarane commented 4 years ago

Description

The data we currently use for the NPI model were compiled by forecasters some time ago and are now outdated.

Acceptance Criteria

Use data from other sources to compile the dataset for the NPI model. Available datasets:

* https://www.bsg.ox.ac.uk/research/research-projects/coronavirus-government-response-tracker
* https://www.nature.com/articles/s41562-020-0909-7
* https://github.com/amel-github/covid19-interventionmeasures
* https://masks4all.co/what-countries-require-masks-in-public/

Investigate the datasets and merge them in the way that best fits the model.

JanataPavel commented 4 years ago

From the list of datasets:

* https://www.bsg.ox.ac.uk/research/research-projects/coronavirus-government-response-tracker (OxCGRT): useful for most of the features.
* https://www.nature.com/articles/s41562-020-0909-7 (CoronaNet): the data is more ambiguous and less suited to our use case, but some features could be extracted from it to fill the gaps in OxCGRT.
* https://github.com/amel-github/covid19-interventionmeasures: the data is no longer updated, so it is of no use to us.
* https://masks4all.co/what-countries-require-masks-in-public/: sporadically updated, and the data contains only the current status.

OxCGRT

The definitions of their features can be found in their codebook. Some of the features also have flags marking whether the policy is nationwide or applies only to some parts of the country. As we are interested only in nationwide interventions, we require the flag to be set.

Some of the features used in some of our experiments already come from the OxCGRT dataset.

Most of the OxCGRT features are on an ordinal scale: 0 corresponds to no restrictions, and higher values range from some restriction to tighter restrictions. Usually the lowest levels are recommendations to the public, which we ignore, as we are only concerned with policies that are enforced in some way.
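A minimal sketch of how the flag requirement and the ordinal thresholds described above could be combined into a single on/off feature. The function name `binarize` is hypothetical, and the convention that a flag value of 1 means "nationwide" is an assumption that should be checked against the codebook. The sketch only covers features that carry a flag.

```python
from typing import Optional

# Assumption: a flag value of 1 marks a nationwide policy, 0 a targeted one.
NATIONWIDE = 1


def binarize(level: Optional[int], flag: Optional[int], threshold: int) -> Optional[bool]:
    """Turn an ordinal OxCGRT level into an on/off feature (hypothetical helper).

    Returns None when the level is blank, False when the level is below the
    threshold or the policy is not nationwide, and True otherwise.
    """
    if level is None:
        return None  # blank in the data: no information for this day
    if level < threshold:
        return False  # no restriction, or a level we chose to ignore
    # The level is high enough; it only counts if the policy is nationwide.
    return flag == NATIONWIDE


nationwide_closure = binarize(level=3, flag=1, threshold=3)
regional_closure = binarize(level=3, flag=0, threshold=3)
```

Keeping blank values as `None` rather than `False` leaves the choice of how to handle missing days (forward fill, drop, and so on) to a later step.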

Mapping:

| Our feature | Description | OxCGRT feature | Threshold | Level descriptions |
| --- | --- | --- | --- | --- |
| Symptomatic Testing | From OxCGRT | H2_Testing policy | >= 2 | 2 - testing of anyone showing Covid-19 symptoms<br>3 - open public testing (eg "drive through" testing available to asymptomatic people)<br>Blank - no data |
| Gatherings <1000 | A country has set a size limit on gatherings. The limit is at most 1000 people (often less), and gatherings above the maximum size are disallowed. For example, a ban on gatherings of 500 people or more would be classified as “gatherings limited to 1000 or less”, but a ban on gatherings of 2000 people or more would not. | C4_Restrictions on gatherings | >= 2 | 2 - restrictions on gatherings between 101-1000 people<br>3 - restrictions on gatherings between 11-100 people<br>4 - restrictions on gatherings of 10 people or less |
| Gatherings <100 | A country has set a size limit on gatherings. The limit is at most 100 people (often less). | C4_Restrictions on gatherings | >= 3 | 3 - restrictions on gatherings between 11-100 people<br>4 - restrictions on gatherings of 10 people or less |
| Gatherings <10 | A country has set a size limit on gatherings. The limit is at most 10 people (often less). | C4_Restrictions on gatherings | >= 4 | 4 - restrictions on gatherings of 10 people or less |
| School Closure | A country has closed most or all schools. | C1_School closing | >= 3 | 3 - require closing all levels |
| Stay Home Order | An order for the general public to stay at home has been issued. This is mandatory, not just a recommendation. Exemptions are usually granted for certain purposes (such as shopping, exercise, or going to work), or, more rarely, for certain times of the day. In practice, a stay-at-home order was often accompanied by other NPIs such as businesses closures. However, a stay-at-home order does not in principle entail these other NPIs, but only the (additional) order to generally stay at home except for exemptions. | C6_Stay at home requirements | >= 2 | 2 - require not leaving house with exceptions for daily exercise, grocery shopping, and 'essential' trips<br>3 - require not leaving house with minimal exceptions (eg allowed to leave once a week, or only one person can leave at a time, etc) |
| Travel Screen/Quarantine | From OxCGRT | C8_International travel controls | >= 1 | 1 - screening arrivals<br>2 - quarantine arrivals from some or all regions<br>3 - ban arrivals from some regions<br>4 - ban on all regions or total border closure |
| Travel Bans | From OxCGRT | C8_International travel controls | >= 3 | 3 - ban arrivals from some regions<br>4 - ban on all regions or total border closure |
| Public Transport Limited | From OxCGRT | C5_Close public transport | >= 1 | 1 - recommend closing (or significantly reduce volume/route/means of transport available)<br>2 - require closing (or prohibit most citizens from using it) |
| Internal Movement Limited | From OxCGRT | C7_Restrictions on internal movement | >= 1 | 1 - recommend not to travel between regions/cities<br>2 - internal movement restrictions in place |
| Public Information Campaigns | From OxCGRT | H1_Public information campaigns | >= 1 | 1 - public officials urging caution about Covid-19<br>2 - coordinated public information campaign (eg across traditional and social media) |
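The mapping above can also be kept as data, which makes the thresholds easy to review and change. The sketch below copies the feature names, OxCGRT names, and thresholds from the mapping. The names `THRESHOLDS` and `derive_features` are hypothetical, the row is invented for illustration, and it is an assumption that the OxCGRT names match the column headers of the downloaded file exactly. The nationwide flags are left out here to keep the example short.

```python
# (OxCGRT column, minimum level) for each of our features, as in the mapping.
THRESHOLDS = {
    "Symptomatic Testing": ("H2_Testing policy", 2),
    "Gatherings <1000": ("C4_Restrictions on gatherings", 2),
    "Gatherings <100": ("C4_Restrictions on gatherings", 3),
    "Gatherings <10": ("C4_Restrictions on gatherings", 4),
    "School Closure": ("C1_School closing", 3),
    "Stay Home Order": ("C6_Stay at home requirements", 2),
    "Travel Screen/Quarantine": ("C8_International travel controls", 1),
    "Travel Bans": ("C8_International travel controls", 3),
    "Public Transport Limited": ("C5_Close public transport", 1),
    "Internal Movement Limited": ("C7_Restrictions on internal movement", 1),
    "Public Information Campaigns": ("H1_Public information campaigns", 1),
}


def derive_features(row):
    """Map one country-day row of OxCGRT levels to our on/off features.

    A missing or blank level yields None instead of False.
    """
    features = {}
    for name, (column, threshold) in THRESHOLDS.items():
        level = row.get(column)
        features[name] = None if level is None else level >= threshold
    return features


# Invented example row: gatherings limited to 11-100 people, schools at level 2.
example_row = {"C4_Restrictions on gatherings": 3, "C1_School closing": 2}
example = derive_features(example_row)
```

Because the three gatherings features share one OxCGRT column, a level of 3 switches on both "Gatherings <1000" and "Gatherings <100" but not "Gatherings <10".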

non-mapped features

CoronaNet

The CoronaNet dataset is a list of entries, each corresponding to an announced policy that went into effect on some date. The policies are grouped into categories (type) and subcategories (type_sub_cat), which are described in their codebook. However, the subcategory names in the codebook don't exactly match the names in the data, and I could not find any coherent description of the subcategories.

From all the entries we have to keep only the nationwide and mandatory policies (columns init_country_level and compliance).
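A rough sketch of that filter on plain dictionaries. The column names init_country_level and compliance come from the data, but the value conventions used here ("National" for nationwide policies, compliance values that start with "Mandatory") are assumptions that need to be checked against the actual file. The entries and the `country` key are invented for illustration.

```python
def is_nationwide_mandatory(entry):
    """Return True for a national-level policy with mandatory compliance."""
    level = entry.get("init_country_level") or ""
    compliance = entry.get("compliance") or ""
    # Assumed conventions: "National" marks nationwide policies, and
    # mandatory compliance values start with the word "Mandatory".
    return level == "National" and compliance.startswith("Mandatory")


# Invented entries, for illustration only.
entries = [
    {"country": "A", "init_country_level": "National",
     "compliance": "Mandatory with Fines"},
    {"country": "B", "init_country_level": "Provincial",
     "compliance": "Mandatory with Fines"},
    {"country": "C", "init_country_level": "National",
     "compliance": "Voluntary/Recommended but No Penalties"},
]

kept = [e for e in entries if is_nationwide_mandatory(e)]
```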

The main goal with this dataset is to fill in the features not contained in the OxCGRT data (i.e. masks, businesses, and universities).

### Universities

Although there are defined sub_categories for universities, there are no entries about closed universities in the data.

### Masks

All the mask-wearing entries should be contained in the Social Distancing category. Usually they are in one of the subcategories Unspecified Mask Wearing Policy, Wearing masks, or Other Mask Wearing Policy, but not always: they can also appear in All public spaces / everywhere or Inside public or commercial building (e.g. supermarkets) as part of a wider policy. So an entry counts as a mask-wearing entry if it satisfies one of these conditions:

* Has type_sub_cat one of [Unspecified Mask Wearing Policy, Wearing masks, Other Mask Wearing Policy]
* Has type == Social Distancing and contains "mask" in the description of the entry
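The two conditions above could be written as a single predicate. This is only a sketch: `is_mask_entry` is a hypothetical name, and it assumes the free-text field is stored under a `description` key and that a case-insensitive match on "mask" is acceptable.

```python
MASK_SUBCATEGORIES = {
    "Unspecified Mask Wearing Policy",
    "Wearing masks",
    "Other Mask Wearing Policy",
}


def is_mask_entry(entry):
    """Return True if an entry satisfies either mask-wearing condition."""
    # Condition 1: the subcategory is one of the dedicated mask subcategories.
    if entry.get("type_sub_cat") in MASK_SUBCATEGORIES:
        return True
    # Condition 2: a Social Distancing entry whose description mentions masks.
    description = (entry.get("description") or "").lower()
    return entry.get("type") == "Social Distancing" and "mask" in description


# Invented entry: a wider policy that mentions masks only in its description.
wider_policy = {
    "type": "Social Distancing",
    "type_sub_cat": "All public spaces / everywhere",
    "description": "Masks must be worn in all public spaces.",
}
wider_policy_is_mask = is_mask_entry(wider_policy)
```

A plain substring match will also catch descriptions that mention masks in passing, so the matches probably still need a manual look.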

### Businesses

All entries related to the closing of businesses have type=="Restriction and Regulation of Businesses".

* Some Businesses Suspended - has any subcategory except ["Construction", "Telecommunications", "Information service activities", "Publishing activities", "Warehousing and support activities for transportation", "Mining and quarrying"] (we want only customer-facing)
* Some Businesses Suspended - has one of the subcategories ["All or unspecified non-essential businesses", "All or unspecified essential businesses", "Non-Essential Commercial Businesses", "Other Essential Businesses"]
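A sketch of the two subcategory rules above, using the lists exactly as given. The function names `matches_customer_facing` and `matches_listed_subcategories` are hypothetical and only distinguish the first rule from the second. Treating an entry with a blank subcategory as not matching the first rule is an assumption.

```python
BUSINESS_TYPE = "Restriction and Regulation of Businesses"

# Subcategories excluded by the first rule (not customer-facing).
NOT_CUSTOMER_FACING = {
    "Construction",
    "Telecommunications",
    "Information service activities",
    "Publishing activities",
    "Warehousing and support activities for transportation",
    "Mining and quarrying",
}

# Subcategories required by the second rule.
LISTED_SUBCATEGORIES = {
    "All or unspecified non-essential businesses",
    "All or unspecified essential businesses",
    "Non-Essential Commercial Businesses",
    "Other Essential Businesses",
}


def matches_customer_facing(entry):
    """First rule: a business entry with any subcategory except the excluded ones."""
    sub_cat = entry.get("type_sub_cat")
    if entry.get("type") != BUSINESS_TYPE or not sub_cat:
        return False  # assumption: a blank subcategory does not match
    return sub_cat not in NOT_CUSTOMER_FACING


def matches_listed_subcategories(entry):
    """Second rule: a business entry whose subcategory is in the listed set."""
    return (entry.get("type") == BUSINESS_TYPE
            and entry.get("type_sub_cat") in LISTED_SUBCATEGORIES)


# Invented entry, for illustration only.
entry = {"type": BUSINESS_TYPE, "type_sub_cat": "Non-Essential Commercial Businesses"}
first_rule, second_rule = matches_customer_facing(entry), matches_listed_subcategories(entry)
```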

Although CoronaNet contains a lot of data about many countries, the format of the data makes it basically impossible to determine automatically which countermeasures are in force at a given time. From my exploration of the data, deducing any meaningful information from it would require reading the descriptions of individual entries, because entries with the same category and subcategory often have widely different meanings. In addition, the entries are not entirely consistent.