Suspicious frequency distribution of the init_country_level variable

ingonader commented 4 years ago

The init_country_level variable seems to have five levels:

init_country_level	n
Municipal	744
National	8347
No, it is at the national level	118
Yes, it is at another governmental level (e.g. county)	2
Yes, it is at the province/state level	4680

To me, they seem to stem from two different "sets" of levels: "Municipal" vs. "National" (mun/nat), and a distinct second set where the related question in the RA questionnaire was a yes/no question. If so, the distribution looks odd: in the mun/nat set, the vast majority of policies are at the national level (which makes sense, since in many countries local outbreaks with initial policies were followed by a large number of national policies). In the yes/no set, however, the majority of policies are at the province/state level rather than the national level. Hence, depending on the answer format, the pattern appears to be reversed.
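
To make the comparison concrete, here is a minimal sketch that splits the five levels into the two presumed sets and computes the share of national-level policies within each. The counts are copied from the table above; the rule that a yes/no-style level starts with "Yes," or "No," is my own assumption about how the labels were generated, and the helper names are made up for illustration.

```python
# Counts copied from the frequency table in this issue.
counts = {
    "Municipal": 744,
    "National": 8347,
    "No, it is at the national level": 118,
    "Yes, it is at another governmental level (e.g. county)": 2,
    "Yes, it is at the province/state level": 4680,
}


def is_yes_no(level):
    # Assumption: answers from the yes/no question start with "Yes," or "No,".
    return level.startswith(("Yes,", "No,"))


def national_share(subset):
    # Percentage of policies in `subset` whose label refers to the national level.
    total = sum(subset.values())
    national = sum(n for level, n in subset.items() if "national" in level.lower())
    return 100 * national / total


mun_nat = {level: n for level, n in counts.items() if not is_yes_no(level)}
yes_no = {level: n for level, n in counts.items() if is_yes_no(level)}

mun_nat_share = national_share(mun_nat)
yes_no_share = national_share(yes_no)

print(f"mun/nat set: {mun_nat_share:.1f}% national")
print(f"yes/no set:  {yes_no_share:.1f}% national")
```

With the counts above, this gives roughly 91.8% national in the mun/nat set versus 2.5% in the yes/no set.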

The same pattern shows up within several policy type categories. Here are some examples:

type	init_country_level	n	perc
Closure and Regulation of Schools	Municipal	97	6.7
Closure and Regulation of Schools	National	722	49.7
Closure and Regulation of Schools	No, it is at the national level	18	1.2
Closure and Regulation of Schools	Yes, it is at the province/state level	617	42.4

type	init_country_level	n	perc
Health Monitoring	Municipal	19	5.4
Health Monitoring	National	232	66.5
Health Monitoring	No, it is at the national level	10	2.9
Health Monitoring	Yes, it is at the province/state level	88	25.2

type	init_country_level	n	perc
Health Resources	Municipal	82	3.1
Health Resources	National	1655	61.7
Health Resources	No, it is at the national level	6	0.2
Health Resources	Yes, it is at the province/state level	940	35

type	init_country_level	n	perc
Health Testing	Municipal	14	4.1
Health Testing	National	205	60.5
Health Testing	No, it is at the national level	2	0.6
Health Testing	Yes, it is at the province/state level	118	34.8

type	init_country_level	n	perc
Other Policy Not Listed Above	Municipal	41	4.3
Other Policy Not Listed Above	National	678	71.9
Other Policy Not Listed Above	No, it is at the national level	3	0.3
Other Policy Not Listed Above	Yes, it is at the province/state level	221	23.4

type	init_country_level	n	perc
Public Awareness Measures	Municipal	27	4.1
Public Awareness Measures	National	418	63.1
Public Awareness Measures	No, it is at the national level	3	0.5
Public Awareness Measures	Yes, it is at the province/state level	214	32.3

type	init_country_level	n	perc
Quarantine	Municipal	73	6.3
Quarantine	National	746	64.4
Quarantine	No, it is at the national level	22	1.9
Quarantine	Yes, it is at the province/state level	318	27.4

type	init_country_level	n	perc
Restriction and Regulation of Businesses	Municipal	106	5.8
Restriction and Regulation of Businesses	National	853	46.3
Restriction and Regulation of Businesses	No, it is at the national level	18	1
Restriction and Regulation of Businesses	Yes, it is at the province/state level	866	47

type	init_country_level	n	perc
Restriction and Regulation of Government Services	Municipal	11	2.5
Restriction and Regulation of Government Services	National	201	45.5
Restriction and Regulation of Government Services	No, it is at the national level	7	1.6
Restriction and Regulation of Government Services	Yes, it is at the province/state level	223	50.5

type	init_country_level	n	perc
Restrictions of Mass Gatherings	Municipal	44	6.5
Restrictions of Mass Gatherings	National	380	56.4
Restrictions of Mass Gatherings	No, it is at the national level	2	0.3
Restrictions of Mass Gatherings	Yes, it is at the province/state level	248	36.8

type	init_country_level	n	perc
Social Distancing	Municipal	48	8.2
Social Distancing	National	312	53.6
Social Distancing	No, it is at the national level	3	0.5
Social Distancing	Yes, it is at the province/state level	219	37.6

To me, this seems strange. Please decide if this is worth investigating.
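
For reference, the perc column in the tables above is the share of each level within its policy type. A minimal sketch of that calculation follows, using the counts for two of the types listed above; the function name is hypothetical and this is not necessarily how the original tables were produced.

```python
from collections import defaultdict

# (type, init_country_level, n) rows copied from two of the tables above.
rows = [
    ("Closure and Regulation of Schools", "Municipal", 97),
    ("Closure and Regulation of Schools", "National", 722),
    ("Closure and Regulation of Schools", "No, it is at the national level", 18),
    ("Closure and Regulation of Schools", "Yes, it is at the province/state level", 617),
    ("Health Monitoring", "Municipal", 19),
    ("Health Monitoring", "National", 232),
    ("Health Monitoring", "No, it is at the national level", 10),
    ("Health Monitoring", "Yes, it is at the province/state level", 88),
]


def add_within_type_perc(rows):
    # Sum the counts per policy type, then express each row as a
    # percentage of its type's total, rounded to one decimal place.
    totals = defaultdict(int)
    for policy_type, _level, n in rows:
        totals[policy_type] += n
    return [
        (policy_type, level, n, round(100 * n / totals[policy_type], 1))
        for policy_type, level, n in rows
    ]


result = add_within_type_perc(rows)
for policy_type, level, n, perc in result:
    print(f"{policy_type}\t{level}\t{n}\t{perc}")
```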

ingonader commented 4 years ago

Another indication that something is amiss comes from the target_region, target_province, and target_city variables. The table below shows the percentage of non-missing values for each variable across the whole dataset, by init_country_level. The "National" level has a very low percentage of non-missing data in the target_province variable (which I think is expected), but the "No, it is at the national level" level has a relatively high percentage of non-missing data in that variable:

init_country_level	target_region	target_province	target_city
Municipal	0.1%	1.9%	21.4%
National	1.6%	2.9%	1.9%
No, it is at the national level	1.7%	14.4%	2.5%
Yes, it is at another governmental level (e.g. county)	0%	0%	0%
Yes, it is at the province/state level	0.5%	21.9%	1.6%

(The municipal vs. province/state levels make a whole lot more sense to me now, looking at this table).
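
A check like the one in the table above could be sketched as follows. The records here are made-up placeholder rows, not actual CoronaNet data, and treating None or an empty string as "missing" is an assumption; only the column names come from the dataset.

```python
from collections import defaultdict

TARGET_COLS = ("target_region", "target_province", "target_city")


def pct_non_missing(records, group_col="init_country_level", cols=TARGET_COLS):
    # For each level of group_col, compute the percentage of records
    # in which each target column holds a non-missing value.
    n_rows = defaultdict(int)
    n_filled = defaultdict(lambda: dict.fromkeys(cols, 0))
    for rec in records:
        level = rec[group_col]
        n_rows[level] += 1
        for col in cols:
            # Assumption: None or "" marks a missing value.
            if rec.get(col) not in (None, ""):
                n_filled[level][col] += 1
    return {
        level: {col: 100 * n_filled[level][col] / n_rows[level] for col in cols}
        for level in n_rows
    }


# Hypothetical toy records, for illustration only.
toy = [
    {"init_country_level": "National", "target_province": None},
    {"init_country_level": "National", "target_province": "Province A"},
    {"init_country_level": "Yes, it is at the province/state level",
     "target_province": "Province B"},
    {"init_country_level": "Yes, it is at the province/state level",
     "target_province": "Province C", "target_city": ""},
]

shares = pct_non_missing(toy)
for level, by_col in shares.items():
    print(level, {col: f"{pct:.1f}%" for col, pct in by_col.items()})
```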

timothymodel commented 4 years ago

Should be corrected in the most recent data releases

CoronaNetDataScience / corona_tscs

Suspicious frequency distribution of the init_country_level variable #17