Closed VasLem closed 3 years ago
Risk: Higher water prices
Explained variation percentage per principal component: [61.611560540676514, 14.392035205828138, 6.978655586311352, 3.6765889379404935, 2.5773972654688033, 2.1456148918369906, 1.2565741167344016, 1.1335668356170254, 1.0067086430354297, 0.8534679314685039]
Total percentage of the explained data by 10 components is: 95.63
Percentage of the information that is lost for using 10 components is: 4.37
Picked variable number: 10
Features selected:

Index | Specs | Score | |
---|---|---|---|
439 | perc_poly__Feat[Share of seats in parliament (% held by women)] * Feat[Infants lacking immunization, DTP (% of one-year-olds)] | 7.76825 | |
1455 | perc_poly__Feat[Unemployment, total (% of labour force)] * Feat[Employment in agriculture (% of total employment)] | 6.13585 | |
2639 | perc_poly__Feat[Gross enrolment ratio, upper secondary, both sexes (%)] * Feat[Prevalence of HIV, total (% of population ages 15-49)] | 6.13184 | |
295 | perc_poly__Feat[Population with at least some secondary education, female (% ages 25 and older)] * Feat[Unemployment, total (% of labour force)] | 6.11164 | |
1456 | perc_poly__Feat[Unemployment, total (% of labour force)] * Feat[Employment in services (% of total employment)] | 6.03654 | |
1740 | perc_poly__Feat[Employment in agriculture (% of total employment)] * Feat[Municipal water withdrawal as % of total withdrawal] | 6.02999 | |
2005 | perc_poly__Feat[Overall loss in HDI due to inequality (%)] * Feat[Municipal water withdrawal as % of total withdrawal] | 5.83638 | |
218 | perc_poly__Feat[Population with at least some secondary education (% ages 25 and older)] * Feat[Unemployment, total (% of labour force)] | 5.64476 | |
520 | perc_poly__Feat[Vulnerable employment (% of total employment)] * Feat[Unemployment, total (% of labour force)] | 5.5885 | |
1308 | perc_poly__Feat[Unemployment, youth (% ages 15-24)] * Feat[Percentage of students in primary education who are female (%)] | 5.52659 |
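
As a quick sanity check on the PCA summary for the "Higher water prices" risk, the reported total and loss can be recomputed from the per-component percentages. This is only a minimal arithmetic sketch: the list is copied from the output above, and the variable names are made up for illustration, not taken from the notebook.

```python
# Per-component explained-variance percentages, copied from the output above.
explained = [
    61.611560540676514, 14.392035205828138, 6.978655586311352,
    3.6765889379404935, 2.5773972654688033, 2.1456148918369906,
    1.2565741167344016, 1.1335668356170254, 1.0067086430354297,
    0.8534679314685039,
]

total = sum(explained)  # share of variance kept by the 10 components
lost = 100.0 - total    # share left in the discarded components

print(f"Total explained by {len(explained)} components: {total:.2f}")
print(f"Information lost: {lost:.2f}")
```

The two printed values should agree with the 95.63 and 4.37 reported for this risk; the same check applies to the other risks by swapping in their lists.
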
Risk: Inadequate or aging infrastructure
Explained variation percentage per principal component: [74.87709658177496, 14.602277912843489, 3.190126961316718, 1.4447942853728544, 1.08658233938848, 0.7879161541089548, 0.6610118837477541, 0.561494345043655, 0.4302827039175066, 0.40604084295520415]
Total percentage of the explained data by 10 components is: 98.05
Percentage of the information that is lost for using 10 components is: 1.95
Picked variable number: 15
Features selected:

Index | Specs | Score | |
---|---|---|---|
449 | perc_poly__Feat[Share of seats in parliament (% held by women)] * Feat[Employment to population ratio (% ages 15 and older)] | 6.04373 | |
2911 | perc_poly__Feat[Percentage of female students enrolled in primary education who are over-age, female (%)] * Feat[Percentage of students in secondary education who are female (%)] | 5.72702 | |
1369 | perc_poly__Feat[Private capital flows (% of GDP)] * Feat[Percentage of students in pre-primary education who are female (%)] | 5.54383 | |
3031 | perc_poly__Feat[Percentage of students enrolled in primary education who are over-age, both sexes (%)] * Feat[Percentage of students in secondary education who are female (%)] | 5.1383 | |
1847 | perc_poly__Feat[Working poor at PPP$3.20 a day (% of total employment)] * Feat[Industrial water withdrawal as % of total water withdrawal] | 4.97675 | |
1556 | perc_poly__Feat[Youth not in school or employment (% ages 15-24)] * Feat[Population, female (% of total)] | 4.86128 | |
1557 | perc_poly__Feat[Youth not in school or employment (% ages 15-24)] * Feat[Population, male (% of total)] | 4.86128 | |
1100 | perc_poly__Feat[Gross fixed capital formation (% of GDP)] * Feat[Gross enrolment ratio, upper secondary, both sexes (%)] | 4.73771 | |
2937 | perc_poly__Feat[Percentage of male students enrolled in primary education who are over-age, male (%)] * Feat[Percentage of students in secondary education who are female (%)] | 4.57644 | |
443 | perc_poly__Feat[Share of seats in parliament (% held by women)] * Feat[Unemployment, youth (% ages 15-24)] | 4.52001 | |
465 | perc_poly__Feat[Share of seats in parliament (% held by women)] * Feat[Gross enrolment ratio, primary, female (%)] | 4.43906 | |
1912 | perc_poly__Feat[Gross capital formation (% of GDP)] * Feat[Gross enrolment ratio, pre-primary, male (%)] | 4.31925 | |
446 | perc_poly__Feat[Share of seats in parliament (% held by women)] * Feat[Unemployment, total (% of labour force)] | 4.29959 | |
1910 | perc_poly__Feat[Gross capital formation (% of GDP)] * Feat[Gross enrolment ratio, pre-primary, both sexes (%)] | 4.24168 | |
1911 | perc_poly__Feat[Gross capital formation (% of GDP)] * Feat[Gross enrolment ratio, pre-primary, female (%)] | 4.15378 |
Risk: Increased water stress or scarcity
Explained variation percentage per principal component: [56.49588784008235, 29.775887904318964, 3.192067775029895, 2.215176452112342, 1.430524570369619, 1.0529946852886123, 1.0070090560125087, 0.6617911411911048, 0.568967410519917, 0.5257576244921963]
Total percentage of the explained data by 10 components is: 96.93
Percentage of the information that is lost for using 10 components is: 3.07
Picked variable number: 27
Features selected:

Index | Specs | Score | |
---|---|---|---|
575 | perc_poly__Feat[Vulnerable employment (% of total employment)] * Feat[Agricultural water withdrawal as % of total renewable water resources] | 21.6239 | |
1791 | perc_poly__Feat[Employment in services (% of total employment)] * Feat[Agricultural water withdrawal as % of total renewable water resources] | 19.7533 | |
350 | perc_poly__Feat[Population with at least some secondary education, female (% ages 25 and older)] * Feat[Agricultural water withdrawal as % of total renewable water resources] | 18.2396 | |
3067 | perc_poly__Feat[Percentage of students in pre-primary education who are female (%)] * Feat[Agricultural water withdrawal as % of total water withdrawal] | 18.0407 | |
1196 | perc_poly__Feat[Inequality in education (%)] * Feat[Agricultural water withdrawal as % of total renewable water resources] | 17.9179 | |
1736 | perc_poly__Feat[Employment in agriculture (% of total employment)] * Feat[Agricultural water withdrawal as % of total renewable water resources] | 17.8274 | |
273 | perc_poly__Feat[Population with at least some secondary education (% ages 25 and older)] * Feat[Agricultural water withdrawal as % of total renewable water resources] | 16.8634 | |
1845 | perc_poly__Feat[Working poor at PPP$3.20 a day (% of total employment)] * Feat[Agricultural water withdrawal as % of total renewable water resources] | 16.1346 | |
536 | perc_poly__Feat[Vulnerable employment (% of total employment)] * Feat[Gross enrolment ratio, pre-primary, female (%)] | 16.0199 | |
3068 | perc_poly__Feat[Percentage of students in pre-primary education who are female (%)] * Feat[Industrial water withdrawal as % of total water withdrawal] | 15.5643 | |
1600 | perc_poly__Feat[Labour force participation rate (% ages 15 and older)] * Feat[Percentage of enrolment in secondary education in private institutions (%)] | 15.2258 | |
535 | perc_poly__Feat[Vulnerable employment (% of total employment)] * Feat[Gross enrolment ratio, pre-primary, both sexes (%)] | 14.9113 | |
537 | perc_poly__Feat[Vulnerable employment (% of total employment)] * Feat[Gross enrolment ratio, pre-primary, male (%)] | 13.8573 | |
184 | pop_poly__Feat[Total population (millions)] * Feat[population_1k_density] | 13.8306 | |
578 | perc_poly__Feat[Vulnerable employment (% of total employment)] * Feat[MDG 7.5. Freshwater withdrawal as % of total renewable water resources] | 13.6625 | |
1390 | perc_poly__Feat[Exports and imports (% of GDP)]^2 | 13.5843 | |
1599 | perc_poly__Feat[Labour force participation rate (% ages 15 and older)] * Feat[Percentage of enrolment in primary education in private institutions (%)] | 13.5199 | |
775 | perc_poly__Feat[Labour force participation rate (% ages 15 and older), male] * Feat[Percentage of students in pre-primary education who are female (%)] | 13.4784 | |
2303 | perc_poly__Feat[Gross enrolment ratio, pre-primary, female (%)] * Feat[Labor force, female (% of total labor force)] | 13.2982 | |
156 | pop_poly__Feat[population] * Feat[Urban population (%)] | 13.2529 | |
2259 | perc_poly__Feat[Gross enrolment ratio, pre-primary, both sexes (%)] * Feat[Labor force, female (% of total labor force)] | 13.2224 | |
426 | perc_poly__Feat[Population with at least some secondary education, male (% ages 25 and older)] * Feat[Agricultural water withdrawal as % of total renewable water resources] | 13.2134 | |
1197 | perc_poly__Feat[Inequality in education (%)] * Feat[Agricultural water withdrawal as % of total water withdrawal] | 13.1529 | |
2346 | perc_poly__Feat[Gross enrolment ratio, pre-primary, male (%)] * Feat[Labor force, female (% of total labor force)] | 13.1356 | |
1512 | perc_poly__Feat[Youth not in school or employment (% ages 15-24)] * Feat[Labour force participation rate (% ages 15 and older)] | 13.1342 | |
1697 | perc_poly__Feat[Employment in agriculture (% of total employment)] * Feat[Gross enrolment ratio, pre-primary, female (%)] | 12.978 | |
648 | perc_poly__Feat[Urban population (%)] * Feat[Agricultural water withdrawal as % of total renewable water resources] | 12.9201 |
Risk: Declining water quality
Explained variation percentage per principal component: [78.27390879136566, 6.166052091257242, 3.282279668960054, 3.1163840169810704, 2.380639356658543, 1.097313877635352, 0.9093861859235939, 0.7804899479485267, 0.5492017083013799, 0.48874242531712797]
Total percentage of the explained data by 10 components is: 97.04
Percentage of the information that is lost for using 10 components is: 2.96
Picked variable number: 19
Features selected:

Index | Specs | Score | |
---|---|---|---|
1565 | perc_poly__Feat[Youth not in school or employment (% ages 15-24)] * Feat[Agricultural water withdrawal as % of total renewable water resources] | 13.7088 | |
1568 | perc_poly__Feat[Youth not in school or employment (% ages 15-24)] * Feat[MDG 7.5. Freshwater withdrawal as % of total renewable water resources] | 13.5287 | |
746 | perc_poly__Feat[Labour force participation rate (% ages 15 and older), male] * Feat[Inequality in income (%)] | 13.2073 | |
777 | perc_poly__Feat[Labour force participation rate (% ages 15 and older), male] * Feat[Percentage of students in secondary education who are female (%)] | 12.116 | |
2004 | perc_poly__Feat[Overall loss in HDI due to inequality (%)] * Feat[MDG 7.5. Freshwater withdrawal as % of total renewable water resources] | 12.0001 | |
2038 | perc_poly__Feat[Inequality in income (%)] * Feat[Percentage of students in secondary general education who are female (%)] | 11.7702 | |
2001 | perc_poly__Feat[Overall loss in HDI due to inequality (%)] * Feat[Agricultural water withdrawal as % of total renewable water resources] | 11.02 | |
1308 | perc_poly__Feat[Unemployment, youth (% ages 15-24)] * Feat[Percentage of students in primary education who are female (%)] | 10.9603 | |
1263 | perc_poly__Feat[Inequality in life expectancy (%)] * Feat[MDG 7.5. Freshwater withdrawal as % of total renewable water resources] | 10.5991 | |
530 | perc_poly__Feat[Vulnerable employment (% of total employment)] * Feat[Inequality in income (%)] | 10.4985 | |
137 | scaled__SDG 6.4.2. Water Stress | 10.3154 | |
3058 | perc_poly__Feat[Percentage of students in pre-primary education who are female (%)] * Feat[Population, male (% of total)] | 10.1918 | |
3057 | perc_poly__Feat[Percentage of students in pre-primary education who are female (%)] * Feat[Population, female (% of total)] | 10.1918 | |
720 | perc_poly__Feat[Labour force participation rate (% ages 15 and older), female] * Feat[Agricultural water withdrawal as % of total renewable water resources] | 10.0034 | |
3140 | perc_poly__Feat[Population ages 0-14 (% of total)] * Feat[Agricultural water withdrawal as % of total renewable water resources] | 9.99864 | |
2780 | perc_poly__Feat[Labor force, female (% of total labor force)] * Feat[Agricultural water withdrawal as % of total renewable water resources] | 9.90631 | |
1898 | perc_poly__Feat[Share of employment in nonagriculture, female (% of total employment in nonagriculture)] * Feat[Agricultural water withdrawal as % of total renewable water resources] | 9.67468 | |
3143 | perc_poly__Feat[Population ages 0-14 (% of total)] * Feat[MDG 7.5. Freshwater withdrawal as % of total renewable water resources] | 9.66008 | |
1068 | perc_poly__Feat[Infants lacking immunization, DTP (% of one-year-olds)] * Feat[MDG 7.5. Freshwater withdrawal as % of total renewable water resources] | 9.36312 |
Risk: Increased water demand
Explained variation percentage per principal component: [74.35979959531807, 16.23639131615468, 3.052385317234916, 1.2490249413135015, 0.8597006844862726, 0.8058804595900894, 0.6070225334257546, 0.5299944687415014, 0.4672292781172945, 0.3178953000586926]
Total percentage of the explained data by 10 components is: 98.49
Percentage of the information that is lost for using 10 components is: 1.51
Picked variable number: 10
Features selected:

Index | Specs | Score | |
---|---|---|---|
487 | perc_poly__Feat[Share of seats in parliament (% held by women)] * Feat[Percentage of students in secondary education who are female (%)] | 12.9493 | |
447 | perc_poly__Feat[Share of seats in parliament (% held by women)] * Feat[Youth not in school or employment (% ages 15-24)] | 8.64105 | |
2874 | perc_poly__Feat[Percentage of enrolment in primary education in private institutions (%)] * Feat[Municipal water withdrawal as % of total withdrawal] | 7.82076 | |
1482 | perc_poly__Feat[Unemployment, total (% of labour force)] * Feat[Percentage of enrolment in primary education in private institutions (%)] | 7.6997 | |
594 | perc_poly__Feat[Urban population (%)] * Feat[Youth not in school or employment (% ages 15-24)] | 7.58774 | |
0 | scaled__population | 7.47841 | |
1483 | perc_poly__Feat[Unemployment, total (% of labour force)] * Feat[Percentage of enrolment in secondary education in private institutions (%)] | 7.30762 | |
1299 | perc_poly__Feat[Unemployment, youth (% ages 15-24)] * Feat[Percentage of enrolment in primary education in private institutions (%)] | 7.00005 | |
2896 | perc_poly__Feat[Percentage of enrolment in secondary education in private institutions (%)] * Feat[Unemployment, male (% of male labor force) (modeled ILO estimate)] | 6.94724 | |
486 | perc_poly__Feat[Share of seats in parliament (% held by women)] * Feat[Percentage of students in primary education who are female (%)] | 6.81146 |
Risk: Regulatory
Explained variation percentage per principal component: [60.84877916778261, 12.932677773264626, 6.030719106399093, 4.0221021556904875, 3.0887947680546803, 2.4727109196783488, 1.7777044845148402, 1.6435185695388081, 1.1437258163666977, 1.0042779571148468]
Total percentage of the explained data by 10 components is: 94.97
Percentage of the information that is lost for using 10 components is: 5.03
Picked variable number: 10
Features selected:

Index | Specs | Score | |
---|---|---|---|
1397 | perc_poly__Feat[Exports and imports (% of GDP)] * Feat[Working poor at PPP$3.20 a day (% of total employment)] | 12.0832 | |
277 | perc_poly__Feat[Population with at least some secondary education (% ages 25 and older)] * Feat[Municipal water withdrawal as % of total withdrawal] | 11.6903 | |
354 | perc_poly__Feat[Population with at least some secondary education, female (% ages 25 and older)] * Feat[Municipal water withdrawal as % of total withdrawal] | 11.6719 | |
430 | perc_poly__Feat[Population with at least some secondary education, male (% ages 25 and older)] * Feat[Municipal water withdrawal as % of total withdrawal] | 11.4424 | |
1740 | perc_poly__Feat[Employment in agriculture (% of total employment)] * Feat[Municipal water withdrawal as % of total withdrawal] | 10.9302 | |
1204 | perc_poly__Feat[Inequality in life expectancy (%)] * Feat[Exports and imports (% of GDP)] | 10.5559 | |
579 | perc_poly__Feat[Vulnerable employment (% of total employment)] * Feat[Municipal water withdrawal as % of total withdrawal] | 10.3693 | |
1400 | perc_poly__Feat[Exports and imports (% of GDP)] * Feat[Overall loss in HDI due to inequality (%)] | 9.13338 | |
652 | perc_poly__Feat[Urban population (%)] * Feat[Municipal water withdrawal as % of total withdrawal] | 9.1217 | |
1140 | perc_poly__Feat[Inequality in education (%)] * Feat[Exports and imports (% of GDP)] | 8.62565 |
Risk: Energy supply issues
Explained variation percentage per principal component: [56.591930731505336, 17.827250608775376, 5.099604567700036, 4.331467067036899, 3.0868188081393892, 2.699519986543184, 1.882391125355395, 1.8205347861215964, 1.2734765640436936, 0.8998715916627013]
Total percentage of the explained data by 10 components is: 95.51
Percentage of the information that is lost for using 10 components is: 4.49
Picked variable number: 10
Features selected:

Index | Specs | Score | |
---|---|---|---|
3183 | perc_poly__Feat[Population, female (% of total)] * Feat[Unemployment, male (% of male labor force) (modeled ILO estimate)] | 19.9001 | |
3196 | perc_poly__Feat[Population, male (% of total)] * Feat[Unemployment, male (% of male labor force) (modeled ILO estimate)] | 19.9001 | |
2600 | perc_poly__Feat[Gross enrolment ratio, secondary, male (%)] * Feat[Population growth (annual %)] | 18.3182 | |
3184 | perc_poly__Feat[Population, female (% of total)] * Feat[Unemployment, total (% of total labor force) (modeled ILO estimate)] | 17.5992 | |
3197 | perc_poly__Feat[Population, male (% of total)] * Feat[Unemployment, total (% of total labor force) (modeled ILO estimate)] | 17.5992 | |
2525 | perc_poly__Feat[Gross enrolment ratio, secondary, both sexes (%)] * Feat[Population growth (annual %)] | 16.5388 | |
2636 | perc_poly__Feat[Gross enrolment ratio, upper secondary, both sexes (%)] * Feat[Population growth (annual %)] | 15.1723 | |
2563 | perc_poly__Feat[Gross enrolment ratio, secondary, female (%)] * Feat[Population growth (annual %)] | 14.3133 | |
889 | perc_poly__Feat[Foreign direct investment, net inflows (% of GDP)] * Feat[Gross enrolment ratio, lower secondary, male (%)] | 13.6433 | |
114 | scaled__Population, male (% of total) | 13.1978 |
Crazy outcomes, particularly those involving "Share of seats in parliament (% held by women)". It seems that some of the picked features relate to differences between civilizations, standards, etc. We can get some pretty nice insights by looking at this data.
@ekaan also needs to add documentation; then we can merge it.
@adriana-madi This needs to go in the report somehow. Maybe we could make a tag cloud or some other representation?
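
One possible starting point for a tag cloud is to count how often each base feature appears inside the selected interaction terms. The sketch below is only an illustration: the `specs` list holds a few strings copied from the "Increased water demand" and "Increased water stress or scarcity" tables, the helper `base_features` is hypothetical, and it assumes feature names never contain square brackets.

```python
import re
from collections import Counter

# A few "Specs" strings copied from the tables above (not the full list).
specs = [
    "perc_poly__Feat[Share of seats in parliament (% held by women)] * "
    "Feat[Percentage of students in secondary education who are female (%)]",
    "perc_poly__Feat[Share of seats in parliament (% held by women)] * "
    "Feat[Youth not in school or employment (% ages 15-24)]",
    "perc_poly__Feat[Urban population (%)] * "
    "Feat[Youth not in school or employment (% ages 15-24)]",
    "perc_poly__Feat[Exports and imports (% of GDP)]^2",
    "scaled__population",
]

# Matches the name inside each Feat[...] wrapper (assumes no nested brackets).
FEAT = re.compile(r"Feat\[(.*?)\]")


def base_features(spec):
    """Return the base feature names used in one spec string."""
    names = FEAT.findall(spec)
    if names:
        return names
    # Non-polynomial specs look like "scaled__<name>": drop the prefix.
    return [spec.split("__", 1)[-1]]


counts = Counter(name for spec in specs for name in base_features(spec))
for name, n in counts.most_common():
    print(n, name)
```

The resulting counts (or the selection scores summed per base feature) could then be passed to whatever plotting or word-cloud tool the report ends up using.
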
@antosalerno had asked me to upload a CSV of the augmented dataset, which is "huge" (60 MB). Instead, I uploaded a notebook called ClassificationOnAugmentedFeatures.ipynb, which can be used as a template: run the code inside to get the augmented features. It is a deterministic process, so I see no reason to save the result as a CSV and pollute GitHub.
Added the Feature Generation and Selection part. @antosalerno @adriana-madi @bajo1207 @OlympiaG the feature selection part is relatively easy to remove if you pick a model that also performs feature selection. @ekaan can you add documentation to the Python file when you have time later today?
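
For anyone reading the tables without opening the notebook: names like `Feat[A] * Feat[B]` and `Feat[A]^2` suggest a degree-2 expansion (squares plus pairwise products). The sketch below shows that idea in plain Python. It is an assumption about what the generation step does, not the notebook's actual code; `poly2` is a hypothetical helper and the input values are made up.

```python
from itertools import combinations_with_replacement


def poly2(row):
    """Expand one row (dict: name -> value) with squares and pairwise products."""
    out = {}
    # Iterating in dict insertion order keeps the output deterministic.
    for a, b in combinations_with_replacement(list(row), 2):
        name = f"Feat[{a}]^2" if a == b else f"Feat[{a}] * Feat[{b}]"
        out[name] = row[a] * row[b]
    return out


# Made-up values, for illustration only.
row = {
    "Urban population (%)": 50.0,
    "Unemployment, total (% of labour force)": 4.0,
}
augmented = poly2(row)
for name, value in augmented.items():
    print(name, value)
```

With n base features this yields n(n+1)/2 degree-2 terms, which is why the augmented dataset grows so quickly, and why regenerating it from the notebook is cheaper than storing it.
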