Include necessary data

jdhoffa commented 3 years ago

A handful of datasets seem to be important for the tool. From what I can tell so far, these include at least scenario_data and countries.

The scenario_data source of truth comes from: https://github.com/2DegreesInvesting/CapitalMarketsPlatform/blob/master/data/scenario_data.rds

The countries seem to be manually defined as vectors in the original code. It might make sense to extract these as individual datasets.

Below, I have a sample script that generates the important data that we need from scenario_data.rds and the manual country definitions.

For questions 4.1 and 4.2 in #2, you will need the unique values of countries and regions as choices.

I don't have a good idea yet of how or where to store this information, but it is useful to see for now.

What I include below isn't exactly a reprex, but hopefully it's enough information to get started.

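library(tidyverse)
# Source: https://github.com/2DegreesInvesting/CapitalMarketsPlatform/blob/master/data/scenario_data.rds
# Assumes the file above has been downloaded to the working directory.
scenario_data <- readRDS("scenario_data.rds")

scenario_data_clean <- scenario_data %>%
  tidyr::pivot_longer(
    `2000`:last_col(),
    names_to = "year",
    # names_transform converts the year column names to integer;
    # names_ptypes only checks the type and does not convert.
    names_transform = list(year = as.integer),
    values_to = "value"
  ) %>%
  janitor::clean_names() %>%
  dplyr::filter(!is.na(value))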

scenario_providers <- unique(select(scenario_data_clean, model, scenario))

# countries and regions

regions <- scenario_data_clean %>%
  mutate(
    region_simplified = dplyr::case_when(
      region == "R5ASIA" ~ "Asia",
      region == "R5LAM" ~ "Latin America",
      region == "R5MAF" ~ "Middle East and Africa",
      region == "R5OECD90+EU" ~ "OECD90 and EU countries",
      region == "R5REF" ~ "Reforming Economies of the Former Soviet Union",
      region == "R5ROWO" ~ "Rest of the World",
      region == "World" ~ "World"
    )
  ) %>%
  # select the column by name; .data$ is not needed inside select()
  select(region_simplified) %>%
  distinct() %>%
  # arrange() with no arguments does not sort, so name the column
  arrange(region_simplified)

cntrs_R5OECD90_EU <- c(
  "Albania",
  "Austria",
  "Belgium",
  "Bosnia and Herzegovina",
  "Bulgaria",
  "Croatia",
  "Cyprus",
  "Czech Republic",
  "Denmark",
  "Estonia",
  "Finland",
  "France",
  "Germany",
  "Greece",
  "Hungary",
  "Iceland",
  "Ireland",
  "Italy",
  "Latvia",
  "Lithuania",
  "Luxembourg",
  "Macedonia",
  "Malta",
  "Montenegro",
  "Netherlands",
  "Norway",
  "Poland",
  "Portugal",
  "Spain",
  "Sweden",
  "Switzerland",
  "Turkey",
  "United Kingdom",
  "Canada",
  "United States of America",
  "Australia",
  "Fiji",
  "French Polynesia",
  "Guam",
  "Japan",
  "New Caledonia",
  "New Zealand",
  "Romania",
  "Samoa",
  "Serbia",
  "Slovakia",
  "Slovenia",
  "Solomon Islands",
  "Vanuatu"
)

cntrs_R5REF <- c(
  "Armenia",
  "Azerbaijan",
  "Belarus",
  "Georgia",
  "Kazakhstan",
  "Kyrgyzstan",
  "Republic of Moldova",
  "Russian Federation",
  "Tajikistan",
  "Turkmenistan",
  "Ukraine",
  "Uzbekistan"
)

cntrs_R5ASIA <- c(
  "China",
  "Hong Kong",
  "Macao",
  "Mongolia",
  "Taiwan",
  "Afghanistan",
  "Bangladesh",
  "Bhutan",
  "India",
  "Maldives",
  "Nepal",
  "Pakistan",
  "Sri Lanka",
  "Brunei Darussalam",
  "Cambodia",
  "Democratic People's Republic of Korea",
  "East Timor",
  "Indonesia",
  "Lao People's Democratic Republic",
  "Malaysia",
  "Myanmar",
  "Papua New Guinea",
  "Philippines",
  "Republic of Korea",
  "Singapore",
  "Thailand",
  "Viet Nam"
)

cntrs_R5MAF <- c(
  "Bahrain",
  "Iran",
  "Iraq",
  "Israel",
  "Jordan",
  "Kuwait",
  "Lebanon",
  "Oman",
  "Qatar",
  "Saudi Arabia",
  "Syrian Arab Republic",
  "United Arab Emirates",
  "Yemen",
  "Algeria",
  "Angola",
  "Benin",
  "Botswana",
  "Burkina Faso",
  "Burundi",
  "Cote d'Ivoire",
  "Cameroon",
  "Cape Verde",
  "Central African Republic",
  "Chad",
  "Comoros",
  "Congo",
  "Democratic Republic of the Congo",
  "Djibouti",
  "Egypt",
  "Equatorial Guinea",
  "Eritrea",
  "Ethiopia",
  "Gabon",
  "Gambia",
  "Ghana",
  "Guinea",
  "Guinea-Bissau",
  "Kenya",
  "Lesotho",
  "Liberia",
  "Libyan Arab Jamahiriya",
  "Madagascar",
  "Malawi",
  "Mali",
  "Mauritania",
  "Mauritius",
  "Morocco",
  "Mozambique",
  "Namibia",
  "Niger",
  "Nigeria",
  "Reunion",
  "Rwanda",
  "Senegal",
  "Sierra Leone",
  "Somalia",
  "South Africa",
  "Sudan",
  "Swaziland",
  "Togo",
  "Tunisia",
  "Uganda",
  "United Republic of Tanzania",
  "Western Sahara",
  "Zambia",
  "Zimbabwe"
)

cntrs_R5LAM <- c(
  "Argentina",
  "Bahamas",
  "Barbados",
  "Belize",
  "Bolivia",
  "Brazil",
  "Chile",
  "Colombia",
  "Costa Rica",
  "Cuba",
  "Dominican Republic",
  "Ecuador",
  "El Salvador",
  "Guadeloupe",
  "Guatemala",
  "Guyana",
  "Haiti",
  "Honduras",
  "Jamaica",
  "Martinique",
  "Mexico",
  "Netherlands Antilles",
  "Nicaragua",
  "Panama",
  "Paraguay",
  "Peru",
  "Puerto Rico",
  "Suriname",
  "Trinidad and Tobago",
  "Uruguay",
  "Venezuela"
)

countries <- c(
  cntrs_R5OECD90_EU,
  cntrs_R5REF, 
  cntrs_R5ASIA, 
  cntrs_R5MAF, 
  cntrs_R5LAM
  ) %>% 
  unique() %>% 
  sort()

# important datasets
head(scenario_data_clean)
head(scenario_providers)
head(regions)
head(countries)
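
One possible way to extract the countries as their own dataset is a single lookup table with one row per country and the region it belongs to. This is only a sketch: `example_regions`, `country_regions`, and the temporary `.rds` path are hypothetical names, and the vectors are shortened subsets of the lists above rather than the full definitions.

```r
# Shortened, illustrative subsets of the country vectors (not the full lists).
example_regions <- list(
  R5REF = c("Armenia", "Ukraine", "Belarus"),
  R5ASIA = c("China", "India", "Viet Nam")
)

# One row per country, with its region code alongside.
country_regions <- utils::stack(example_regions)
names(country_regions) <- c("country", "region")
country_regions$region <- as.character(country_regions$region)
country_regions <- country_regions[order(country_regions$country), ]
rownames(country_regions) <- NULL

# Unique values that could serve as choices for questions 4.1 and 4.2.
country_choices <- sort(unique(country_regions$country))
region_choices <- sort(unique(country_regions$region))

# One storage option (an assumption, not a decision): save as .rds.
path <- file.path(tempdir(), "country_regions.rds")
saveRDS(country_regions, path)
restored <- readRDS(path)
```

Keeping the region next to each country means the choices for both questions come from the same table, and the per-region vectors can be rebuilt with `split(country_regions$country, country_regions$region)` if needed.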

maurolepore commented 3 years ago

For the record, while experimenting I came up with this slightly different way to clean scenario_data:

  data %>%
    tidyr::pivot_longer(
      where(is.double),
      names_to = "year",
      names_transform = list(year = as.integer)
    ) %>%
    dplyr::filter(!is.na(value)) %>%
    janitor::clean_names() %>%
    dplyr::mutate(dplyr::across(where(is.character), \(x) gsub(" ", "_", tolower(x))))
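
To see what this cleaning does, here is a small sketch on made-up data. The `toy` tibble and its values are hypothetical and only mimic the assumed wide layout of scenario_data (character id columns plus one double column per year); the real file may differ.

```r
library(dplyr)

# Hypothetical toy input: wide format, one double column per year.
toy <- tibble::tibble(
  Model = c("Model A", "Model A"),
  Scenario = c("Scenario 1", "Scenario 2"),
  Region = c("World", "World"),
  `2020` = c(1.5, NA),
  `2025` = c(2.5, 3.5)
)

toy_clean <- toy %>%
  # every double column is treated as a year column
  tidyr::pivot_longer(
    where(is.double),
    names_to = "year",
    names_transform = list(year = as.integer)
  ) %>%
  # drop year/value pairs with no data
  filter(!is.na(value)) %>%
  # snake_case column names
  janitor::clean_names() %>%
  # lower-case the character values and replace spaces with underscores
  mutate(across(where(is.character), \(x) gsub(" ", "_", tolower(x))))
```

Selecting with `where(is.double)` avoids hard-coding the first year column, but it assumes that the year columns are the only double columns in the data.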

maurolepore commented 3 years ago

As we changed the dataset, this issue is no longer relevant.

RMI-PACTA / scenarioSelector

Include necessary data #3