RMI-PACTA / pacta.scenario.data.preparation

The goal of {pacta.scenario.data.preparation} is to prepare and format all scenario input datasets required to run the {pacta.portfolio.allocate} tool.
https://rmi-pacta.github.io/pacta.scenario.data.preparation/

find an automatic way of keeping scenario source labels up to date #11

Open jdhoffa opened 6 months ago

jdhoffa commented 6 months ago
NIT: I think we may just want to functionalize this? But out of scope for this PR since it is a pure refactor

_Originally posted by @jdhoffa in https://github.com/RMI-PACTA/pacta.scenario.data.preparation/pull/6#discussion_r1494496338_

AB#10859

cjyetman commented 4 months ago

We're talking about these lines here? https://github.com/RMI-PACTA/pacta.scenario.data.preparation/blob/6d77aeb79cad080730d3474a76611120357ce288/R/utils.R#L43-L48

What is the suggestion? Something like...

p4i_label <- function() { c("WEO2022", "GECO2022") }
p4b_label <- function() { c("weo_2022", "geco_2022") }

Seems a bit overkill honestly

jdhoffa commented 4 months ago

No, I think the idea is more to create a function that maps: ABCDYYYY -> abcd_yyyy

The former being what P4I expects, the latter being what P4B expects.

(In general, my preference is to just align the expectation from both tools, and not deal with the reformatting at all)

cjyetman commented 2 months ago

relevant work was done in

cjyetman commented 2 months ago

There's already dictionary_p4i_p4b(), which was updated in #59 and #60 https://github.com/RMI-PACTA/pacta.scenario.data.preparation/blob/5c3925555358b13e06698a5b3fa30a912775a64c/R/utils.R#L43-L53

Is that enough, or did you want more? Like...

dictionary_p4i_p4b <- function() {
  dplyr::tribble(
    ~p4i_label, ~p4b_label,
    "ISF2021", "isf_2021",
    "WEO2022", "weo_2022",
    "GECO2022", "geco_2022",
    "GECO2023", "geco_2023",
    "WEO2023", "weo_2023",
    "ISF2023", "isf_2023"
  )
}

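convert_scenario_sources <-
  function(scenario_sources, label = c("p4i_label", "p4b_label")) {
    # resolve the default to a single value so the call also works without `label`
    label <- match.arg(label)
    dict <- dictionary_p4i_p4b()
    # the target column is whichever dictionary column is not the source label
    to_label <- setdiff(names(dict), label)
    dict[[to_label]][match(scenario_sources, dict[[label]])]
  }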

scenario_sources <- c("ISF2021", "WEO2022", "GECO2022")
convert_scenario_sources(scenario_sources, label = "p4i_label")
#> [1] "isf_2021"  "weo_2022"  "geco_2022"

scenario_sources <- c("isf_2021", "weo_2022", "geco_2022")
convert_scenario_sources(scenario_sources, label = "p4b_label")
#> [1] "ISF2021"  "WEO2022"  "GECO2022"

jdhoffa commented 2 months ago

The goal of this issue would be to create a function that programmatically maps e.g. ABC1234 -> abc_1234, so that we don't need to manually update the dictionary_p4i_p4b() function whenever we add a new scenario

cjyetman commented 2 months ago

@jdhoffa does this achieve what you were thinking of?

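# POSIX classes such as [:alpha:] are only valid inside a bracket expression,
# so the letter prefix is matched with [[:alpha:]]+
p4i_p4b <- function(x) {
  tolower(sub(pattern = "([[:alpha:]]+)(\\d{4})", replacement = "\\1_\\2", x = x))
}

p4b_p4i <- function(x) {
  toupper(sub(pattern = "([[:alpha:]]+)_(\\d{4})", replacement = "\\1\\2", x = x))
}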

p4i_p4b("WEO2023")
#> [1] "weo_2023"

p4i_p4b("GECO2023")
#> [1] "geco_2023"

p4b_p4i("weo_2023")
#> [1] "WEO2023"

p4b_p4i("geco_2023")
#> [1] "GECO2023"

jdhoffa commented 2 months ago

It does! Then the next step would be to actually implement that function in any of the P4B processing functions
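
As a rough sketch of what that next step might look like: the snippet below wraps the regex conversion in a small helper that validates its input before converting, then applies it to a data frame column. The helper name `convert_p4i_sources()` and the column name `scenario_source` are assumptions made for illustration and are not taken from the package. The stricter anchored patterns are also a choice made here, not something agreed in this thread.

```r
# Regex-based converters, following the approach proposed in this thread.
# The patterns are anchored so that only whole labels of the form
# <letters><4 digits> (P4I) or <letters>_<4 digits> (P4B) are rewritten.
p4i_p4b <- function(x) {
  tolower(sub("^([[:alpha:]]+)([[:digit:]]{4})$", "\\1_\\2", x))
}

p4b_p4i <- function(x) {
  toupper(sub("^([[:alpha:]]+)_([[:digit:]]{4})$", "\\1\\2", x))
}

# Hypothetical helper (name is an assumption): convert P4I labels to P4B labels
# and stop with an error if any label does not look like <letters><4 digits>.
convert_p4i_sources <- function(scenario_sources) {
  ok <- grepl("^[[:alpha:]]+[[:digit:]]{4}$", scenario_sources)
  if (!all(ok)) {
    stop(
      "Unexpected scenario source label(s): ",
      paste(unique(scenario_sources[!ok]), collapse = ", ")
    )
  }
  p4i_p4b(scenario_sources)
}

# The labels listed in dictionary_p4i_p4b() above survive a round trip.
p4i_labels <- c("ISF2021", "WEO2022", "GECO2022", "GECO2023", "WEO2023", "ISF2023")
p4b_labels <- convert_p4i_sources(p4i_labels)
stopifnot(identical(p4b_p4i(p4b_labels), p4i_labels))

# Example use on a data frame column (the column name is assumed).
scenarios <- data.frame(scenario_source = c("WEO2023", "GECO2023"))
scenarios$scenario_source <- convert_p4i_sources(scenarios$scenario_source)
scenarios$scenario_source
#> [1] "weo_2023"  "geco_2023"
```

Failing on unrecognised labels, rather than passing them through unchanged, is one possible design choice; it would surface a new source whose name does not follow the letters-plus-year convention instead of silently producing a mismatched label.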