expersso / OECD

Reproducible and programmatic access to OECD data

Source for current CRAN version (0.2.5) #25

Open Jonathan-Aron-LDN opened 1 year ago

Jonathan-Aron-LDN commented 1 year ago

The version of OECD on CRAN is 0.2.5 (https://cran.r-project.org/web/packages/OECD/index.html), but this repo only has the source for 0.2.4. The CRAN version also causes issue #24, because get_data_structure now uses the readsdmx package in place of rsdmx. Given the apparent speed benefits of readsdmx, the following function (or an equivalent in base R) could be used to produce the same result as v0.2.4:

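# Load the v0.2.4 functions from this repo, including get_data_structure(),
# which serves as the reference implementation in the test below.
source("https://raw.githubusercontent.com/expersso/OECD/master/R/main.R")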

get_data_structure_fixed <- function(dataset) {
  url <- paste0("https://stats.oecd.org/restsdmx/sdmx.ashx/GetDataStructure/", 
                dataset)

  # Read the structure definition and strip the "CL_<dataset>_" prefix from ids.
  data_structure <- readsdmx::read_sdmx(url) |>
    dplyr::mutate(id = gsub(paste0("CL_", dataset, "_"), "", id))

  # Build one data frame of codes and labels per code list,
  # keeping the code lists in order of first appearance.
  code_list <- data_structure |>
    dplyr::select(id, value, label = en_description) |>
    split(factor(data_structure$id, levels = unique(data_structure$id))) |>
    purrr::map(
      \(x) dplyr::select(x, id = value, label) |> tibble::remove_rownames()
    )

  # Descriptions for the standard variables. Keep only those present in this
  # dataset's code lists; OBS_VALUE is always kept.
  lookup <- tibble::enframe(
    c(
      OBS_VALUE = "Observation Value",
      TIME_FORMAT = "Time Format",
      UNIT = "Unit",
      POWERCODE = "Unit multiplier",
      REFERENCEPERIOD = "Reference period"
    ),
    name = "id", value = "description"
  ) |>
    dplyr::filter(id %in% names(code_list) | id == "OBS_VALUE")

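  # Variable descriptions from the structure itself, with the standard
  # variables replaced by the fixed descriptions in lookup.
  variable_desc <- data_structure |>
    dplyr::select(id, description = en) |>
    dplyr::distinct() |>
    dplyr::filter(!id %in% lookup$id) |>
    rbind(lookup)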

  full_df_list <- c(VAR_DESC = list(variable_desc), code_list)

  full_df_list
}

test_data_structure <- function(dataset) {
  new <- get_data_structure_fixed(dataset)

  # From version 0.2.4
  ref <- get_data_structure(dataset)

  # Sort VAR_DESC in both results so that row order does not affect the comparison.
  new$VAR_DESC <- new$VAR_DESC |> dplyr::arrange(id)
  ref$VAR_DESC <- ref$VAR_DESC |> dplyr::arrange(id)

  testthat::expect_identical(new, ref)
}

datasets <- c("GOV_DEBT", "DUR_D", "AIR_EMISSIONS", "TEL", "FUA_CITY")

for (ds in datasets) {
  test_data_structure(ds)
}
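
For the "equivalent in base R" mentioned above, here is a minimal sketch of just the prefix-stripping and code-list-splitting steps, with no dplyr, purrr or tibble. It runs on a made-up toy data frame rather than a real readsdmx result: the dataset name "DEMO" and the rows are hypothetical, and only the column names (id, value, en_description) are taken from the function above. It also anchors the pattern with ^ and uses sub() instead of gsub(), which is an assumption about the intent rather than a copy of the original call.

```r
# Toy stand-in for the data frame returned by readsdmx::read_sdmx().
# The column names mirror those used above; the rows are made up.
toy <- data.frame(
  id = c("CL_DEMO_LOCATION", "CL_DEMO_LOCATION", "CL_DEMO_UNIT"),
  value = c("AUS", "AUT", "PC"),
  en_description = c("Australia", "Austria", "Percentage"),
  stringsAsFactors = FALSE
)
dataset <- "DEMO"  # hypothetical dataset name

# Strip the leading "CL_<dataset>_" prefix from the code list ids.
toy$id <- sub(paste0("^CL_", dataset, "_"), "", toy$id)

# Split into one data frame per code list, in order of first appearance,
# then keep only the code and its label and reset the row names.
groups <- split(toy, factor(toy$id, levels = unique(toy$id)))
code_list <- lapply(groups, function(x) {
  out <- data.frame(id = x$value, label = x$en_description,
                    stringsAsFactors = FALSE)
  rownames(out) <- NULL
  out
})

names(code_list)
code_list$LOCATION
```

The result is a named list with one element per code list (here LOCATION and UNIT), each a two-column data frame of id and label. Unlike the tidyverse version, the elements are plain data frames rather than tibbles, so a comparison with expect_identical against v0.2.4 output would likely need a class conversion first.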

plukethep commented 1 year ago

It's very sad that this project seems to be abandoned. It's saved me a huge amount of time!