Define dictionary columns

maurolepore commented 1 month ago

Follows up on #11.
We may reuse this googlesheet: the latest version of the data dictionary from the logical data model. It may be outdated, and it focuses on inputs (not outputs).

--

If you would like to assign me this task then please create a separate ticket for me and adjust the priority of this task on my board! Thanks! :) -- @kalashsinghal in https://github.com/2DegreesInvesting/tiltWebTool/issues/11#issuecomment-2133632701

@kalashsinghal, this is the ticket. The attached .csv files were generated as you suggested: "from profile_emissions and profile_sector functions of tiltIndicatorAfter".

reprex

## Emissions

``` r
emissions <- readr::read_csv("tiltIndicatorAfter-v0.0.0.9040-emissions.csv")
#> Rows: 42 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (4): dataset, level, name, type
#> lgl (1): definition
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

``` r
head(emissions)
#> # A tibble: 6 × 5
#>   dataset   level   name             type      definition
#>   <chr>     <chr>   <chr>            <chr>     <lgl>
#> 1 emissions product companies_id     character NA
#> 2 emissions product company_name     character NA
#> 3 emissions product country          character NA
#> 4 emissions product emission_profile character NA
#> 5 emissions product benchmark        character NA
#> 6 emissions product ep_product       character NA
```

``` r
tail(emissions)
#> # A tibble: 6 × 5
#>   dataset   level   name                type      definition
#>   <chr>     <chr>   <chr>               <chr>     <lgl>
#> 1 emissions company company_city        character NA
#> 2 emissions company postcode            character NA
#> 3 emissions company address             character NA
#> 4 emissions company main_activity       character NA
#> 5 emissions company profile_ranking_avg double    NA
#> 6 emissions company co2_avg             double    NA
```

## Sector

``` r
sector <- readr::read_csv("tiltIndicatorAfter-v0.0.0.9040-sector.csv")
#> Rows: 39 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (4): dataset, level, name, type
#> lgl (1): definition
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

``` r
head(sector)
#> # A tibble: 6 × 5
#>   dataset level   name              type      definition
#>   <chr>   <chr>   <chr>             <chr>     <lgl>
#> 1 sector  product companies_id      character NA
#> 2 sector  product company_name      character NA
#> 3 sector  product country           character NA
#> 4 sector  product sector_profile    character NA
#> 5 sector  product reduction_targets double    NA
#> 6 sector  product scenario          character NA
```

``` r
tail(sector)
#> # A tibble: 6 × 5
#>   dataset level   name                               type      definition
#>   <chr>   <chr>   <chr>                              <chr>     <lgl>
#> 1 sector  company matching_certainty_company_average character NA
#> 2 sector  company company_city                       character NA
#> 3 sector  company postcode                           character NA
#> 4 sector  company address                            character NA
#> 5 sector  company main_activity                      character NA
#> 6 sector  company reduction_targets_avg              double    NA
```
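Every `definition` in the files above is still `NA`. To track progress while filling them in, a quick base R check could count the rows still lacking a definition per dataset and level. This is a minimal sketch: the small `dictionary` data frame below is a made-up stand-in that only mimics the five columns of the .csv files; with the real files you would read them with `readr::read_csv()` as in the reprex.

```r
# Hypothetical stand-in for the dictionary .csv files:
# same five columns, a few rows, all definitions still missing
dictionary <- data.frame(
  dataset = c("emissions", "emissions", "emissions", "sector"),
  level = c("product", "product", "company", "company"),
  name = c("companies_id", "emission_profile", "co2_avg", "reduction_targets_avg"),
  type = c("character", "character", "double", "double"),
  definition = NA_character_
)

# Flag rows without a definition, then count them per dataset and level
dictionary$undefined <- is.na(dictionary$definition)
counts <- aggregate(undefined ~ dataset + level, data = dictionary, FUN = sum)
print(counts)
```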

Here is how I did it. The trick is to use [this helper](https://2degreesinvesting.github.io/tiltDevTools/reference/extensions.html).

## Emissions

```r
library(readr, warn.conflicts = FALSE)
library(tiltToyData)
library(tiltIndicatorAfter)
library(tiltDevTools)

# Read the toy inputs
companies <- read_csv(toy_emissions_profile_any_companies())
products <- read_csv(toy_emissions_profile_products_ecoinvent())
europages_companies <- read_csv(toy_europages_companies())
ecoinvent_activities <- read_csv(toy_ecoinvent_activities())
ecoinvent_europages <- read_csv(toy_ecoinvent_europages())
isic_name <- read_csv(toy_isic_name())

# Compute the emissions profile
emissions <- profile_emissions(
  companies,
  products,
  europages_companies = europages_companies,
  ecoinvent_activities = ecoinvent_activities,
  ecoinvent_europages = ecoinvent_europages,
  isic = isic_name
)

# Write the dictionary to a .csv file named after the package version
version <- packageVersion("tiltIndicatorAfter")
emissions |>
  use_dictionary() |>
  write_csv(glue::glue("tiltIndicatorAfter-v{version}-emissions.csv"))
```

## Sector

```r
library(readr, warn.conflicts = FALSE)
library(tiltToyData)
library(tiltIndicatorAfter)
library(tiltDevTools)

# Read the toy inputs, keeping only the first 3 rows of the lookup tables
companies <- read_csv(toy_sector_profile_companies())
scenarios <- read_csv(toy_sector_profile_any_scenarios())
europages_companies <- read_csv(toy_europages_companies()) |> head(3)
ecoinvent_activities <- read_csv(toy_ecoinvent_activities()) |> head(3)
ecoinvent_europages <- read_csv(toy_ecoinvent_europages()) |> head(3)
isic_name <- read_csv(toy_isic_name()) |> head(3)

# Compute the sector profile
sector <- profile_sector(
  companies,
  scenarios,
  europages_companies = europages_companies,
  ecoinvent_activities = ecoinvent_activities,
  ecoinvent_europages = ecoinvent_europages,
  isic = isic_name
)

# Write the dictionary to a .csv file named after the package version
version <- packageVersion("tiltIndicatorAfter")
sector |>
  use_dictionary() |>
  write_csv(glue::glue("tiltIndicatorAfter-v{version}-sector.csv"))
```
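To make the shape of the helper's output concrete, here is a minimal base R sketch that builds the same five columns (`dataset`, `level`, `name`, `type`, `definition`) from a named list of data frames, one per level. It is NOT how `use_dictionary()` is implemented: `make_dictionary()` and the toy `outputs` list are hypothetical. The `type` column comes from `typeof()`, which yields the `character` and `double` values seen in the .csv files.

```r
# Hypothetical helper, NOT tiltDevTools::use_dictionary():
# one row per column of each level-specific table, definition left empty
make_dictionary <- function(tables, dataset) {
  rows <- lapply(names(tables), function(level) {
    data <- tables[[level]]
    data.frame(
      dataset = dataset,
      level = level,
      name = names(data),
      type = unname(vapply(data, typeof, character(1))),
      definition = NA_character_
    )
  })
  do.call(rbind, rows)
}

# Toy stand-ins for product-level and company-level outputs
outputs <- list(
  product = data.frame(companies_id = "a", emission_profile = "high"),
  company = data.frame(companies_id = "a", co2_avg = 1.5)
)

dictionary <- make_dictionary(outputs, dataset = "emissions")
print(dictionary)
```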

Tilmon commented 1 month ago

Hi @kalashsinghal, thanks for your support here. Let's prioritize this tomorrow in the sprint. I'm also still waiting for a response from Yana, who (I believe) has worked on something similar in the past. Maybe we can use that work as input.

EDIT 2024-05-28: Here is the latest version of the data dictionary from the logical data model. It's not really up to date and mainly contains input data, not output data, so it's probably not a big help :)

Tilmon commented 1 month ago

Addendum: Here is a Wikipedia article about data dictionaries. It lists attributes that such dictionaries commonly include. I think Mauro's template already captures the most relevant ones, but it's probably worth considering whether additional columns would be useful. To fill out the "definition" column in Mauro's template, we can also use the documentation in tiltIndicator and tiltIndicatorAfter, which already defines many of the columns.
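One way to reuse definitions already written elsewhere (for example, collected by hand from the tiltIndicator and tiltIndicatorAfter documentation) is to keep them in a two-column lookup table keyed by column name and fill in only the missing entries. This is a minimal base R sketch: `fill_definitions()`, the toy `dictionary`, and the `lookup` table with its placeholder text are all hypothetical, not actual documentation.

```r
# Toy dictionary: two definitions missing, one already written
dictionary <- data.frame(
  dataset = c("emissions", "emissions", "sector"),
  level = c("product", "company", "company"),
  name = c("companies_id", "co2_avg", "reduction_targets_avg"),
  type = c("character", "double", "double"),
  definition = c(NA, NA, "Already written definition")
)

# Hypothetical lookup of definitions keyed by column name (placeholder text)
lookup <- data.frame(
  name = c("companies_id", "reduction_targets_avg"),
  definition = c("Placeholder definition A", "Placeholder definition B")
)

# Fill only missing definitions; existing ones are left untouched,
# and names absent from the lookup stay NA
fill_definitions <- function(dictionary, lookup) {
  todo <- is.na(dictionary$definition)
  found <- lookup$definition[match(dictionary$name[todo], lookup$name)]
  dictionary$definition[todo] <- found
  dictionary
}

filled <- fill_definitions(dictionary, lookup)
print(filled[, c("name", "definition")])
```

Matching on `name` alone assumes a column means the same thing in every dataset and level; if that does not hold, the lookup would need `dataset` and `level` columns too, matched together with `name`.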

maurolepore commented 1 month ago

@kalashsinghal and @Tilmon, please edit this googlesheet:

https://docs.google.com/spreadsheets/d/1gOZRS9_0yUgR7UXgsf4WmDoAEXfUQy3Mz0MZM1Xvv40/edit#gid=105958234

Tilmon commented 1 month ago

@maurolepore, please see this comment for the data dictionary from the DT, just for reference: https://github.com/2DegreesInvesting/tiltWebTool/issues/12#issuecomment-2133975210

I'll work on it next week.

2DegreesInvesting / tiltWebTool

Define dictionary columns #12