In istr_product_level, if inputs have multiple sectors in different rows, paste all in the same row

Tilmon commented 1 year ago

Hi @AnneSchoenauer ,

I want you to confirm one thing: In contrast to the PSTR results, the ISTR results actually show multiple sectors for the same input in separate rows. This leads to the result that the same input can have e.g. 12 rows, because it has 2 scenariosx 2 yearsx 3 tilt_sectors. This is I think a consequence of 1-to-many relationships in the sector mappers.

Do we agree, that we can keep it that way for now and look into this again for v2?

See reprex below for example data at the end of one input with 12 rows.

UPDATE

istr_product_level: In ISTR, the same input can be assigned to multiple tilt_sectors and tilt_subsectors. This leads to cases like the one depicted here: https://github.com/2DegreesInvesting/tiltIndicator/issues/404. Even though the input shown in the ticket multiple sectors, the results are the same for each sector, because the underlying scenario sectors are the same. Therefore, if possible, we should paste all tilt_sector and tilt_subsector for these cases into one row. If that's possible, can you also pls check if this would affect company-level results?

devtools::load_all()
#> Error in `value[[3L]]()`:
#> ! Could not find a root 'DESCRIPTION' file that starts with '^Package'
#>   in 'C:\Users\Tilman\2° Investing Dropbox\Tilman Trompke\2° Investing
#>   Team\People\Tilman\1in1000\TILT\database\STR_review'.
#> ℹ Are you in your project directory and does your project have a 'DESCRIPTION'
#>   file?
#> Backtrace:
#>     ▆
#>  1. └─devtools::load_all()
#>  2.   └─pkgload::load_all(...)
#>  3.     └─pkgload::pkg_path(path)
#>  4.       └─base::tryCatch(...)
#>  5.         └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>  6.           └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>  7.             └─value[[3L]](cond)
#>  8.               └─cli::cli_abort(msg, class = "pkgload_no_desc")
#>  9.                 └─rlang::abort(...)
#Please choose the library depending on your code
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(readr)
library(here)
#> here() starts at C:/Users/Tilman/2° Investing Dropbox/Tilman Trompke/2° Investing Team/People/Tilman/1in1000/TILT/database/STR_review
packageVersion("tiltIndicator")
#> [1] '0.0.0.9064'
#### code

# Load
istr_product_level <- read_csv(here("istr_results_product_level.csv"))
#> Rows: 20202233 Columns: 12
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (11): companies_id, grouped_by, risk_category, clustered, activity_uuid_...
#> dbl  (1): year
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Data prep
# Rows where `risk_category` is not `NA` from companies with at least one such `risk_category`
filtered_complete_data <- istr_product_level |>
  group_by(companies_id) |>
  # Drop companies where all `risk_category` is `NA`
  filter(!all(is.na(risk_category))) |>
  # Drop rows where `risk_category` is `NA`
  filter(!is.na(risk_category)) |>
  ungroup()

# In companies where all `risk_category` is `NA`, make `type` & `risk_category` also `NA`
filtered_incomplete_data <- istr_product_level |>
  group_by(companies_id) |>
  # Where all `risk_category` is `NA` ...
  filter(all(is.na(risk_category))) |>
  # ... pick the first row and ...
  slice(1) |>
  # ... make `type` and `risk_category` also `NA`
  mutate(type = NA_character_, risk_category = NA_character_) |>
  ungroup()

istr_product_level3 <- bind_rows(filtered_complete_data, filtered_incomplete_data)

rm(filtered_complete_data)
rm(filtered_incomplete_data)
rm(istr_product_level)

# In the ISTR, we have inputs assigned to multiple sectors, which gives us e.g. in this case 3 different results for the same input. That's how we also wanted to have for PSTR (which didn't work because wrong data prep). 
istr_reprex1 <- istr_product_level3 |> 
  filter(companies_id == "07-schanksysteme-gmbh_00000005076936-001", 
         activity_uuid_product_uuid == "09516d9a-bf2d-558a-b403-ce7f089dee13_1fa64d0c-afd7-46ab-b95c-3a54e0902dd0", 
         input_activity_uuid_product_uuid == "141e871c-60db-5929-9f54-ea1d4f44018c_1fa64d0c-afd7-46ab-b95c-3a54e0902dd0"
  ) |> 
  relocate(companies_id, grouped_by, clustered, input_tilt_subsector)

istr_reprex1
#> # A tibble: 12 × 12
#>    companies_id          grouped_by clustered input_tilt_subsector risk_category
#>    <chr>                 <chr>      <chr>     <chr>                <chr>        
#>  1 07-schanksysteme-gmb… ipr_1.5c … drinks, … other                low          
#>  2 07-schanksysteme-gmb… ipr_1.5c … drinks, … other                high         
#>  3 07-schanksysteme-gmb… weo_nz 20… drinks, … other                low          
#>  4 07-schanksysteme-gmb… weo_nz 20… drinks, … other                high         
#>  5 07-schanksysteme-gmb… ipr_1.5c … drinks, … raw minerals         low          
#>  6 07-schanksysteme-gmb… ipr_1.5c … drinks, … raw minerals         high         
#>  7 07-schanksysteme-gmb… weo_nz 20… drinks, … raw minerals         low          
#>  8 07-schanksysteme-gmb… weo_nz 20… drinks, … raw minerals         high         
#>  9 07-schanksysteme-gmb… ipr_1.5c … drinks, … other metals         low          
#> 10 07-schanksysteme-gmb… ipr_1.5c … drinks, … other metals         high         
#> 11 07-schanksysteme-gmb… weo_nz 20… drinks, … other metals         low          
#> 12 07-schanksysteme-gmb… weo_nz 20… drinks, … other metals         high         
#> # ℹ 7 more variables: activity_uuid_product_uuid <chr>, tilt_sector <chr>,
#> #   scenario <chr>, year <dbl>, type <chr>,
#> #   input_activity_uuid_product_uuid <chr>, input_tilt_sector <chr>

^{Created on 2023-06-05 with reprex v2.0.2}

cc @kalashsinghal @maurolepore

Tilmon commented 1 year ago

@AnneSchoenauer copying my answer from Slack here as well: @Anne Schoenauer the reason is that sometimes we have for the same combination of weo_sector, weo_subsector, ipr_sector, ipr_subsectormultiple rows of tilt_sectorand tilt_subsector. Check out the current tilt-scenario-mapper here: https://docs.google.com/spreadsheets/d/1ER0vFyyLan6OB3kG6r40-oJq1SUotk5-cvtVksnzwT4/edit#gid=1431070032 The reason for that is that weo, ipr, and tilt has different levels of granularity. This clearly shows that we need to re-calibrate the sector mappers for the next version. After this learning we might want to optimize it in a way that tilt sector are less granular than weo and ipr so that a weo-ipr combination never leads more than 1 tilt sector. For this version, the following info is important to know: Even though the input shown in the ticket above has multiple sectors, the results are the same for each sector, because the underlying scenario sectors are the same. I.e., we can either leave it as it is aggregate tilt_sectorand tilt_subsectorsinto one row for these cases What do you think?

Tilmon commented 1 year ago

@maurolepore I used the following piece of code for a similar problem where I basically "summarized" and "pasted" rows into one row. Don't have the time to modify parameteres, but maybe still useful:

ptr_products_raw_avg <- ptr_products_raw_avg |> 
  group_by(country, main_activity, clustered) |> 
  summarise(across(everything(), ~ paste0(na.omit(unique(.)), collapse = "; "))) |>
  mutate(co2_eq_kg_ipcc2021_gwp100 = if_else(multi_match == TRUE, avg_co2_footprint, co2_eq_kg_ipcc2021_gwp100)) |>
  select(-avg_co2_footprint) |>
  ungroup() |> 
  distinct()

Tilmon commented 1 year ago

Update: Issue resolved in new input data I provided to @maurolepore and @kalashsinghal. Therefore, I close this issue as done.

2DegreesInvesting / tiltIndicator

In istr_product_level, if inputs have multiple sectors in different rows, paste all in the same row #404