Issue closed by @maurolepore 1 year ago.
@AnneSchoenauer and @Tilmon,
I think this reprex captures the issue that Anne describes.
`pstr*()` matches `companies` and `scenarios` by all of the columns `type`, `sector`, and `subsector` (via a simple `left_join()`). Rows where those three columns have identical data in both datasets will match; otherwise they won't.
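Here is a minimal sketch of that matching rule. It assumes the matching behaves like a plain `dplyr::left_join()` on the three key columns, and it does not show how `pstr*()` is implemented internally. The tibbles `toy_companies` and `toy_scenarios` are hypothetical and keep only the join keys plus one scenario value. `anti_join()` lists the company rows that found no scenario with identical `type`, `sector`, and `subsector`.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data: only the join keys plus one scenario value
toy_companies <- tibble(
  type = "x",
  sector = c("land use", "industry"),
  subsector = "x"
)
toy_scenarios <- tibble(
  type = "x",
  sector = c("land use", "other"),
  subsector = "x",
  reductions = 1:2
)

keys <- c("type", "sector", "subsector")

# A left join keeps every company row; rows without a match get NA scenario columns
joined <- left_join(toy_companies, toy_scenarios, by = keys)
joined

# Company rows that found no scenario with identical type, sector, and subsector
unmatched <- anti_join(toy_companies, toy_scenarios, by = keys)
unmatched
```

The "industry" row keeps `NA` in `reductions` because the only other scenario sector is "other". This mirrors the `x_NA_NA` / `no_sector` rows in the reprex output.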
I start with a simple generic example. Below, I show another one based on the company that Anne shows in the Google Sheet.
library(dplyr, warn.conflicts = FALSE)
library(tiltIndicator)
packageVersion("tiltIndicator")
#> [1] '0.0.0.9058'
# Basic datasets
companies <- tibble(
type = "x",
sector = "x",
subsector = "x",
# Irrelevant
company_id = "x",
clustered = "x",
activity_uuid_product_uuid = "x",
isic_4digit = "x",
tilt_sector = "x",
tilt_subsector = c("x", "x"),
)
scenarios <- tibble(
type = "x",
sector = "x",
subsector = "x",
# Irrelevant
scenario = "x",
year = 1,
value = 1,
reductions = 1:2,
)
# `pstr*()` match companies to scenarios by all of type, sector, and subsector
# Here they are all the same so they match
companies <- companies |> mutate(sector = c("land use", "industry"))
scenarios <- scenarios |> mutate(sector = c("land use", "industry"))
companies
#> # A tibble: 2 × 9
#> type sector subsector company_id clustered activity_uuid_produc…¹ isic_4digit
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 x land … x x x x x
#> 2 x indus… x x x x x
#> # ℹ abbreviated name: ¹activity_uuid_product_uuid
#> # ℹ 2 more variables: tilt_sector <chr>, tilt_subsector <chr>
scenarios
#> # A tibble: 2 × 7
#> type sector subsector scenario year value reductions
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <int>
#> 1 x land use x x 1 1 1
#> 2 x industry x x 1 1 2
pstr_at_product_level(companies, scenarios)
#> # A tibble: 2 × 10
#> companies_id grouped_by risk_category clustered activity_uuid_product_uuid
#> <chr> <chr> <chr> <chr> <chr>
#> 1 x x_x_1 low x x
#> 2 x x_x_1 low x x
#> # ℹ 5 more variables: tilt_sector <chr>, tilt_subsector <chr>, scenario <chr>,
#> # year <dbl>, type <chr>
pstr(companies, scenarios)
#> # A tibble: 3 × 4
#> companies_id grouped_by risk_category value
#> <chr> <chr> <chr> <dbl>
#> 1 x x_x_1 high 0
#> 2 x x_x_1 medium 0
#> 3 x x_x_1 low 1
# Here they are NOT all the same so some don't match
companies <- companies |> mutate(sector = c("land use", "industry"))
scenarios <- scenarios |> mutate(sector = c("land use", "other"))
companies
#> # A tibble: 2 × 9
#> type sector subsector company_id clustered activity_uuid_produc…¹ isic_4digit
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 x land … x x x x x
#> 2 x indus… x x x x x
#> # ℹ abbreviated name: ¹activity_uuid_product_uuid
#> # ℹ 2 more variables: tilt_sector <chr>, tilt_subsector <chr>
scenarios
#> # A tibble: 2 × 7
#> type sector subsector scenario year value reductions
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <int>
#> 1 x land use x x 1 1 1
#> 2 x other x x 1 1 2
pstr_at_product_level(companies, scenarios)
#> # A tibble: 2 × 10
#> companies_id grouped_by risk_category clustered activity_uuid_product_uuid
#> <chr> <chr> <chr> <chr> <chr>
#> 1 x x_x_1 low x x
#> 2 x x_NA_NA no_sector x x
#> # ℹ 5 more variables: tilt_sector <chr>, tilt_subsector <chr>, scenario <chr>,
#> # year <dbl>, type <chr>
pstr(companies, scenarios)
#> # A tibble: 6 × 4
#> companies_id grouped_by risk_category value
#> <chr> <chr> <chr> <dbl>
#> 1 x x_NA_NA high NA
#> 2 x x_NA_NA medium NA
#> 3 x x_NA_NA low NA
#> 4 x x_x_1 high 0
#> 5 x x_x_1 medium 0
#> 6 x x_x_1 low 1
# Same here
companies <- companies |> mutate(sector = c("land use", "industry"))
scenarios <- scenarios |> mutate(sector = c("land use", NA))
companies
#> # A tibble: 2 × 9
#> type sector subsector company_id clustered activity_uuid_produc…¹ isic_4digit
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 x land … x x x x x
#> 2 x indus… x x x x x
#> # ℹ abbreviated name: ¹activity_uuid_product_uuid
#> # ℹ 2 more variables: tilt_sector <chr>, tilt_subsector <chr>
scenarios
#> # A tibble: 2 × 7
#> type sector subsector scenario year value reductions
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <int>
#> 1 x land use x x 1 1 1
#> 2 x <NA> x x 1 1 2
pstr_at_product_level(companies, scenarios)
#> # A tibble: 2 × 10
#> companies_id grouped_by risk_category clustered activity_uuid_product_uuid
#> <chr> <chr> <chr> <chr> <chr>
#> 1 x x_x_1 low x x
#> 2 x x_NA_NA no_sector x x
#> # ℹ 5 more variables: tilt_sector <chr>, tilt_subsector <chr>, scenario <chr>,
#> # year <dbl>, type <chr>
pstr(companies, scenarios)
#> # A tibble: 6 × 4
#> companies_id grouped_by risk_category value
#> <chr> <chr> <chr> <dbl>
#> 1 x x_NA_NA high NA
#> 2 x x_NA_NA medium NA
#> 3 x x_NA_NA low NA
#> 4 x x_x_1 high 0
#> 5 x x_x_1 medium 0
#> 6 x x_x_1 low 1
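The last example puts `NA` in a key column. Below is a sketch of how `dplyr` joins treat `NA` keys. It assumes `pstr*()` uses the default join behaviour, which the reprex above does not confirm. By default (`na_matches = "na"`), an `NA` key matches another `NA` but never a non-missing value such as "industry". With `na_matches = "never"`, `NA` matches nothing. The tibbles `company_keys` and `scenario_keys` are hypothetical.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical keys: one company row has a missing sector
company_keys <- tibble(sector = c("land use", NA, "industry"))
scenario_keys <- tibble(sector = c("land use", NA), reductions = 1:2)

# Default: NA on both sides counts as a match
na_default <- left_join(company_keys, scenario_keys, by = "sector")
na_default

# na_matches = "never": NA keys never match, so that row gets NA reductions
na_never <- left_join(company_keys, scenario_keys, by = "sector", na_matches = "never")
na_never
```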
Here is the relevant data for the company that Anne shows in the Google Sheet. The `tilt_` columns are here for reference, but they don't play a role in the `left_join()` that causes this issue.
`companies` for the company that Anne shared (`thorsten-gerbitz_00000004924766-001`; a tibble: 5 × 5):

| type | *sector* | subsector | tilt_sector | tilt_subsector |
|---|---|---|---|---|
| ipr | land use | NA | land use | NA |
| ipr | land use | NA | land use | NA |
| ipr | land use | NA | land use | NA |
| ipr | industry | other industry | steel and metals | other metals |
| ipr | NA | NA | land use | fishing and forestry; agriculture and livestock |
And here is a bit of `scenarios` that explains why the sector "land use" doesn't match. The company above has "land use" under `sector`, but the "ipr" scenario below has "land use" under `subsector`. This won't match via a `left_join()` by `type`, `sector`, and `subsector`.
`type`, `sector`, and `subsector` from the `scenarios` dataset (a tibble: 14 × 3):

| type | sector | *subsector* |
|---|---|---|
| ipr | power | NA |
| ipr | buildings | NA |
| ipr | industry | iron and steel |
| ipr | industry | non-metallic minerals |
| ipr | industry | chemicals |
| ipr | industry | other industry |
| ipr | transport | cars |
| ipr | transport | trucks |
| ipr | transport | aviation |
| ipr | transport | shipping |
| ipr | transport | other transport |
| ipr | other energy | NA |
| ipr | total | energy |
| ipr | NA | land use |
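This sketch reproduces the mismatch using only the `type`, `sector`, and `subsector` values copied from the two tables above. For brevity, it keeps just three of the 14 scenario rows. Like the earlier sketch, it assumes plain `dplyr` join semantics. `semi_join()` keeps the company rows that do find a scenario, and `anti_join()` keeps those that don't.

```r
library(dplyr, warn.conflicts = FALSE)

# Join keys of the company rows shown above (tilt_* columns omitted)
ipr_companies <- tibble(
  type = "ipr",
  sector = c("land use", "land use", "land use", "industry", NA),
  subsector = c(NA, NA, NA, "other industry", NA)
)

# A subset of the "ipr" scenario rows shown above
ipr_scenarios <- tibble(
  type = "ipr",
  sector = c("industry", "industry", NA),
  subsector = c("iron and steel", "other industry", "land use")
)

keys <- c("type", "sector", "subsector")

# Only the "industry" / "other industry" company row finds a scenario
matched_ipr <- semi_join(ipr_companies, ipr_scenarios, by = keys)
matched_ipr

# "land use" sits in `sector` for companies but in `subsector` for scenarios,
# so the three "land use" rows and the all-NA row find no scenario
unmatched_ipr <- anti_join(ipr_companies, ipr_scenarios, by = keys)
unmatched_ipr
```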
Dear @maurolepore, thanks a lot for this! This really helps, and as far as I can judge it is a content problem. I will talk to @Tilmon tomorrow, ask him why this is the case, and also investigate the scenario mapper again myself. I think the scenario mapper might be wrong in the end. Thanks again! All the best, Anne
@kalashsinghal @maurolepore @AnneSchoenauer I will take care of this and inform Kalash once the new data are ready.
@maurolepore @kalashsinghal @AnneSchoenauer The issue is resolved, and the sectors are updated in the new data. I shared the new data with Kalash in this ticket: Adapt str_companies & re-run PSTR after Tilman fixed scenario sectors #356
@AnneSchoenauer said
https://github.com/2DegreesInvesting/tiltIndicator/issues/312#issuecomment-1546033719
@AnneSchoenauer also said
https://github.com/2DegreesInvesting/tiltIndicator/issues/316#issue-1708280334