In PSTR the IPR scenarios for land use should not be NA

maurolepore commented 1 year ago

@AnneSchoenauer said

If you look at this sheet https://docs.google.com/spreadsheets/d/1ViTkhidRxcMLqlakH9oN4POT2RvTP0mCDm2yoi1TCU4/edit you see that there are NAs for the land use sector. This shouldn’t be the case as the land use sector should be covered by the IPR scenarios.

https://github.com/2DegreesInvesting/tiltIndicator/issues/312#issuecomment-1546033719

@AnneSchoenauer also said

Dear @Tilmon and dear @maurolepore, I have now randomly checked some companies and noticed one issue; I don't know whether it is on purpose, a technicality, or content-related. Therefore, @maurolepore, if you don't see that it is a code problem, I am afraid we need to wait for @Tilmon to tell us what he thinks about the issue on Monday. I noticed that some of the companies don't have any sector and therefore land in the risk category "no_sector". However, this sample company here should have sectors, namely the "Land Use" sector, which should be matched with a sector reduction target of IPR. I didn't check whether all "Land Use" sectors fail to match a scenario, so I don't know if the sample company is an exception, but as far as I know this is not correct. Other than that, the outputs are great and look exactly how we would like to see them. @maurolepore, if you are nearly done running the whole PSTR, I would suggest that you continue and finish the outputs, so that if this is not a real issue, @Tilmon can work with the output tables on Monday.

https://github.com/2DegreesInvesting/tiltIndicator/issues/316#issue-1708280334

maurolepore commented 1 year ago

@AnneSchoenauer and @Tilmon,

I think this reprex captures the issue that Anne describes.

pstr*() matches companies and scenarios by all of the columns type, sector, and subsector (via a simple left_join()). Rows where those three columns have identical data in both datasets will match, else they won't.

I start with a simple generic example. Below I show another one based on the company that Anne shows in the Google Sheet.

library(dplyr, warn.conflicts = FALSE)
library(tiltIndicator)
packageVersion("tiltIndicator")
#> [1] '0.0.0.9058'

# Basic datasets
companies <- tibble(
  type = "x",
  sector = "x",
  subsector = "x",
  # Irrelevant
  company_id = "x",
  clustered = "x",
  activity_uuid_product_uuid = "x",
  isic_4digit = "x",
  tilt_sector = "x",
  tilt_subsector = c("x", "x"),
)
scenarios <- tibble(
  type = "x",
  sector = "x",
  subsector = "x",
  # Irrelevant
  scenario = "x",
  year = 1,
  value = 1,
  reductions = 1:2,
)

# `pstr*()` match companies to scenarios by all of type, sector, and subsector
# Here they are all the same so they match
companies <- companies |> mutate(sector = c("land use", "industry"))
scenarios <- scenarios |> mutate(sector = c("land use", "industry"))
companies
#> # A tibble: 2 × 9
#>   type  sector subsector company_id clustered activity_uuid_produc…¹ isic_4digit
#>   <chr> <chr>  <chr>     <chr>      <chr>     <chr>                  <chr>      
#> 1 x     land … x         x          x         x                      x          
#> 2 x     indus… x         x          x         x                      x          
#> # ℹ abbreviated name: ¹activity_uuid_product_uuid
#> # ℹ 2 more variables: tilt_sector <chr>, tilt_subsector <chr>
scenarios
#> # A tibble: 2 × 7
#>   type  sector   subsector scenario  year value reductions
#>   <chr> <chr>    <chr>     <chr>    <dbl> <dbl>      <int>
#> 1 x     land use x         x            1     1          1
#> 2 x     industry x         x            1     1          2
pstr_at_product_level(companies, scenarios)
#> # A tibble: 2 × 10
#>   companies_id grouped_by risk_category clustered activity_uuid_product_uuid
#>   <chr>        <chr>      <chr>         <chr>     <chr>                     
#> 1 x            x_x_1      low           x         x                         
#> 2 x            x_x_1      low           x         x                         
#> # ℹ 5 more variables: tilt_sector <chr>, tilt_subsector <chr>, scenario <chr>,
#> #   year <dbl>, type <chr>
pstr(companies, scenarios)
#> # A tibble: 3 × 4
#>   companies_id grouped_by risk_category value
#>   <chr>        <chr>      <chr>         <dbl>
#> 1 x            x_x_1      high              0
#> 2 x            x_x_1      medium            0
#> 3 x            x_x_1      low               1

# Here they are NOT all the same so some don't match
companies <- companies |> mutate(sector = c("land use", "industry"))
scenarios <- scenarios |> mutate(sector = c("land use", "other"))
companies
#> # A tibble: 2 × 9
#>   type  sector subsector company_id clustered activity_uuid_produc…¹ isic_4digit
#>   <chr> <chr>  <chr>     <chr>      <chr>     <chr>                  <chr>      
#> 1 x     land … x         x          x         x                      x          
#> 2 x     indus… x         x          x         x                      x          
#> # ℹ abbreviated name: ¹activity_uuid_product_uuid
#> # ℹ 2 more variables: tilt_sector <chr>, tilt_subsector <chr>
scenarios
#> # A tibble: 2 × 7
#>   type  sector   subsector scenario  year value reductions
#>   <chr> <chr>    <chr>     <chr>    <dbl> <dbl>      <int>
#> 1 x     land use x         x            1     1          1
#> 2 x     other    x         x            1     1          2
pstr_at_product_level(companies, scenarios)
#> # A tibble: 2 × 10
#>   companies_id grouped_by risk_category clustered activity_uuid_product_uuid
#>   <chr>        <chr>      <chr>         <chr>     <chr>                     
#> 1 x            x_x_1      low           x         x                         
#> 2 x            x_NA_NA    no_sector     x         x                         
#> # ℹ 5 more variables: tilt_sector <chr>, tilt_subsector <chr>, scenario <chr>,
#> #   year <dbl>, type <chr>
pstr(companies, scenarios)
#> # A tibble: 6 × 4
#>   companies_id grouped_by risk_category value
#>   <chr>        <chr>      <chr>         <dbl>
#> 1 x            x_NA_NA    high             NA
#> 2 x            x_NA_NA    medium           NA
#> 3 x            x_NA_NA    low              NA
#> 4 x            x_x_1      high              0
#> 5 x            x_x_1      medium            0
#> 6 x            x_x_1      low               1

# Same here: an NA sector in `scenarios` doesn't match "industry" either
companies <- companies |> mutate(sector = c("land use", "industry"))
scenarios <- scenarios |> mutate(sector = c("land use", NA))
companies
#> # A tibble: 2 × 9
#>   type  sector subsector company_id clustered activity_uuid_produc…¹ isic_4digit
#>   <chr> <chr>  <chr>     <chr>      <chr>     <chr>                  <chr>      
#> 1 x     land … x         x          x         x                      x          
#> 2 x     indus… x         x          x         x                      x          
#> # ℹ abbreviated name: ¹activity_uuid_product_uuid
#> # ℹ 2 more variables: tilt_sector <chr>, tilt_subsector <chr>
scenarios
#> # A tibble: 2 × 7
#>   type  sector   subsector scenario  year value reductions
#>   <chr> <chr>    <chr>     <chr>    <dbl> <dbl>      <int>
#> 1 x     land use x         x            1     1          1
#> 2 x     <NA>     x         x            1     1          2
pstr_at_product_level(companies, scenarios)
#> # A tibble: 2 × 10
#>   companies_id grouped_by risk_category clustered activity_uuid_product_uuid
#>   <chr>        <chr>      <chr>         <chr>     <chr>                     
#> 1 x            x_x_1      low           x         x                         
#> 2 x            x_NA_NA    no_sector     x         x                         
#> # ℹ 5 more variables: tilt_sector <chr>, tilt_subsector <chr>, scenario <chr>,
#> #   year <dbl>, type <chr>
pstr(companies, scenarios)
#> # A tibble: 6 × 4
#>   companies_id grouped_by risk_category value
#>   <chr>        <chr>      <chr>         <dbl>
#> 1 x            x_NA_NA    high             NA
#> 2 x            x_NA_NA    medium           NA
#> 3 x            x_NA_NA    low              NA
#> 4 x            x_x_1      high              0
#> 5 x            x_x_1      medium            0
#> 6 x            x_x_1      low               1
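
The unmatched rows in the reprex can be reproduced with plain dplyr. This is only a minimal sketch: it assumes the join inside pstr*() behaves like an ordinary left_join() on type, sector, and subsector, as described above, and the toy data is hypothetical.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data mirroring the second reprex case
companies <- tibble(
  type = "x",
  sector = c("land use", "industry"),
  subsector = "x"
)
scenarios <- tibble(
  type = "x",
  sector = c("land use", "other"),
  subsector = "x",
  reductions = 1:2
)

# All three key columns must agree for a row to pick up scenario data.
# Company rows without a match are kept, with NA in the scenario columns.
joined <- left_join(companies, scenarios, by = c("type", "sector", "subsector"))
joined
```

In this sketch the "industry" row keeps its place in the result but gets NA in reductions, which is presumably what later surfaces as "no_sector" and NA values.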

Here is the relevant data for the company that Anne shows in the Google Sheet. The tilt_ columns are included for reference, but they play no role in the left_join() that causes this issue.

Selected columns from `companies` for the company that Anne shared (`thorsten-gerbitz_00000004924766-001`)

# A tibble: 5 × 5
  type  *sector*   subsector      tilt_sector      tilt_subsector                                 
  <chr> <chr>    <chr>          <chr>            <chr>                                          
1 ipr   land use NA             land use         NA                                             
2 ipr   land use NA             land use         NA                                             
3 ipr   land use NA             land use         NA                                             
4 ipr   industry other industry steel and metals other metals                                   
5 ipr   NA       NA             land use         fishing and forestry; agriculture and livestock

And here is the part of the scenarios dataset that explains why the sector "land use" doesn't match. Note that the company above has "land use" in sector, but the "ipr" scenario below has "land use" under subsector (with NA in sector). Because the left_join() compares sector with sector and subsector with subsector, these rows won't match.

Distinct `type`, `sector`, and `subsector` from the `scenarios` dataset

# A tibble: 14 × 3
   type  sector       *subsector*            
   <chr> <chr>        <chr>                
 1 ipr   power        NA                   
 2 ipr   buildings    NA                   
 3 ipr   industry     iron and steel       
 4 ipr   industry     non-metallic minerals
 5 ipr   industry     chemicals            
 6 ipr   industry     other industry       
 7 ipr   transport    cars                 
 8 ipr   transport    trucks               
 9 ipr   transport    aviation             
10 ipr   transport    shipping             
11 ipr   transport    other transport      
12 ipr   other energy NA                   
13 ipr   total        energy               
14 ipr   NA           land use
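
A quick way to list which company keys have no counterpart in the scenarios is dplyr's anti_join(). The sketch below is hedged: the data is retyped by hand from the two tables above rather than taken from the real datasets, and it does not use any tiltIndicator internals.

```r
library(dplyr, warn.conflicts = FALSE)

# Key columns of the sample company, retyped from the table above
companies <- tribble(
  ~type, ~sector,       ~subsector,
  "ipr", "land use",    NA_character_,
  "ipr", "land use",    NA_character_,
  "ipr", "land use",    NA_character_,
  "ipr", "industry",    "other industry",
  "ipr", NA_character_, NA_character_
)

# Distinct scenario keys, retyped from the table above
scenarios <- tribble(
  ~type, ~sector,        ~subsector,
  "ipr", "power",        NA_character_,
  "ipr", "buildings",    NA_character_,
  "ipr", "industry",     "iron and steel",
  "ipr", "industry",     "non-metallic minerals",
  "ipr", "industry",     "chemicals",
  "ipr", "industry",     "other industry",
  "ipr", "transport",    "cars",
  "ipr", "transport",    "trucks",
  "ipr", "transport",    "aviation",
  "ipr", "transport",    "shipping",
  "ipr", "transport",    "other transport",
  "ipr", "other energy", NA_character_,
  "ipr", "total",        "energy",
  "ipr", NA_character_,  "land use"
)

# Company keys that find no scenario row with the same
# type, sector, and subsector
unmatched <- companies |>
  distinct(type, sector, subsector) |>
  anti_join(scenarios, by = c("type", "sector", "subsector"))
unmatched
```

With this retyped data, `unmatched` holds two keys: ("ipr", "land use", NA) and ("ipr", NA, NA). Only the "industry" / "other industry" row finds a scenario. Note that dplyr joins treat NA keys as equal to each other by default, so the NA values alone are not what prevents the match; the "land use" value sitting in different columns is.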

AnneSchoenauer commented 1 year ago

Dear @maurolepore, thanks a lot for this! This really helps. As far as I can judge, it is a content problem. I will talk to @Tilmon tomorrow and ask him why this is the case, and I will also look into the scenario mapper again myself; I think in the end the scenario mapper might be wrong. Thanks again! All the best, Anne

Tilmon commented 1 year ago

@kalashsinghal @maurolepore @AnneSchoenauer I will take care of this and inform Kalash once the new data are ready.

Tilmon commented 1 year ago

@maurolepore @kalashsinghal @AnneSchoenauer The issue is resolved and the sectors are updated in the new data. I shared the new data with Kalash in this ticket: "Adapt str_companies & re-run PSTR after Tilman fixed scenario sectors" (#356).
