Closed maurolepore closed 4 months ago
Thanks, Mauro.
as01. @Tilmon quick question: if we have a product that can only be matched to IEA but not to IPR, do we preserve this as well? If not, shouldn't it be here as well?
as01. @AnneSchoenauer here I adapted the reprex to show the case where the unmatched product results from a mismatch in the type of scenario. I hope this helps make the conversation with Tilman concrete.
Note the companies dataset has a product with `activity_uuid_product_uuid = "a"` both for `type = "ipr"` and for `type = "iea"`, but the `sector` and `subsector` in that company are such that this specific product matches the scenario dataset only where `type = "iea"` (it lacks `type = "ipr"` for that combination of `sector`, `subsector`, and `year`).
library(tibble)
devtools::load_all()
#> ℹ Loading tiltIndicator
options(tibble.print_max = Inf, width = 500)
companies <- tribble(
~companies_id, ~clustered, ~activity_uuid_product_uuid, ~tilt_sector, ~tilt_subsector, ~type, ~sector, ~subsector,
"a", "a", "a", "a", "a", "ipr", "total", "energy",
"a", "a", "a", "a", "a", "iea", "total", "energy",
)
scenarios <- tribble(
~sector, ~subsector, ~year, ~reductions, ~type, ~scenario,
"total", "energy", "2050", "1", "iea", "a"
)
result <- sector_profile(companies, scenarios)
result |> unnest_product()
#> # A tibble: 2 × 11
#> companies_id grouped_by risk_category profile_ranking clustered activity_uuid_product_uuid tilt_sector scenario year type tilt_subsector
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 a <NA> <NA> <NA> a a a <NA> <NA> ipr a
#> 2 a iea_a_2050 high 1 a a a a 2050 iea a
result |> unnest_company()
#> # A tibble: 4 × 4
#> companies_id grouped_by risk_category value
#> <chr> <chr> <chr> <dbl>
#> 1 a iea_a_2050 high 0.5
#> 2 a iea_a_2050 medium 0
#> 3 a iea_a_2050 low 0
#> 4 a iea_a_2050 <NA> 0.5
ml01. Note the output at company level is similar to the new output of `emissions*()` but it seems incorrect. @AnneSchoenauer and @Tilmon could you "draw" the ideal output for this particular case?
# ml01.1. Bad?
result |> unnest_company()
#> # A tibble: 4 × 4
#> companies_id grouped_by risk_category value
#> <chr> <chr> <chr> <dbl>
#> 1 a iea_a_2050 high 0.5
#> 2 a iea_a_2050 medium 0
#> 3 a iea_a_2050 low 0
#> 4 a iea_a_2050 <NA> 0.5 # <- this seems wrong because the `NA` comes not from "iea" but from "ipr".
# ml01.2. Better?
result |> unnest_company()
#> # A tibble: 4 × 4
#> companies_id grouped_by risk_category value
#> <chr> <chr> <chr> <dbl>
#> 1 a iea_a_2050 high 0.5
#> 2 a iea_a_2050 medium 0
#> 3 a iea_a_2050 low 0
#> 4 a <NA> <NA> 0.5 # <- this seems better (similar to Tilman's idea of the new `NA` (or `no_match`) benchmark)
Dear @maurolepore,
thanks for providing these insightful reprexes.
> ml01. Note the output at company level is similar to the new output of emissions*() but it seems incorrect. @AnneSchoenauer and @Tilmon could you "draw" the ideal output for this particular case?

Actually, I think it's fine. Or rather, it is what we need and want. It may seem a bit odd because your reprex uses only one scenario instead of both. When using both scenarios, one will see the 0.5 NAs in both `grouped_by` (or, in the real data, in the 4 `grouped_by`: IPR 2030, IPR 2050, WEO 2030, WEO 2050), in the same way as in the `emission_profile` with the 6 `grouped_by`.
I created a slightly extended and more realistic sample dataset in this Google Sheet (my reprex skills have not changed since last week, hence this is the only way for me to share tables with that level of detail with you, but I'm willing to learn reprexes as discussed today!), which contains three `clustered`.
You'll see in the results that, similar to the `emission_profile`:
- `grouped_by` NA for the product without any results
- `grouped_by` we already have and simply add the `risk_category` NA for the products without results (either because of a missing sector match to the scenario or because the product has no sector data at all).
- `grouped_by` indicates the share of products without any results because no sector data is available at all

Please note:
- `clustered` with a `tilt_sector` and `tilt_subsector` but that doesn't have a corresponding `sector` or `subsector` for either of the scenario `type`
- The "unmatched product" from the `emission_profile` is the equivalent to NOT having a `tilt_sector` and `tilt_subsector` at all, because in that case, we won't be able to find any matching scenario and hence will only have NAs for that product.

cc @AnneSchoenauer
Thanks @Tilmon for your example (here) and for this explanation:
> The "unmatched product" from the emission_profile is the equivalent to NOT having a tilt_sector and tilt_subsector at all.

This PR focuses on the "unmatched products" case exclusively. I took the data from your spreadsheet and picked only the relevant rows (note `tibble::tribble()` helps create and share data using a spreadsheet-like format).
The output is a little different, but it seems to make sense considering the input data excludes the rows that belong to the case with a "missing benchmark" (#739). Just in case, please confirm.
library(tibble)
devtools::load_all()
#> ℹ Loading tiltIndicator
packageVersion("tiltIndicator")
#> [1] '0.0.0.9210'
companies <- tribble(
~companies_id, ~clustered, ~activity_uuid_product_uuid, ~tilt_sector, ~tilt_subsector, ~type, ~sector, ~subsector,
"a", "a", "a", "a", "a", "ipr", "total", "energy",
"a", "a", "a", "a", "a", "weo", "total", "energy",
"a", "b", "unmatched", "unmatched", "unmatched", "unmatched", "unmatched", "unmatched"
)
scenarios <- tribble(
~sector, ~subsector, ~year, ~reductions, ~type, ~scenario,
"total", "energy", 2050, 1, "ipr", "a",
"total", "energy", 2050, 0.6, "weo", "a"
)
result <- sector_profile(companies, scenarios)
result |> unnest_product()
#> # A tibble: 3 × 11
#> companies_id grouped_by risk_category profile_ranking clustered
#> <chr> <chr> <chr> <dbl> <chr>
#> 1 a ipr_a_2050 high 1 a
#> 2 a weo_a_2050 medium 0.6 a
#> 3 a <NA> <NA> NA b
#> # ℹ 6 more variables: activity_uuid_product_uuid <chr>, tilt_sector <chr>,
#> # scenario <chr>, year <dbl>, type <chr>, tilt_subsector <chr>
result |> unnest_company()
#> # A tibble: 8 × 4
#> companies_id grouped_by risk_category value
#> <chr> <chr> <chr> <dbl>
#> 1 a ipr_a_2050 high 0.5
#> 2 a ipr_a_2050 medium 0
#> 3 a ipr_a_2050 low 0
#> 4 a ipr_a_2050 <NA> 0.5
#> 5 a weo_a_2050 high 0
#> 6 a weo_a_2050 medium 0.5
#> 7 a weo_a_2050 low 0
#> 8 a weo_a_2050 <NA> 0.5
As you consider this "is what we need and want", I'll polish this PR and then extend it in #739 to include the case with a "missing benchmark". When that case is done, I'll be able to use your full example and should get the same result.
> The "unmatched product" from the emission_profile is the equivalent to NOT having a `tilt_sector` and `tilt_subsector` at all, because in that case, we won't be able to find any matching scenario and hence will only have NAs for that product. --@Tilmon
@Tilmon, FYI I just noticed that this conceptual statement does not always hold.
The reprex below shows that "unmatched" values in `tilt_sector` and `tilt_subsector` alone do not yield `NA`s. Instead, what drives the `NA`s are "unmatched" values in any of `sector`, `subsector`, or `type`.
reprex
library(tibble)
devtools::load_all()
#> ℹ Loading tiltIndicator
packageVersion("tiltIndicator")
#> [1] '0.0.0.9210'
withr::local_options(list(tibble.print_max = Inf, width = 500))
# An "unmatched" value in `tilt_sector` or `tilt_subsector` does NOT yield `NA`
companies <- tribble(
~companies_id, ~activity_uuid_product_uuid, ~clustered, ~tilt_sector, ~sector, ~subsector, ~tilt_subsector, ~type,
"a", "a", "a", "total", "total", "energy", "a", "ipr",
"b", "b", "b", "unmatched", "total", "energy", "a", "ipr"
)
scenarios <- tribble(
~sector, ~subsector, ~year, ~reductions, ~type, ~scenario,
"total", "energy", "2050", "1", "ipr", "a"
)
sector_profile(companies, scenarios) |> unnest_product()
#> # A tibble: 2 × 11
#> companies_id grouped_by risk_category profile_ranking clustered activity_uuid_product_uuid tilt_sector scenario year type tilt_subsector
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 a ipr_a_2050 high 1 a a total a 2050 ipr a
#> 2 b ipr_a_2050 high 1 b b unmatched a 2050 ipr a
# What does yield `NA` is an "unmatched" value in `sector` or `subsector`.
companies <- tribble(
~companies_id, ~activity_uuid_product_uuid, ~clustered, ~tilt_sector, ~sector, ~subsector, ~tilt_subsector, ~type,
"a", "a", "a", "total", "total", "energy", "a", "ipr",
"b", "b", "b", "total", "unmatched", "energy", "a", "ipr"
)
scenarios <- tribble(
~sector, ~subsector, ~year, ~reductions, ~type, ~scenario,
"total", "energy", "2050", "1", "ipr", "a"
)
sector_profile(companies, scenarios) |> unnest_product()
#> # A tibble: 2 × 11
#> companies_id grouped_by risk_category profile_ranking clustered activity_uuid_product_uuid tilt_sector scenario year type tilt_subsector
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 a ipr_a_2050 high 1 a a total a 2050 ipr a
#> 2 b <NA> <NA> <NA> b b total <NA> <NA> ipr a
# Or in `type`
companies <- tribble(
~companies_id, ~activity_uuid_product_uuid, ~clustered, ~tilt_sector, ~sector, ~subsector, ~tilt_subsector, ~type,
"a", "a", "a", "total", "total", "energy", "a", "ipr",
"b", "b", "b", "total", "total", "energy", "a", "unmatched"
)
scenarios <- tribble(
~sector, ~subsector, ~year, ~reductions, ~type, ~scenario,
"total", "energy", "2050", "1", "ipr", "a"
)
sector_profile(companies, scenarios) |> unnest_product()
#> # A tibble: 2 × 11
#> companies_id grouped_by risk_category profile_ranking clustered activity_uuid_product_uuid tilt_sector scenario year type tilt_subsector
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 a ipr_a_2050 high 1 a a total a 2050 ipr a
#> 2 b <NA> <NA> <NA> b b total <NA> <NA> unmatched a
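A minimal sketch of why this could happen, assuming the match behaves like a left join keyed on `sector`, `subsector`, and `type` (an assumption for illustration, not the actual tiltIndicator code). It uses base R only, so it runs without the package; the toy data mirrors the reprex above.

```r
# Hypothetical sketch: assumes matching is a left join on `sector`,
# `subsector`, and `type`. This is NOT the tiltIndicator implementation.
companies <- data.frame(
  companies_id = c("a", "b", "c"),
  tilt_sector  = c("total", "unmatched", "total"),
  sector       = c("total", "total", "unmatched"),
  subsector    = c("energy", "energy", "energy"),
  type         = c("ipr", "ipr", "ipr")
)
scenarios <- data.frame(
  sector = "total", subsector = "energy", type = "ipr",
  scenario = "a", year = 2050, reductions = 1
)

# `all.x = TRUE` keeps every company row; rows without a matching
# scenario get `NA` in the scenario columns.
joined <- merge(
  companies, scenarios,
  by = c("sector", "subsector", "type"),
  all.x = TRUE
)
joined <- joined[order(joined$companies_id), ]

# `tilt_sector` is not a join key, so "b" still matches;
# only "c" (unmatched `sector`) ends up with `NA`.
joined[, c("companies_id", "tilt_sector", "sector", "scenario")]
```

Under that assumption, `tilt_sector` and `tilt_subsector` are carried along as ordinary columns and never affect whether a row finds a scenario.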
I suspect your statement is true in real practice, likely because in the real data `tilt_sector` or `tilt_subsector` should always be "unmatched" when `sector`, `subsector`, or `type` are unmatched. But currently the code does not know about this relationship. If it is an important one, please confirm and I'll open an issue to encode it in a warning or error.
If instead this suggests a bug, let me know so we can fix it.
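For reference, a rough sketch of what such a check might look like. The helper name `warn_if_inconsistent_unmatched()` is hypothetical (it does not exist in tiltIndicator), and it assumes the literal string "unmatched" marks a missing match, as in the reprexes above.

```r
# Hypothetical helper, not part of tiltIndicator. Assumes the literal
# string "unmatched" marks a missing match, as in the reprexes.
# Warns when `sector`, `subsector`, or `type` is "unmatched" but neither
# `tilt_sector` nor `tilt_subsector` is.
warn_if_inconsistent_unmatched <- function(companies) {
  key_unmatched <- companies$sector == "unmatched" |
    companies$subsector == "unmatched" |
    companies$type == "unmatched"
  tilt_unmatched <- companies$tilt_sector == "unmatched" |
    companies$tilt_subsector == "unmatched"

  inconsistent <- key_unmatched & !tilt_unmatched
  n_bad <- sum(inconsistent, na.rm = TRUE)
  if (n_bad > 0) {
    warning(
      n_bad, " row(s) are unmatched in `sector`, `subsector`, or `type` ",
      "but not in `tilt_sector` or `tilt_subsector`.",
      call. = FALSE
    )
  }
  invisible(companies)
}

# Usage: this row would trigger the warning
bad <- data.frame(
  tilt_sector = "total", tilt_subsector = "a",
  sector = "unmatched", subsector = "energy", type = "ipr"
)
warn_if_inconsistent_unmatched(bad)
```

Whether this should be a warning or an error depends on how strictly the data preparation guarantees the relationship.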
@maurolepore thanks, you are right!
> I suspect your statement is true in real practice, likely because in the real data tilt_sector or tilt_subsector should always be "unmatched" when sector, subsector or type are unmatched. But currently the code does not know about this relationship. If it is an important one, please confirm and I'll open an issue to encode it in a warning or error.

That's also correct. We always start with a `tilt_sector` for each product in the data prep. Every `tilt_sector` leads to at least one `sector` (either ipr or weo or both). If we don't have a `tilt_sector`, we don't have a `sector`. But I agree that in this case the unmatched value in `sector` is the important relationship.
Thanks for double checking.
Closes #733 Extends #639
--
@Tilmon and @AnneSchoenauer please see the reprexes and let me know if this is what you expect or what needs to change.
`sector_profile*()` now:
- `value` at company level.
- `sector_profile()`
- `sector_profile_upstream()`
TODO
EXCEPTIONS