Closed maurolepore closed 1 month ago
@Tilmon (cc' @AnneSchoenauer)
I'm struggling to see the difference between the case "unmatched products" and the case "missing benchmarks".
For emissions*()
I could clearly understand the difference and create different tests for the cases with "unmatched product" versus "missing benchmark". For example, to test the case "unmatched product" I could add a product in the companies
dataset that did not exist in the products
dataset. And to test the case "missing benchmark" I could add an NA
in the isic_4digit
column of the products
dataset.
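A minimal sketch of those two toy inputs, in base R only. It does not call tiltIndicator; the reduced column set, the product ids "p1" to "p3", and the two flag columns are assumptions made for illustration.

```r
# Hypothetical toy data, reduced to the columns needed for the illustration.
companies <- data.frame(
  companies_id = c("a", "a", "a"),
  activity_uuid_product_uuid = c("p1", "p2", "p3")
)
products <- data.frame(
  activity_uuid_product_uuid = c("p1", "p2"),
  isic_4digit = c("1234", NA)
)

# Left join: keep every company product, whether or not it exists in `products`.
joined <- merge(companies, products, by = "activity_uuid_product_uuid", all.x = TRUE)

# "Unmatched product": the product does not exist in the products dataset.
joined$unmatched_product <-
  !joined$activity_uuid_product_uuid %in% products$activity_uuid_product_uuid

# "Missing benchmark": the product exists, but its isic_4digit is NA.
joined$missing_benchmark <- !joined$unmatched_product & is.na(joined$isic_4digit)

joined
```

Here "p3" is the unmatched product and "p2" is the product with a missing benchmark; both end up with an NA in isic_4digit after the join, which is why the two cases need separate flags.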
However, for sector*()
I can't understand the difference as clearly, and I realize that my tests for both cases essentially use the same type of toy input data: An NA
or "unmatched" in either the columns sector
, subsector
, or type
of the companies
dataset.
Your https://github.com/2DegreesInvesting/tiltIndicator/pull/738#issuecomment-1975997275 seems to support the idea that these two cases are no different:
I agree that in this case the unmatched value in sector is the important relationship.
The best way to clarify things is through a reproducible example that you are familiar with because it's based on the one you created in this GoogleSheet. At this point I'm so confused that I don't know if you think it shows one or both cases, but I hope it's a good start.
reprex
library(tibble)
devtools::load_all()
#> ℹ Loading tiltIndicator
packageVersion("tiltIndicator")
#> [1] '0.0.0.9211'
withr::local_options(list(tibble.print_max = Inf, width = 500))
companies <- tribble(
~companies_id, ~clustered, ~activity_uuid_product_uuid, ~tilt_sector, ~tilt_subsector, ~type, ~sector, ~subsector,
"a", "a", "a", "a", "a", "ipr", "total", "energy",
"a", "a", "a", "a", "a", "weo", "total", "energy",
"a", "b", "unmatched", "unmatched", "unmatched", "unmatched", "unmatched", "unmatched",
"a", "c", "unmatched", "c", "c", "ipr", "land use", "land use",
"a", "c", "unmatched", "c", "c", "weo", NA, NA
)
scenarios <- tribble(
~sector, ~subsector, ~year, ~reductions, ~type, ~scenario,
"total", "energy", 2050, 1.0, "ipr", "a",
"total", "energy", 2050, 0.6, "weo", "a",
"land use", "land use", 2050, 0.3, "ipr", "a"
)
sector_profile(companies, scenarios) |> unnest_product()
#> # A tibble: 5 × 11
#> companies_id grouped_by risk_category profile_ranking clustered activity_uuid_product_uuid tilt_sector scenario year type tilt_subsector
#> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 a ipr_a_2050 high 1 a a a a 2050 ipr a
#> 2 a weo_a_2050 medium 0.6 a a a a 2050 weo a
#> 3 a <NA> <NA> NA b unmatched unmatched <NA> NA unmatched unmatched
#> 4 a ipr_a_2050 low 0.3 c unmatched c a 2050 ipr c
#> 5 a <NA> <NA> NA c unmatched c <NA> NA weo c
sector_profile(companies, scenarios) |> unnest_company()
#> # A tibble: 8 × 4
#> companies_id grouped_by risk_category value
#> <chr> <chr> <chr> <dbl>
#> 1 a ipr_a_2050 high 0.25
#> 2 a ipr_a_2050 medium 0
#> 3 a ipr_a_2050 low 0.25
#> 4 a ipr_a_2050 <NA> 0.5
#> 5 a weo_a_2050 high 0
#> 6 a weo_a_2050 medium 0.333
#> 7 a weo_a_2050 low 0
#> 8 a weo_a_2050 <NA> 0.667
Is it possible that this is already all you want? If yes, we're done and I can close this PR. If not, what do you expect?
Hi @maurolepore ,
let me respond to the points you raised below. I first start with a more detailed description of how to think about the two different cases and then explain why I think the reprex is unfortunately wrong.
Description of the "two cases"
I'm struggling to see the difference between the case "unmatched products" and the case "missing benchmarks".
I must admit that I find the difference in the case of sector*()
also more complicated, but it's still there. Let's think about it that way: For the sector*()
it doesn't matter, whether a clustered
is matched to Ecoinvent or not. What matters is:
clustered
has a tilt_sector
. If it doesn't have a tilt_sector
, the clustered
will neither have a sector
nor subsector
for any of the types
. Hence, you can think of the tilt_sector
as the activity_uuid_product_uuid
of sector*()
: If the tilt_sector
(activity_uuid_product_uuid
) is missing, you won't have any sector
data (co2_footprint
) that can be used to calculate the profile. Because ultimately, the scenario sectors give us the info on the reduction targets. I.e., no scenario sector = no result at all. Hence, tilt_sector
== unmatched
should behave like the activity_uuid_product_uuid
== unmatched in the emissions*().
tilt_sector
& tilt_subsector
leads to a sector
& subsector
for either of the type
ipr or weo or both or none. You can think of the sector
x subsector
x type
x year
combination as the 6 benchmarks in the emissions*()
. For each clustered
with a tilt_sector
, we want to show all benchmarks, i.e. all corresponding combinations of sector
x subsector
x type
x year
for the specific tilt_sector
x tilt_subsector
, even if some are NAs. I.e., for every clustered
with a tilt_sector
, we should show the benchmarks weo_2030, weo_2050, ipr_2030, ipr_2050, even if some are NA (as in your reprex clustered
c has no corresponding weo sector) - similar to an NA in isic_4digit
in emissions_profile()
.
Reprex
Your reprex shows BOTH cases.
clustered
b has no tilt_sector
(equivalent of no activity_uuid_product_uuid
) and will hence lead to no results for the indicator (as there is no sector
, equivalent to no co2_footprint
). This is what I describe under 1. clustered
c has a tilt_sector
(equivalent to matched product with activity_uuid_product_uuid
) and hence will lead to results for the sector
x subsector
x type
x year
combination, even if there will be some NAs (equivalent to an activity_uuid_product_uuid
leading to results for at least some of the 6 benchmarks). So the reprex example is great to discuss the issue. I see two problems with the reprex you shared, one on product-level and one on company-level:
product-level: clustered c should have the grouped_by value "weo_a_2050" instead of "NA". We have a tilt_sector for that clustered and hence should show all benchmarks, even if they are NA.
I hope this helps. Let me know if it does!
P.S. I realize my comment here was not very helpful, as it's not wrong but leads to more confusion about the two different cases.
Your https://github.com/2DegreesInvesting/tiltIndicator/pull/738#issuecomment-1975997275 seems to support the idea that these two cases are no different: "But I agree that in this case the unmatched value in sector is the important relationship."
I agree that in this case the unmatched value in sector is the important relationship.
@Tilmon
RE:
product-level: clustered c should have the grouped_by value "weo_a_2050" instead of "NA".
Can you please confirm you expect 1, 2, 3, or something else?
"weo_a_2050"
"weo_NA_2050"
"weo_NA_NA"
To me "1." seems incorrect. The values of grouped_by
have the format <type>_<scenario>_<year>
. And a scenario
and year
that exist for a given type
and year
may not make sense for another type
.
For example, if a clustered
"d" matches the sector
and subsector
for the type
"ipr", and the corresponding scenario
is called "iprScenario2050", then I expect grouped_by == "ipr_iprScenario2050_2050"
-- as you say. But if that same sector
and subsector
is unmatched for the type
"weo", then expecting grouped_by == "weo_iprScenario2050_2050"
seems odd.
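A minimal sketch of the format <type>_<scenario>_<year>, in base R and without tiltIndicator. The scenario name "weoScenario2050" is hypothetical; it only serves to show that each key is built row by row from one type and its own scenario and year.

```r
# Hypothetical scenarios table; "weoScenario2050" is an invented name.
scenarios <- data.frame(
  type = c("ipr", "weo"),
  scenario = c("iprScenario2050", "weoScenario2050"),
  year = c(2050, 2050)
)

# Build one key per row, so a scenario is only ever paired with its own type.
grouped_by <- unique(paste(scenarios$type, scenarios$scenario, scenarios$year, sep = "_"))
grouped_by
#> [1] "ipr_iprScenario2050_2050" "weo_weoScenario2050_2050"

# The mixed key discussed above never arises from such a table.
"weo_iprScenario2050_2050" %in% grouped_by
#> [1] FALSE

# paste() turns missing pieces into the string "NA", which gives option 3.
paste("weo", NA, NA, sep = "_")
#> [1] "weo_NA_NA"
```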
BTW, you already know this but showing it here for the record: The new expectation conflicts with previous expectations, and changes the structure of the output. This reprex focuses on that structural change (not on the specific values, which are still work in progress):
In grouped_by we got NA; now we get a non-NA value.
reprex
# styler: off
companies <- tibble::tribble(
~sector, ~companies_id, ~clustered, ~activity_uuid_product_uuid, ~subsector, ~tilt_sector, ~tilt_subsector, ~type,
"unmatched", "a", "a", "a", "energy", "a", "a", "ipr"
)
scenarios <- tibble::tribble(
~sector, ~subsector, ~year, ~reductions, ~type, ~scenario,
"total", "energy", "2050", "1", "ipr", "a"
)
# styler: on
if (!interactive()) withr::local_options(width = 500)
# BEFORE
# Load code in the main branch
library(tiltIndicator)
packageVersion("tiltIndicator")
#> [1] '0.0.0.9221'
result_main <- sector_profile(companies, scenarios)
result_main |> unnest_product()
#> # A tibble: 1 × 11
#> companies_id grouped_by risk_category profile_ranking clustered activity_uuid_product_uuid tilt_sector scenario year type tilt_subsector
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 a <NA> <NA> <NA> a a a <NA> <NA> ipr a
result_main |> unnest_company()
#> # A tibble: 1 × 4
#> companies_id grouped_by risk_category value
#> <chr> <chr> <chr> <dbl>
#> 1 a <NA> <NA> NA
# Compare ----------------------------------------------------------------
# NOW
# Load code in this PR
devtools::load_all()
#> ℹ Loading tiltIndicator
packageVersion("tiltIndicator")
#> [1] '0.0.0.9222'
result_pr <- sector_profile(companies, scenarios)
result_pr |> unnest_product()
#> # A tibble: 1 × 11
#> companies_id grouped_by risk_category profile_ranking clustered activity_uuid_product_uuid tilt_sector scenario year type tilt_subsector
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 a ipr_NA_NA <NA> <NA> a a a <NA> <NA> ipr a
result_pr |> unnest_company()
#> # A tibble: 4 × 4
#> companies_id grouped_by risk_category value
#> <chr> <chr> <chr> <dbl>
#> 1 a ipr_NA_NA high 0
#> 2 a ipr_NA_NA medium 0
#> 3 a ipr_NA_NA low 0
#> 4 a ipr_NA_NA <NA> 1
@Tilmon,
Here I reproduce the example from your GoogleSheet, first case by case then all at once. Please review and try to explain how we can change this output to meet your expectations.
Case by case the outputs seem to make sense, but when taken together the output clearly has more values of grouped_by than the ones you expect. The solution may be to consolidate the rows where grouped_by is unmatched_NA_NA and weo_NA_NA. But now I need a sleeping-break so I pass the thinking ball to you :-)
reprex
devtools::load_all()
#> ℹ Loading tiltIndicator
companies <- tribble(
~companies_id, ~clustered, ~activity_uuid_product_uuid, ~tilt_sector, ~tilt_subsector, ~type, ~sector, ~subsector,
"a", "a", "a", "a", "a", "ipr", "total", "energy",
"a", "a", "a", "a", "a", "weo", "total", "energy",
"a", "b", "unmatched", "unmatched", "unmatched", "unmatched", "unmatched", "unmatched",
"a", "c", "unmatched", "c", "c", "ipr", "land use", "land use",
"a", "c", "unmatched", "c", "c", "weo", NA, NA
)
scenarios <- tribble(
~sector, ~subsector, ~year, ~reductions, ~type, ~scenario,
"total", "energy", 2050, 1.0, "ipr", "a",
"total", "energy", 2050, 0.6, "weo", "a",
"land use", "land use", 2050, 0.3, "ipr", "a"
)
# CASE BY CASE ---------------------------------------------------------------
case_a <- filter(companies, clustered == "a")
case_a
#> # A tibble: 2 × 8
#> companies_id clustered activity_uuid_produc…¹ tilt_sector tilt_subsector type
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 a a a a a ipr
#> 2 a a a a a weo
#> # ℹ abbreviated name: ¹activity_uuid_product_uuid
#> # ℹ 2 more variables: sector <chr>, subsector <chr>
sector_profile(case_a, scenarios) |> unnest_product()
#> # A tibble: 2 × 11
#> companies_id grouped_by risk_category profile_ranking clustered
#> <chr> <chr> <chr> <dbl> <chr>
#> 1 a ipr_a_2050 high 1 a
#> 2 a weo_a_2050 medium 0.6 a
#> # ℹ 6 more variables: activity_uuid_product_uuid <chr>, tilt_sector <chr>,
#> # scenario <chr>, year <dbl>, type <chr>, tilt_subsector <chr>
sector_profile(case_a, scenarios) |> unnest_company()
#> # A tibble: 8 × 4
#> companies_id grouped_by risk_category value
#> <chr> <chr> <chr> <dbl>
#> 1 a ipr_a_2050 high 1
#> 2 a ipr_a_2050 medium 0
#> 3 a ipr_a_2050 low 0
#> 4 a ipr_a_2050 <NA> 0
#> 5 a weo_a_2050 high 0
#> 6 a weo_a_2050 medium 1
#> 7 a weo_a_2050 low 0
#> 8 a weo_a_2050 <NA> 0
case_b <- filter(companies, clustered == "b")
case_b
#> # A tibble: 1 × 8
#> companies_id clustered activity_uuid_produc…¹ tilt_sector tilt_subsector type
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 a b unmatched unmatched unmatched unma…
#> # ℹ abbreviated name: ¹activity_uuid_product_uuid
#> # ℹ 2 more variables: sector <chr>, subsector <chr>
sector_profile(case_b, scenarios) |> unnest_product()
#> # A tibble: 1 × 11
#> companies_id grouped_by risk_category profile_ranking clustered
#> <chr> <chr> <chr> <dbl> <chr>
#> 1 a unmatched_NA_NA <NA> NA b
#> # ℹ 6 more variables: activity_uuid_product_uuid <chr>, tilt_sector <chr>,
#> # scenario <chr>, year <dbl>, type <chr>, tilt_subsector <chr>
sector_profile(case_b, scenarios) |> unnest_company()
#> # A tibble: 4 × 4
#> companies_id grouped_by risk_category value
#> <chr> <chr> <chr> <dbl>
#> 1 a unmatched_NA_NA high 0
#> 2 a unmatched_NA_NA medium 0
#> 3 a unmatched_NA_NA low 0
#> 4 a unmatched_NA_NA <NA> 1
case_c <- filter(companies, clustered == "c")
case_c
#> # A tibble: 2 × 8
#> companies_id clustered activity_uuid_produc…¹ tilt_sector tilt_subsector type
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 a c unmatched c c ipr
#> 2 a c unmatched c c weo
#> # ℹ abbreviated name: ¹activity_uuid_product_uuid
#> # ℹ 2 more variables: sector <chr>, subsector <chr>
sector_profile(case_c, scenarios) |> unnest_product()
#> # A tibble: 2 × 11
#> companies_id grouped_by risk_category profile_ranking clustered
#> <chr> <chr> <chr> <dbl> <chr>
#> 1 a ipr_a_2050 low 0.3 c
#> 2 a weo_NA_NA <NA> NA c
#> # ℹ 6 more variables: activity_uuid_product_uuid <chr>, tilt_sector <chr>,
#> # scenario <chr>, year <dbl>, type <chr>, tilt_subsector <chr>
sector_profile(case_c, scenarios) |> unnest_company()
#> # A tibble: 8 × 4
#> companies_id grouped_by risk_category value
#> <chr> <chr> <chr> <dbl>
#> 1 a ipr_a_2050 high 0
#> 2 a ipr_a_2050 medium 0
#> 3 a ipr_a_2050 low 1
#> 4 a ipr_a_2050 <NA> 0
#> 5 a weo_NA_NA high 0
#> 6 a weo_NA_NA medium 0
#> 7 a weo_NA_NA low 0
#> 8 a weo_NA_NA <NA> 1
# ALL AT ONCE ----------------------------------------------------------------
sector_profile(companies, scenarios) |> unnest_product()
#> # A tibble: 5 × 11
#> companies_id grouped_by risk_category profile_ranking clustered
#> <chr> <chr> <chr> <dbl> <chr>
#> 1 a ipr_a_2050 high 1 a
#> 2 a weo_a_2050 medium 0.6 a
#> 3 a unmatched_NA_NA <NA> NA b
#> 4 a ipr_a_2050 low 0.3 c
#> 5 a weo_NA_NA <NA> NA c
#> # ℹ 6 more variables: activity_uuid_product_uuid <chr>, tilt_sector <chr>,
#> # scenario <chr>, year <dbl>, type <chr>, tilt_subsector <chr>
sector_profile(companies, scenarios) |> unnest_company()
#> # A tibble: 16 × 4
#> companies_id grouped_by risk_category value
#> <chr> <chr> <chr> <dbl>
#> 1 a ipr_a_2050 high 0.5
#> 2 a ipr_a_2050 medium 0
#> 3 a ipr_a_2050 low 0.5
#> 4 a ipr_a_2050 <NA> 0
#> 5 a unmatched_NA_NA high 0
#> 6 a unmatched_NA_NA medium 0
#> 7 a unmatched_NA_NA low 0
#> 8 a unmatched_NA_NA <NA> 1
#> 9 a weo_NA_NA high 0
#> 10 a weo_NA_NA medium 0
#> 11 a weo_NA_NA low 0
#> 12 a weo_NA_NA <NA> 1
#> 13 a weo_a_2050 high 0
#> 14 a weo_a_2050 medium 1
#> 15 a weo_a_2050 low 0
#> 16 a weo_a_2050 <NA> 0
Hi @maurolepore, thanks for your thorough explanations and for taking the time to reproduce the example from the Google Sheet case-by-case and all-at-once. That really helps me to understand the struggles in replicating it. And it also shows how incredibly complicated it is. Therefore, I'd like to ask @AnneSchoenauer to also read through these comments carefully to validate my thinking. Maybe I'm also making things too complicated and we need to find a shortcut.
Before I get into the details, one more general question. I see in the outputs the column profile_ranking
. Does this column show the reductions
? Then can we also call it that way? Or does it show something different I'm unaware of?
Now to your first question in https://github.com/2DegreesInvesting/tiltIndicator/pull/739#issuecomment-2110919379
Can you please confirm you expect 1, 2, 3, or something else?
"weo_a_2050" "weo_NA_2050" "weo_NA_NA"
I would still say, the grouped_by
should be "weo_a_2050" for clustered
c. If we know the type
for a clustered
, then we can also assign a scenario
and year
, no? The scenarios table you are using in https://github.com/2DegreesInvesting/tiltIndicator/pull/739#issuecomment-2111332672, contains the type-scenario-year combination weo_a_2050. Hence, I'd say every clustered
which is assigned to the type
weo, should be analysed with the grouped_by
value that we get from the scenarios dataset for weo, in this case weo_a_2050.
Regarding https://github.com/2DegreesInvesting/tiltIndicator/pull/739#issuecomment-2111332672
Here I reproduce the example from your GoogleSheet, first case by case then all at once. Please review and try to explain how we can change this output to meet your expectations.
Case by case the outputs seem to make sense but when taken together the output clearly has more values of grouped_by than the ones you expect. The solution may be to consolidate the rows where grouped_by is unmatched_NA_NA and weo_NA_NA. But now I need a sleeping-break so I pass the thinking ball to you :-)
I indeed think that we somehow need to consolidate the grouped_by
unmatched_NA_NA and weo_NA_NA. While the NA values behind grouped_by
unmatched_NA_NA should be consolidated into both ipr_a_2050 and weo_a_2050, the NA values behind grouped_by
weo_NA_NA should only be consolidated into weo_a_2050. The reason for that is:
clustered b
behind grouped_by
unmatched_NA_NA has no results for either weo or ipr, i.e. on product level, it should have the risk_category NA for both weo_a_2050 and ipr_a_2050 instead. clustered
c behind grouped_by
weo_NA_NA has results for IPR but not for WEO. Hence, on product-level it should only show NA for the grouped_by
weo_a_2050 but for ipr_a_2050, it should show the actual risk_category value (as it does right now). In essence, I believe we should in the end only have the two grouped_by
ipr_a_2050 and weo_a_2050. And every clustered where we have the type
"weo" should be assigned to grouped_by
weo_a_2050, while every clustered where we have the type
"ipr" should be assigned to grouped_by
ipr_a_2050, irrespective of whether it has a sector
or subsector
. If a clustered has no type
at all (or as denoted in the example a type
"unmatched"), it should be counted as NA for all type_scenario_year combinations in the scenarios dataset.
Does that somewhat help?
@maurolepore additional thoughts:
I was wondering how we can put my thoughts from https://github.com/2DegreesInvesting/tiltIndicator/pull/739#issuecomment-2112020721 in a clear business logic. How about the explanation below? Does that help to make things clearer? It's the same thing as above, just expressed in a different order.
The dataset scenarios determines the list of values for grouped_by
that we have. And then every clustered
has to be assigned to all grouped_by
values. Means in the following example...
... that the only two possible grouped_by
values are ipr_a_2050 and weo_a_2050. And now all clustered
need to be attributed to both values. In this example...
clustered "a" leads to risk_category low/medium/high in both grouped_by, because it has sectors for ipr & weo.
clustered "b" leads to risk_category NA in both grouped_by, because it has a sector for neither ipr nor weo.
clustered "c" leads to risk_category low/medium/high for ipr_a_2050 and to NA for weo_a_2050, because it has a sector only for ipr, not for weo.
Not sure if this really helps. But worth a try I hope :)
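The business logic described in this comment can be sketched in base R as follows. This is an illustration, not tiltIndicator's implementation: the companies table is reduced to the columns the logic needs, and the thresholds 1/3 and 2/3 are an assumption chosen only because they reproduce the categories seen in the reprex outputs (0.3 low, 0.6 medium, 1.0 high).

```r
scenarios <- data.frame(
  sector = c("total", "total", "land use"),
  subsector = c("energy", "energy", "land use"),
  year = 2050,
  reductions = c(1.0, 0.6, 0.3),
  type = c("ipr", "weo", "ipr"),
  scenario = "a"
)
companies <- data.frame(
  clustered = c("a", "a", "b", "c", "c"),
  type = c("ipr", "weo", "unmatched", "ipr", "weo"),
  sector = c("total", "total", "unmatched", "land use", NA),
  subsector = c("energy", "energy", "unmatched", "land use", NA)
)

# 1. The scenarios dataset alone determines the possible grouped_by values.
scenarios$grouped_by <- paste(scenarios$type, scenarios$scenario, scenarios$year, sep = "_")
all_grouped_by <- unique(scenarios$grouped_by)

# 2. Every clustered is assigned to every grouped_by value.
grid <- expand.grid(
  clustered = unique(companies$clustered),
  grouped_by = all_grouped_by,
  stringsAsFactors = FALSE
)

# 3. Attach reductions only where type, sector and subsector match a scenario.
matched <- merge(companies, scenarios, by = c("type", "sector", "subsector"))
product <- merge(
  grid, matched[c("clustered", "grouped_by", "reductions")],
  by = c("clustered", "grouped_by"), all.x = TRUE
)

# 4. Categorize; unmatched combinations keep NA (thresholds are assumed).
product$risk_category <- as.character(
  cut(product$reductions, breaks = c(-Inf, 1/3, 2/3, Inf), labels = c("low", "medium", "high"))
)
product
```

The result has six rows, one per clustered and grouped_by: "a" is high for ipr_a_2050 and medium for weo_a_2050, "b" is NA for both, and "c" is low for ipr_a_2050 and NA for weo_a_2050.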
cc' @AnneSchoenauer
Dear @maurolepore I discussed the suggestions I made in the comments above with @AnneSchoenauer and she agrees that that would be the ideal way. If it's not possible, we need to explore alternatives. To put my thoughts from https://github.com/2DegreesInvesting/tiltIndicator/pull/739#issuecomment-2112327196 into concrete examples, I created a new tab in the Google Sheet (_v2) which should now reflect the "business logic". Please note that I colored all cells green that have changed - to make it easier for you to see the difference.
Thanks and please reach out if it's unclear!
Thanks @Tilmon, your comments help.
Here I'll answer this question:
I see in the outputs the column
profile_ranking
. Does this column show the reductions?
Yes, for the sector*()
functions the column profile_ranking
maps to reductions.
Then can we also call it that way?
Maybe in tiltIndicatorAfter, but tiltIndicator would not be a good place for that kind of indicator-specific change. tiltIndicator sits at the core of the system and manipulates business logic at a level that is mostly abstract and general.
The name profile_ranking
may not be perfect for the sector*()
functions specifically, but seemed good in that it uses a general concept from tilt's domain-specific-language that applies to all indicators. Such a general name can then be used to programmatically refer to the columns of all the outputs of all the indicators. This standardization makes code dramatically more maintainable.
For a concrete example, note how simple the change was in the related PR. This change propagates through the function cols_at_product_level()
to many parts of tiltIndicator.
Similarly we use cols_at_all_levels()
which also makes the code easier to maintain, for example by automatically updating the name of the columns in the Value section of each indicator's helpfile. (Take a moment to note that those column names are NOT mentioned explicitly in the function that generates that documentation: document_default_value().)
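To illustrate the maintainability argument, here is a minimal sketch of the pattern in base R. The function cols_at_product_level_sketch() and the helper select_product_cols() are hypothetical stand-ins written for this example; they are not the actual tiltIndicator functions and their contents are assumptions.

```r
# Hypothetical stand-in: one place that names the general product-level columns.
cols_at_product_level_sketch <- function() {
  c("companies_id", "grouped_by", "risk_category", "profile_ranking")
}

# Any code that needs those columns refers to the helper, never to literal names.
select_product_cols <- function(data) {
  data[intersect(cols_at_product_level_sketch(), names(data))]
}

out <- data.frame(
  companies_id = "a",
  grouped_by = "ipr_a_2050",
  risk_category = "high",
  profile_ranking = 1,
  extra = "dropped"
)
names(select_product_cols(out))
#> [1] "companies_id"    "grouped_by"      "risk_category"   "profile_ranking"
```

With this design, renaming a column means editing the single helper; every caller picks up the new name without further changes.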
If you still think profile_ranking
is not a good name, we can open a new issue and refer to this comment. But before we go through that trouble, consider that tiltIndicator does not face users. The output of tiltIndicator is consumed by tiltIndicatorAfter, and there is where you need to worry about the user-facing names.
You can visualize the aspirational architecture of our system using The Clean Architecture model. tiltIndicator aims to host the enterprise (yellow) and application (red) business rules, and tiltIndicatorAfter aims to host the "interface adaptors" (green).
profile_ranking
@Tilmon
Here's today's update.
The code now yields exactly what you expect in your example googlesheet v1.
reprex
devtools::load_all()
#> ℹ Loading tiltIndicator
companies <- tribble(
~companies_id, ~clustered, ~activity_uuid_product_uuid, ~tilt_sector, ~tilt_subsector, ~type, ~sector, ~subsector,
"a", "a", "a", "a", "a", "ipr", "total", "energy",
"a", "a", "a", "a", "a", "weo", "total", "energy",
"a", "b", "unmatched", "unmatched", "unmatched", "unmatched", "unmatched", "unmatched",
"a", "c", "unmatched", "c", "c", "ipr", "land use", "land use",
"a", "c", "unmatched", "c", "c", "weo", NA, NA
)
scenarios <- tribble(
~sector, ~subsector, ~year, ~reductions, ~type, ~scenario,
"total", "energy", 2050, 1.0, "ipr", "a",
"total", "energy", 2050, 0.6, "weo", "a",
"land use", "land use", 2050, 0.3, "ipr", "a"
)
# ALL AT ONCE ----------------------------------------------------------------
sector_profile(companies, scenarios) |> unnest_product() |> arrange(clustered)
#> # A tibble: 5 × 11
#> companies_id grouped_by risk_category profile_ranking clustered
#> <chr> <chr> <chr> <dbl> <chr>
#> 1 a ipr_a_2050 high 1 a
#> 2 a weo_a_2050 medium 0.6 a
#> 3 a <NA> <NA> NA b
#> 4 a ipr_a_2050 low 0.3 c
#> 5 a weo_a_2050 <NA> NA c
#> # ℹ 6 more variables: activity_uuid_product_uuid <chr>, tilt_sector <chr>,
#> # scenario <chr>, year <dbl>, type <chr>, tilt_subsector <chr>
sector_profile(companies, scenarios) |> unnest_company()
#> # A tibble: 8 × 4
#> companies_id grouped_by risk_category value
#> <chr> <chr> <chr> <dbl>
#> 1 a ipr_a_2050 high 0.333
#> 2 a ipr_a_2050 medium 0
#> 3 a ipr_a_2050 low 0.333
#> 4 a ipr_a_2050 <NA> 0.333
#> 5 a weo_a_2050 high 0
#> 6 a weo_a_2050 medium 0.333
#> 7 a weo_a_2050 low 0
#> 8 a weo_a_2050 <NA> 0.667
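The company-level values above appear to be, within each grouped_by, the share of the company's products in each risk_category, with clustered "b" counted as NA under both grouped_by values. A minimal base R sketch of that arithmetic follows; the product table is typed in by hand from the expectations in this thread, and share_of() is a hypothetical helper, not part of tiltIndicator.

```r
# Product-level categories per grouped_by, typed in by hand for illustration.
product <- data.frame(
  clustered = rep(c("a", "b", "c"), times = 2),
  grouped_by = rep(c("ipr_a_2050", "weo_a_2050"), each = 3),
  risk_category = c("high", NA, "low", "medium", NA, NA)
)

# Share of products in one category; NA is treated as a category of its own.
share_of <- function(x, category) {
  if (is.na(category)) mean(is.na(x)) else mean(!is.na(x) & x == category)
}

categories <- c("high", "medium", "low", NA)
company <- do.call(rbind, lapply(split(product, product$grouped_by), function(d) {
  data.frame(
    grouped_by = d$grouped_by[1],
    risk_category = categories,
    value = vapply(
      categories, function(k) share_of(d$risk_category, k), numeric(1),
      USE.NAMES = FALSE
    )
  )
}))
rownames(company) <- NULL
company
```

For ipr_a_2050 this gives 1/3 each for high, low and NA; for weo_a_2050 it gives 1/3 for medium and 2/3 for NA, which agrees with the values printed above.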
Although we're closer, this is not yet the end.
The same example doesn't work smoothly when each case is considered separately.
reprex
devtools::load_all()
#> ℹ Loading tiltIndicator
companies <- tribble(
~companies_id, ~clustered, ~activity_uuid_product_uuid, ~tilt_sector, ~tilt_subsector, ~type, ~sector, ~subsector,
"a", "a", "a", "a", "a", "ipr", "total", "energy",
"a", "a", "a", "a", "a", "weo", "total", "energy",
"a", "b", "unmatched", "unmatched", "unmatched", "unmatched", "unmatched", "unmatched",
"a", "c", "unmatched", "c", "c", "ipr", "land use", "land use",
"a", "c", "unmatched", "c", "c", "weo", NA, NA
)
scenarios <- tribble(
~sector, ~subsector, ~year, ~reductions, ~type, ~scenario,
"total", "energy", 2050, 1.0, "ipr", "a",
"total", "energy", 2050, 0.6, "weo", "a",
"land use", "land use", 2050, 0.3, "ipr", "a"
)
# CASE BY CASE ---------------------------------------------------------------
case_a <- filter(companies, clustered == "a")
case_a
#> # A tibble: 2 × 8
#> companies_id clustered activity_uuid_produc…¹ tilt_sector tilt_subsector type
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 a a a a a ipr
#> 2 a a a a a weo
#> # ℹ abbreviated name: ¹activity_uuid_product_uuid
#> # ℹ 2 more variables: sector <chr>, subsector <chr>
sector_profile(case_a, scenarios) |> unnest_product()
#> # A tibble: 2 × 11
#> companies_id grouped_by risk_category profile_ranking clustered
#> <chr> <chr> <chr> <dbl> <chr>
#> 1 a ipr_a_2050 high 1 a
#> 2 a weo_a_2050 medium 0.6 a
#> # ℹ 6 more variables: activity_uuid_product_uuid <chr>, tilt_sector <chr>,
#> # scenario <chr>, year <dbl>, type <chr>, tilt_subsector <chr>
sector_profile(case_a, scenarios) |> unnest_company()
#> # A tibble: 8 × 4
#> companies_id grouped_by risk_category value
#> <chr> <chr> <chr> <dbl>
#> 1 a ipr_a_2050 high 1
#> 2 a ipr_a_2050 medium 0
#> 3 a ipr_a_2050 low 0
#> 4 a ipr_a_2050 <NA> 0
#> 5 a weo_a_2050 high 0
#> 6 a weo_a_2050 medium 1
#> 7 a weo_a_2050 low 0
#> 8 a weo_a_2050 <NA> 0
case_b <- filter(companies, clustered == "b")
case_b
#> # A tibble: 1 × 8
#> companies_id clustered activity_uuid_produc…¹ tilt_sector tilt_subsector type
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 a b unmatched unmatched unmatched unma…
#> # ℹ abbreviated name: ¹activity_uuid_product_uuid
#> # ℹ 2 more variables: sector <chr>, subsector <chr>
sector_profile(case_b, scenarios) |> unnest_product()
#> Error in `dplyr_col_modify()`:
#> ! Can't recycle `grouped_by` (size 2) to size 0.
sector_profile(case_b, scenarios) |> unnest_company()
#> Error in `dplyr_col_modify()`:
#> ! Can't recycle `grouped_by` (size 2) to size 0.
case_c <- filter(companies, clustered == "c")
case_c
#> # A tibble: 2 × 8
#> companies_id clustered activity_uuid_produc…¹ tilt_sector tilt_subsector type
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 a c unmatched c c ipr
#> 2 a c unmatched c c weo
#> # ℹ abbreviated name: ¹activity_uuid_product_uuid
#> # ℹ 2 more variables: sector <chr>, subsector <chr>
sector_profile(case_c, scenarios) |> unnest_product()
#> # A tibble: 2 × 11
#> companies_id grouped_by risk_category profile_ranking clustered
#> <chr> <chr> <chr> <dbl> <chr>
#> 1 a ipr_a_2050 low 0.3 c
#> 2 a weo_a_2050 <NA> NA c
#> # ℹ 6 more variables: activity_uuid_product_uuid <chr>, tilt_sector <chr>,
#> # scenario <chr>, year <dbl>, type <chr>, tilt_subsector <chr>
sector_profile(case_c, scenarios) |> unnest_company()
#> # A tibble: 8 × 4
#> companies_id grouped_by risk_category value
#> <chr> <chr> <chr> <dbl>
#> 1 a ipr_a_2050 high 0
#> 2 a ipr_a_2050 medium 0
#> 3 a ipr_a_2050 low 1
#> 4 a ipr_a_2050 <NA> 0
#> 5 a weo_a_2050 high 0
#> 6 a weo_a_2050 medium 0
#> 7 a weo_a_2050 low 0
#> 8 a weo_a_2050 <NA> 1
Also multiple previous tests fail.
reprex
devtools::test_active_file("R/sector_profile.R")
#>
#> [ FAIL 0 | WARN 0 | SKIP 0 | PASS 0 ]
#> [ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ]
#> [ FAIL 0 | WARN 0 | SKIP 0 | PASS 2 ]
#> [ FAIL 0 | WARN 0 | SKIP 0 | PASS 3 ]
#> [ FAIL 0 | WARN 0 | SKIP 0 | PASS 4 ]
#> [ FAIL 1 | WARN 0 | SKIP 0 | PASS 4 ]
#> [ FAIL 2 | WARN 0 | SKIP 0 | PASS 4 ]
#> [ FAIL 3 | WARN 0 | SKIP 0 | PASS 4 ]
#> [ FAIL 4 | WARN 0 | SKIP 0 | PASS 4 ]
#> [ FAIL 4 | WARN 0 | SKIP 0 | PASS 5 ]
#> [ FAIL 4 | WARN 0 | SKIP 0 | PASS 6 ]
#> [ FAIL 4 | WARN 0 | SKIP 0 | PASS 7 ]
#> [ FAIL 5 | WARN 0 | SKIP 0 | PASS 7 ]
#> [ FAIL 6 | WARN 0 | SKIP 0 | PASS 7 ]
#> [ FAIL 7 | WARN 0 | SKIP 0 | PASS 7 ]
#> [ FAIL 8 | WARN 0 | SKIP 0 | PASS 7 ]
#> [ FAIL 8 | WARN 0 | SKIP 0 | PASS 8 ]
#> [ FAIL 8 | WARN 0 | SKIP 0 | PASS 9 ]
#> [ FAIL 8 | WARN 0 | SKIP 0 | PASS 10 ]
#> [ FAIL 8 | WARN 0 | SKIP 0 | PASS 11 ]
#>
#> ── Failure ('test-sector_profile.R:44:3'): at product level, preserves unmatched products ──
#> "unmatched" %in% out[[aka("uid")]] is not TRUE
#>
#> `actual`: FALSE
#> `expected`: TRUE
#>
#> ── Failure ('test-sector_profile.R:58:3'): at product level, unmatched product yield `NA` in the expected columns ──
#> is.na(out$grouped_by) is not TRUE
#>
#> `actual`:
#> `expected`: TRUE
#>
#> ── Failure ('test-sector_profile.R:59:3'): at product level, unmatched product yield `NA` in the expected columns ──
#> is.na(out$risk_category) is not TRUE
#>
#> `actual`:
#> `expected`: TRUE
#>
#> ── Failure ('test-sector_profile.R:60:3'): at product level, unmatched product yield `NA` in the expected columns ──
#> is.na(out$profile_ranking) is not TRUE
#>
#> `actual`:
#> `expected`: TRUE
#>
#> ── Failure ('test-sector_profile.R:108:3'): at company level, one matched and one unmatched products yield `value = 1/2` where `risk_category = NA` and in one other `risk_category` (#657) ──
#> `na` (`actual`) not equal to 1/2 (`expected`).
#>
#> `actual`: 0.0
#> `expected`: 0.5
#>
#> ── Failure ('test-sector_profile.R:110:3'): at company level, one matched and one unmatched products yield `value = 1/2` where `risk_category = NA` and in one other `risk_category` (#657) ──
#> sort(other) (`actual`) not equal to c(0, 0, 1/2) (`expected`).
#>
#> `actual`: 0.0 0.0 1.0
#> `expected`: 0.0 0.0 0.5
#>
#> ── Failure ('test-sector_profile.R:126:3'): at company level, two matched and one unmatched products yield `value = 1/3` where `risk_category = NA` and `value = 2/3` in one other `risk_category` (#657) ──
#> `na` (`actual`) not equal to 1/3 (`expected`).
#>
#> `actual`: 0.0
#> `expected`: 0.3
#>
#> ── Failure ('test-sector_profile.R:128:3'): at company level, two matched and one unmatched products yield `value = 1/3` where `risk_category = NA` and `value = 2/3` in one other `risk_category` (#657) ──
#> sort(other) (`actual`) not equal to c(0, 0, 2/3) (`expected`).
#>
#> `actual`: 0.0 0.0 1.0
#> `expected`: 0.0 0.0 0.7
#>
#> [ FAIL 8 | WARN 0 | SKIP 0 | PASS 11 ]
So I need to investigate each problematic result. In some cases the solution may be obvious, and the new requirement will lead me to adapt or remove a previous test. In other cases the conflict between the old and new requirements may not be obvious and we'll need to discuss further to decide what the code should actually do.
I'll come back tomorrow with a fresh brain.
@maurolepore that's indeed good news! Awesome.
I hope the Bad News won't cause too much headache to solve. Good luck!
@Tilmon,
sector_profile()
now works as you expect. I fixed all tests and added a few more.
sector_profile_upstream()
didn't get any update yet. Can you please help me by modifying this draft example so that I have something to test against?
It's based on your googlesheet example for sector_profile()
but it's a bit more complex since we need an additional inputs
dataset.
reprex
devtools::load_all()
#> ℹ Loading tiltIndicator
# styler: off
companies <- tribble(
~companies_id, ~clustered, ~activity_uuid_product_uuid, ~tilt_sector, ~tilt_subsector,
"a", "a", "a", "a", "a",
"a", "a", "a", "a", "a",
"a", "b", "unmatched", "unmatched", "unmatched",
"a", "c", "unmatched", "c", "c",
"a", "c", "unmatched", "c", "c"
)
inputs <- tribble(
~activity_uuid_product_uuid, ~input_tilt_sector, ~input_tilt_subsector, ~type, ~sector, ~subsector, ~input_activity_uuid_product_uuid,
"a", "a", "a", "ipr", "total", "energy", "a",
"a", "a", "a", "weo", "total", "energy", "a",
"unmatched", "unmatched", "unmatched", "unmatched", "unmatched", "unmatched", "unmatched",
"unmatched", "c", "c", "ipr", "land use", "land use", "unmatched",
"unmatched", "c", "c", "weo", NA, NA, "unmatched"
)
scenarios <- tribble(
~sector, ~subsector, ~year, ~reductions, ~type, ~scenario,
"total", "energy", 2050, 1.0, "ipr", "a",
"total", "energy", 2050, 0.6, "weo", "a",
"land use", "land use", 2050, 0.3, "ipr", "a"
)
# styler: on
sector_profile_upstream(companies, scenarios, inputs) |> unnest_product()
#> # A tibble: 8 × 13
#> companies_id grouped_by risk_category profile_ranking clustered
#> <chr> <chr> <chr> <dbl> <chr>
#> 1 a ipr_a_2050 high 1 a
#> 2 a weo_a_2050 medium 0.6 a
#> 3 a <NA> <NA> NA b
#> 4 a ipr_a_2050 low 0.3 b
#> 5 a <NA> <NA> NA b
#> 6 a <NA> <NA> NA c
#> 7 a ipr_a_2050 low 0.3 c
#> 8 a <NA> <NA> NA c
#> # ℹ 8 more variables: activity_uuid_product_uuid <chr>, tilt_sector <chr>,
#> # scenario <chr>, year <dbl>, type <chr>,
#> # input_activity_uuid_product_uuid <chr>, input_tilt_sector <chr>,
#> # input_tilt_subsector <chr>
sector_profile_upstream(companies, scenarios, inputs) |> unnest_product()
#> # A tibble: 8 × 13
#> companies_id grouped_by risk_category profile_ranking clustered
#> <chr> <chr> <chr> <dbl> <chr>
#> 1 a ipr_a_2050 high 1 a
#> 2 a weo_a_2050 medium 0.6 a
#> 3 a <NA> <NA> NA b
#> 4 a ipr_a_2050 low 0.3 b
#> 5 a <NA> <NA> NA b
#> 6 a <NA> <NA> NA c
#> 7 a ipr_a_2050 low 0.3 c
#> 8 a <NA> <NA> NA c
#> # ℹ 8 more variables: activity_uuid_product_uuid <chr>, tilt_sector <chr>,
#> # scenario <chr>, year <dbl>, type <chr>,
#> # input_activity_uuid_product_uuid <chr>, input_tilt_sector <chr>,
#> # input_tilt_subsector <chr>
Hi @maurolepore , awesome that the sector_profile()
now shows the results as expected!! Congrats!
Re sector_profile_upstream()
: @AnneSchoenauer and I discussed priorities and decided that given our ambitious target to launch the webtool in June, it's best to pause the sector_profile_upstream()
work for now. It's also a bit tricky to publish the licensed input data in the webtool. Therefore, we suggest launching the webtool, at least in the first version, without the upstream results.
Would therefore be great if you could instead soon start working on the webtool!
it's best to pause the sector_profile_upstream() work for now -- @Tilmon in https://github.com/2DegreesInvesting/tiltIndicator/pull/739#issuecomment-2117556054
Noted (https://github.com/2DegreesInvesting/tiltIndicator/issues/784).
Thanks!
reprex of the approved behaviour
--
Given a clustered matching one but not a second type of scenario, when the scenarios dataset has the two types, then the second type and its corresponding scenario are still present in grouped_by, and the mismatch is reflected correctly in the value.
The expected behaviour is captured in this GoogleSheet and explained in https://github.com/2DegreesInvesting/tiltIndicator/pull/739#issuecomment-1977426095 (thanks @Tilmon).
TODO
EXCEPTIONS