Closed AnneSchoenauer closed 8 months ago
@AnneSchoenauer, so this quantile-based score is different than the rank-based score behind "low", "medium", and "high"?
FYI, currently we first take the co2_footprint and apply rank_proportion() to get a score:
rank_proportion <- function(x) {
  rank(x) / length(x)
}
Then we take that score and categorize it with categorize_risk() to get the risk categories "low", "medium", and "high".
categorize_risk <- function(x, low_threshold, high_threshold, ...) {
  case_when(
    x > high_threshold ~ "high",
    x > low_threshold & x <= high_threshold ~ "medium",
    x <= low_threshold ~ "low",
    ...
  )
}
@maurolepore, thanks a lot for this.
FYI, currently we first take the co2_footprint and apply rank_proportion() to get a score:
Could you tell me what this score looks like for some data?
Then we take that score and categorize it with categorize_risk() to get the risk categories "low", "medium", and "high".
Can you tell me how the thresholds, high_threshold and low_threshold, are defined, please?
Thanks a lot!
Here is an internal intermediate dataset that may help. Note the column values_to_categorize is the rank-based score that we use to create risk_category.
# A tibble: 10 × 12
   grouped_by co2_footprint values_to_categorize low_threshold high_threshold risk_category tilt_sec       tilt_subsector unit  isic_sec activity_uuid_product_uuid                                                ei_activity_name
   <chr>              <dbl>                <dbl>         <dbl>          <dbl> <chr>         <chr>          <chr>          <chr> <chr>    <chr>                                                                     <chr>
 1 all                 176.                    1         0.333          0.667 high          Industry       Other          unit  2560     0a242b09-772a-5edf-8e82-9cb4ba52a258_ae39ee61-d4d0-4cce-93b4-0745344da5fa cookstove production or electric
 2 all                 58.1                  0.8         0.333          0.667 high          Industry       Other          unit  2560     be06d25c-73dc-55fb-965b-0f300453e380_98b48ff2-2200-4b08-9dec-9c7c0e3585bc microwave oven production
 3 all                 4.95                  0.4         0.333          0.667 medium        Steel & Metals Steel          kg    2870     977d997e-c257-5033-ba39-d0edeeef4ba2_0ace02fa-eca5-482d-a829-c18e46a52db4 market for steel, chromium steel
 4 all                 12.5                  0.6         0.333          0.667 medium        Agriculture    Agriculture    kg    1780     ebb8475e-ff57-5e4e-937b-b5788186a5ca_ccee034c-8b6c-40d6-ac36-4c70c4623efa cheese production, soft, from cow milk
 5 all                 2.07                  0.2         0.333          0.667 low           Industry       Other          kg    2679     2f7b77a7-1556-5c1b-b0aa-c4534ddc8885_38d493e9-6feb-4c66-86eb-2253ef8ee54d market for seal, natural rubber based
 6 isic_sec            176.                    1         0.333          0.667 high          Industry       Other          unit  2560     0a242b09-772a-5edf-8e82-9cb4ba52a258_ae39ee61-d4d0-4cce-93b4-0745344da5fa cookstove production or electric
 7 isic_sec            58.1                  0.5         0.333          0.667 medium        Industry       Other          unit  2560     be06d25c-73dc-55fb-965b-0f300453e380_98b48ff2-2200-4b08-9dec-9c7c0e3585bc microwave oven production
 8 isic_sec            4.95                    1         0.333          0.667 high          Steel & Metals Steel          kg    2870     977d997e-c257-5033-ba39-d0edeeef4ba2_0ace02fa-eca5-482d-a829-c18e46a52db4 market for steel, chromium steel
 9 isic_sec            12.5                    1         0.333          0.667 high          Agriculture    Agriculture    kg    1780     ebb8475e-ff57-5e4e-937b-b5788186a5ca_ccee034c-8b6c-40d6-ac36-4c70c4623efa cheese production, soft, from cow milk
10 isic_sec            2.07                    1         0.333          0.667 high          Industry       Other          kg    2679     2f7b77a7-1556-5c1b-b0aa-c4534ddc8885_38d493e9-6feb-4c66-86eb-2253ef8ee54d market for seal, natural rubber based
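The values_to_categorize and risk_category columns in this tibble can be reproduced from co2_footprint alone. The following is a minimal base-R sketch, not the package's internal code: categorize_risk_base() is a hypothetical stand-in for categorize_risk() that avoids the dplyr dependency, and grouping with ave() is an assumption about how the per-benchmark calculation behaves, inferred from the printed rows.

```r
# Rank-based score, as defined earlier in this thread.
rank_proportion <- function(x) {
  rank(x) / length(x)
}

# Hypothetical base-R stand-in for categorize_risk(), which uses dplyr::case_when().
categorize_risk_base <- function(x, low_threshold = 1 / 3, high_threshold = 2 / 3) {
  ifelse(x > high_threshold, "high",
    ifelse(x > low_threshold, "medium", "low")
  )
}

# The five products from the tibble, in the same row order.
co2_footprint <- c(176, 58.1, 4.95, 12.5, 2.07)
isic_sec <- c("2560", "2560", "2870", "1780", "2679")

# Benchmark "all": rank across every row.
score_all <- rank_proportion(co2_footprint)
risk_all <- categorize_risk_base(score_all)

# Benchmark "isic_sec": rank only within rows sharing the same isic_sec.
score_isic <- ave(co2_footprint, isic_sec, FUN = rank_proportion)
risk_isic <- categorize_risk_base(score_isic)

print(data.frame(co2_footprint, score_all, risk_all, score_isic, risk_isic))
```

Under these assumptions, a product that is alone in its isic_sec group always scores 1 and lands in "high", which matches rows 8 to 10.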
For the record, I got it by running the example of emissions_profile() and stopping execution at line 11 of the internal function emissions_profile_any_at_product_level():
https://github.com/2DegreesInvesting/tiltIndicator/blob/2192074cb2264e905510f351c0c4092f07def7ae/R/emissions_profile_any_at_product_level.R#L11
emissions_profile_any_at_product_level <- function(companies,
                                                   co2,
                                                   low_threshold = 1 / 3,
                                                   high_threshold = 2 / 3) {
  co2 <- sanitize_co2(co2)
  x <- list(companies = companies, co2 = co2)
  epa_check(x)
  .companies <- prepare_companies(companies)
  .co2 <- prepare_co2(co2, low_threshold, high_threshold)
  .co2 |>
    epa_add_values_to_categorize() |>
    add_risk_category(low_threshold, high_threshold) |>
    join_companies(.companies) |>
    epa_select_cols_at_product_level() |>
    polish_output(cols_at_product_level())
}
Dear @maurolepore, thanks a lot for following up on this! I think this is exactly what we need. A final question to be 100% sure: length(x) is the number of all products that we use for the benchmarking, right? And rank(x) is the position of one product's carbon footprint compared to all other products, right? That means that if we have, for example, 120 products with a carbon footprint, then length(x) = 120, and if one product's carbon footprint is the 4th lowest, then its values_to_categorize would be 4/120, right? So values_to_categorize would be about 0.03. This is lower than the low threshold, which is 0.333, and therefore the product is categorised as low. Is this correct?
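The arithmetic in this question can be checked with a short sketch. The data are hypothetical (120 distinct footprints, no ties), and it assumes all 120 products fall in a single benchmark group.

```r
rank_proportion <- function(x) {
  rank(x) / length(x)
}

# Hypothetical data: 120 products with distinct footprints, in no particular order.
co2_footprint <- rev(seq_len(120))

score <- rank_proportion(co2_footprint)

# Score of the product whose footprint is the 4th lowest.
fourth_lowest <- score[rank(co2_footprint) == 4]

# 4 / 120 is about 0.033, which is below the default low threshold of 1 / 3.
is_low <- fourth_lowest <= 1 / 3

print(fourth_lowest)
print(is_low)
```

With ties, rank() assigns the average rank by default, so the score of a tied product need not be a whole number divided by length(x).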
The length(x) is the number of all products that we use for the benchmarking, right? And rank(x) is the position of one product's carbon footprint compared to all other products, right?
I just explored the code and I see that we do every calculation within the groups defined by each benchmark. The most comprehensive one is "all", which considers all rows in the dataset (but not all the products you wish: #566).
I'll Slack you a link to a video where I show this interactively in RStudio.
--
At the conceptual level I think the best person to ask is Tilman. I recall he could articulate these calculations clearly and off the top of his head. So if there is a mismatch between what you think should happen and what actually happens, you may want to discuss it with him. If the change is time-consuming it would be a waste to do it one way now and then undo it later.
Thanks @maurolepore, I will talk to Tilman today, but I am actually quite sure that @tilman unfortunately made a mistake here. Thanks for letting me know. I will double-check with him and let you know how to continue.
Dear @maurolepore, I talked to Tilman. However, I will follow up in this ticket: https://github.com/2DegreesInvesting/tiltIndicator/issues/566, as the ticket here is a slightly different problem. For me it is now clear that we can have an "exact percentile rank", which is the variable "values_to_categorise". So this is great. Let's leave this issue aside and first fix the benchmark issue #566.
I think this issue concerns the output files. As we now calculate "values_to_categorise" (now called "profile_ranking" in the tiltIndicatorBefore package), we just need to ensure that this information is not lost and ends up in the output files. @kalashsinghal and @maurolepore, who would be responsible for it?
I think this belongs to tiltIndicator. Once tiltIndicatorBefore computes profile_ranking using the entire ecoinvent dataset (#566), that column still needs to be exposed in tiltIndicator. So I'll leave it here and assigned to me.
Okay agree!
Relates to https://github.com/2DegreesInvesting/tiltIndicator/issues/566
Relates to https://github.com/2DegreesInvesting/tiltIndicator/issues/549
--
We need to calculate the transition risk score. Two things are needed for this: the SERT for the sector profiles (already planned for the enhancement of the tilt indicator) and the rank of the company, that is, the percentile in which the company is located.
This ticket should be solved when this ticket here is completed.
Issue Description:
Background: We have a dataset that lists products and their associated carbon footprints. We've previously categorized these products based on their carbon footprints as "low", "medium", or "high". These categories correspond to the bottom 33%, the middle 33-66%, and the top 66+%, respectively.
Task: We need to further refine this data by adding an exact percentile rank for each product's carbon footprint. This will allow us to know if a product is, for example, in the 20th percentile or the 80th percentile, etc.
Acceptance Criteria:
Steps:
Notes:
Be mindful of potential ties in the dataset: if two products have the same carbon footprint, they should have the same rank. Ensure that the dataset maintains its original order after the ranking is done. Please also note that the rank depends on which benchmark we use (e.g. tilt_sec, tilt_sec_unit, tilt_isic...), so we need the ranks for each of the six benchmarks.
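A minimal sketch of how these notes could be satisfied in base R follows. The product data are invented for illustration, only three of the six benchmarks are shown, and the rank_* column names are hypothetical. It relies on rank() giving tied values the same (average) rank by default and on ave() returning results in the original row order.

```r
rank_proportion <- function(x) {
  rank(x) / length(x)
}

# Hypothetical products; rows "a"/"b" and "e"/"f" have tied footprints.
products <- data.frame(
  product = c("a", "b", "c", "d", "e", "f"),
  tilt_sec = c("Industry", "Industry", "Industry", "Steel", "Steel", "Steel"),
  unit = c("kg", "kg", "unit", "kg", "kg", "kg"),
  co2_footprint = c(10, 10, 50, 3, 7, 7)
)

# One rank column per benchmark; no row is reordered.
products$rank_all <- rank_proportion(products$co2_footprint)
products$rank_tilt_sec <- ave(
  products$co2_footprint, products$tilt_sec,
  FUN = rank_proportion
)
products$rank_tilt_sec_unit <- ave(
  products$co2_footprint, products$tilt_sec, products$unit,
  FUN = rank_proportion
)

print(products)
```

The same product can receive a different rank under each benchmark, which is why one column per benchmark is kept rather than a single percentile.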