2DegreesInvesting / tiltIndicator

Implement the core business logic of the tilt indicators
https://2degreesinvesting.github.io/tiltIndicator/
GNU General Public License v3.0

`profile_ranking` doesn't distribute the `activity_uuid_product_uuid`s in three equal parts by using thresholds `1/3` (low) and `2/3` (high) for emission profile indicator #789

Closed kalashsinghal closed 4 months ago

kalashsinghal commented 4 months ago

Dear @Tilmon @AnneSchoenauer

I found an issue while investigating why the low and high thresholds (1/3 and 2/3) do not split the activity_uuid_product_uuids into three equal parts.

We calculate the profile ranking for the emission profile indicator by grouping the co2_footprint values by each of six benchmarks (all, tilt_sector, isic_4digit, unit, unit_tilt_sector, unit_isic_4digit) and ranking them within each group using this function:

# dense_rank() comes from dplyr; tied co2 values share the same rank.
rank_proportion <- function(co2_footprint) {
  dense_rank(co2_footprint) / length(unique(co2_footprint))
}

As you can see from the function above, we first rank the co2 values and then divide each rank by the number of distinct values in the co2_footprint column (for that specific group). Please ignore the fact that the grouping step is not shown in the function above!
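To make the formula concrete, here is a minimal sketch that applies it to the ten co2 values used in the reprex further down. It uses base R only: `dense_rank_base()` and `rank_proportion_base()` are hypothetical stand-ins written for this illustration (they are not part of tiltIndicator), and the `match()` trick is assumed to behave like `dplyr::dense_rank()` only for input without missing values.

```r
# Base-R stand-in for dplyr::dense_rank(): the position of each value among
# the sorted distinct values, so ties share a rank and no rank is skipped.
# Assumption: no NA values in the input.
dense_rank_base <- function(x) match(x, sort(unique(x)))

# Same formula as rank_proportion(), built on the stand-in above.
rank_proportion_base <- function(co2_footprint) {
  dense_rank_base(co2_footprint) / length(unique(co2_footprint))
}

co2_footprint <- c(1, 2, 3, 3, 3, 4, 4, 5, 6, 7)

# Ten values but only seven distinct ones, so every rank is a multiple of 1/7.
round(rank_proportion_base(co2_footprint), 3)
#> [1] 0.143 0.286 0.429 0.429 0.429 0.571 0.571 0.714 0.857 1.000
```

The three rows with a co2 value of 3 all get 3/7, which is the 0.429 that shows up in the reprex output.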

This method will not split the activity_uuid_product_uuids into three equal parts at the thresholds 1/3 and 2/3, because the ranks are divided by the number of unique co2 values rather than by the number of activity_uuid_product_uuids. In many cases, different activity_uuid_product_uuids share the same co2 value, so the numerical gap between consecutive ranks is larger than it should be. That gap is determined by the denominator of the ranked value, and the denominator needs to be correct if we also want the thresholds 1/3 and 2/3 to split the activity_uuid_product_uuids into three equal parts. Please have a look at the reprex below for a better understanding:

library(readr)
library(dplyr)
devtools::load_all(".")
#> ℹ Loading tiltIndicator
options(width = 500)

example <- tibble(
  activity_uuid_product_uuid = c("uuid1", "uuid2", "uuid3", "uuid4", "uuid5", "uuid6", "uuid7", "uuid8", "uuid9", "uuid10"),
  co2_footprint = c(1, 2, 3, 3, 3, 4, 4, 5, 6, 7),
  isic_4digit = c("'3420'", "'3420'", "'3420'", "'3420'", "'3420'", "'3420'", "'3420'", "'3420'", "'3420'", "'3420'"),
  tilt_sector = c("sec", "sec", "sec", "sec", "sec", "sec", "sec", "sec", "sec", "sec"),
  unit = c("kg", "kg", "kg", "kg", "kg", "kg", "kg", "kg", "kg", "kg"),
)

example_output <- epa_compute_profile_ranking(example)
example_output |> 
  print(n = Inf)
#> # A tibble: 60 × 7
#>    grouped_by       profile_ranking activity_uuid_product_uuid co2_footprint isic_4digit tilt_sector unit 
#>    <chr>                      <dbl> <chr>                              <dbl> <chr>       <chr>       <chr>
#>  1 all                        0.143 uuid1                                  1 '3420'      sec         kg   
#>  2 all                        0.286 uuid2                                  2 '3420'      sec         kg   
#>  3 all                        0.429 uuid3                                  3 '3420'      sec         kg   
#>  4 all                        0.429 uuid4                                  3 '3420'      sec         kg   
#>  5 all                        0.429 uuid5                                  3 '3420'      sec         kg   
#>  6 all                        0.571 uuid6                                  4 '3420'      sec         kg   
#>  7 all                        0.571 uuid7                                  4 '3420'      sec         kg   
#>  8 all                        0.714 uuid8                                  5 '3420'      sec         kg   
#>  9 all                        0.857 uuid9                                  6 '3420'      sec         kg   
#> 10 all                        1     uuid10                                 7 '3420'      sec         kg   
#> 11 isic_4digit                0.143 uuid1                                  1 '3420'      sec         kg   
#> 12 isic_4digit                0.286 uuid2                                  2 '3420'      sec         kg   
#> 13 isic_4digit                0.429 uuid3                                  3 '3420'      sec         kg   
#> 14 isic_4digit                0.429 uuid4                                  3 '3420'      sec         kg   
#> 15 isic_4digit                0.429 uuid5                                  3 '3420'      sec         kg   
#> 16 isic_4digit                0.571 uuid6                                  4 '3420'      sec         kg   
#> 17 isic_4digit                0.571 uuid7                                  4 '3420'      sec         kg   
#> 18 isic_4digit                0.714 uuid8                                  5 '3420'      sec         kg   
#> 19 isic_4digit                0.857 uuid9                                  6 '3420'      sec         kg   
#> 20 isic_4digit                1     uuid10                                 7 '3420'      sec         kg   
#> 21 tilt_sector                0.143 uuid1                                  1 '3420'      sec         kg   
#> 22 tilt_sector                0.286 uuid2                                  2 '3420'      sec         kg   
#> 23 tilt_sector                0.429 uuid3                                  3 '3420'      sec         kg   
#> 24 tilt_sector                0.429 uuid4                                  3 '3420'      sec         kg   
#> 25 tilt_sector                0.429 uuid5                                  3 '3420'      sec         kg   
#> 26 tilt_sector                0.571 uuid6                                  4 '3420'      sec         kg   
#> 27 tilt_sector                0.571 uuid7                                  4 '3420'      sec         kg   
#> 28 tilt_sector                0.714 uuid8                                  5 '3420'      sec         kg   
#> 29 tilt_sector                0.857 uuid9                                  6 '3420'      sec         kg   
#> 30 tilt_sector                1     uuid10                                 7 '3420'      sec         kg   
#> 31 unit                       0.143 uuid1                                  1 '3420'      sec         kg   
#> 32 unit                       0.286 uuid2                                  2 '3420'      sec         kg   
#> 33 unit                       0.429 uuid3                                  3 '3420'      sec         kg   
#> 34 unit                       0.429 uuid4                                  3 '3420'      sec         kg   
#> 35 unit                       0.429 uuid5                                  3 '3420'      sec         kg   
#> 36 unit                       0.571 uuid6                                  4 '3420'      sec         kg   
#> 37 unit                       0.571 uuid7                                  4 '3420'      sec         kg   
#> 38 unit                       0.714 uuid8                                  5 '3420'      sec         kg   
#> 39 unit                       0.857 uuid9                                  6 '3420'      sec         kg   
#> 40 unit                       1     uuid10                                 7 '3420'      sec         kg   
#> 41 unit_isic_4digit           0.143 uuid1                                  1 '3420'      sec         kg   
#> 42 unit_isic_4digit           0.286 uuid2                                  2 '3420'      sec         kg   
#> 43 unit_isic_4digit           0.429 uuid3                                  3 '3420'      sec         kg   
#> 44 unit_isic_4digit           0.429 uuid4                                  3 '3420'      sec         kg   
#> 45 unit_isic_4digit           0.429 uuid5                                  3 '3420'      sec         kg   
#> 46 unit_isic_4digit           0.571 uuid6                                  4 '3420'      sec         kg   
#> 47 unit_isic_4digit           0.571 uuid7                                  4 '3420'      sec         kg   
#> 48 unit_isic_4digit           0.714 uuid8                                  5 '3420'      sec         kg   
#> 49 unit_isic_4digit           0.857 uuid9                                  6 '3420'      sec         kg   
#> 50 unit_isic_4digit           1     uuid10                                 7 '3420'      sec         kg   
#> 51 unit_tilt_sector           0.143 uuid1                                  1 '3420'      sec         kg   
#> 52 unit_tilt_sector           0.286 uuid2                                  2 '3420'      sec         kg   
#> 53 unit_tilt_sector           0.429 uuid3                                  3 '3420'      sec         kg   
#> 54 unit_tilt_sector           0.429 uuid4                                  3 '3420'      sec         kg   
#> 55 unit_tilt_sector           0.429 uuid5                                  3 '3420'      sec         kg   
#> 56 unit_tilt_sector           0.571 uuid6                                  4 '3420'      sec         kg   
#> 57 unit_tilt_sector           0.571 uuid7                                  4 '3420'      sec         kg   
#> 58 unit_tilt_sector           0.714 uuid8                                  5 '3420'      sec         kg   
#> 59 unit_tilt_sector           0.857 uuid9                                  6 '3420'      sec         kg   
#> 60 unit_tilt_sector           1     uuid10                                 7 '3420'      sec         kg

Created on 2024-05-29 with reprex v2.0.2
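The size of the gap between consecutive ranks can be checked directly on the co2 values from the reprex. This is only a sketch for comparison: `step_per_product` is a hypothetical quantity introduced here to show what the gap would be with the number of activity_uuid_product_uuids as the denominator, and is not a proposed change to the package.

```r
co2_footprint <- c(1, 2, 3, 3, 3, 4, 4, 5, 6, 7)

n_products <- length(co2_footprint)                 # 10 activity_uuid_product_uuids
n_distinct_values <- length(unique(co2_footprint))  # 7 distinct co2 values

# Gap between consecutive ranks as currently computed
# (denominator = number of distinct co2 values).
step_used <- 1 / n_distinct_values

# Hypothetical gap if the denominator were the number of products
# (shown for comparison only).
step_per_product <- 1 / n_products

round(c(step_used = step_used, step_per_product = step_per_product), 3)
```

With seven distinct values among ten products, the gap is 1/7 (about 0.143) rather than 1/10 (0.1).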

As the reprex above shows, the co2 value 3 gets a profile ranking of 0.429, which is above the 1/3 threshold, so only two activity_uuid_product_uuids fall below the 1/3 threshold. This happens because the number of unique activity_uuid_product_uuids differs from the number of unique co2_footprint values within a group.

Please let me know if you have any questions, as it's not easy to understand! :)
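The resulting split can be counted with a short base-R sketch. The labels "low", "medium" and "high" and the choice to include the upper bound in each interval are assumptions made for this illustration; no ranking in this example sits exactly on a threshold, so the boundary rule does not affect the counts here.

```r
co2_footprint <- c(1, 2, 3, 3, 3, 4, 4, 5, 6, 7)

# Profile ranking as in rank_proportion(), written in base R
# (match() on the sorted distinct values gives a dense rank; assumes no NAs).
profile_ranking <- match(co2_footprint, sort(unique(co2_footprint))) /
  length(unique(co2_footprint))

# Bucket the rankings at the 1/3 and 2/3 thresholds.
# cut() uses right-closed intervals by default: (0, 1/3], (1/3, 2/3], (2/3, 1].
risk_category <- cut(
  profile_ranking,
  breaks = c(0, 1 / 3, 2 / 3, 1),
  labels = c("low", "medium", "high")
)

table(risk_category)
```

For these ten rows the counts are 2 low, 5 medium and 3 high, not an even three-way split.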

cc @maurolepore

Tilmon commented 4 months ago

@kalashsinghal, thanks for investigating! Then that's all good and we can move forward :)

kalashsinghal commented 4 months ago

I am closing this ticket as no change is required :)