Closed AnneSchoenauer closed 5 months ago
Thanks for your patience @AnneSchoenauer,
I believe we all understand the requirement but it's great to check here it's actually implemented as you expect.
Note we're mixing the behavior of the code (tiltIndicator) with the characteristics of the toy datasets (tiltToyData). I'll discuss each separately.
For the record the PR that implemented your request is in tiltIndicator tiltIndicator#644.
This search gives you an overview of what the code currently does (likely it's the code you said Kalash showed you):
You can see that we:
co2
dataset (presumably with all ecoinvent products), thenprofile_ranking
(using all ecoinvent products), thenrisk_category
(until now we still have all ecoinvent products), thencompanies
(see R/utils.R note left_join()
with companies
on the left side), thenFor the record, Kalash shared with me some small real datasets that presumably have these properties. Finally I turned turn those small real datasets into toy datasets we can share publicly. The result is this PR: #19.
Thanks for your reprex, particularly for the meaningful variable names :-)
Here is my version, using the toy datasets we are developing in #19 (reproduced from here):
library(readr, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
devtools::load_all()
#> ℹ Loading tiltToyData
options(readr.show_col_types = FALSE, width = 1000)
companies <- read_csv(toy_emissions_profile_any_companies())
products <- read_csv(toy_emissions_profile_products_ecoinvent())
# *uuid in companies that match *uuid in products
left_join(companies, products, relationship = "many-to-many") |>
print() |>
distinct(activity_uuid_product_uuid)
#> Joining with `by = join_by(activity_uuid_product_uuid, ei_activity_name, unit)`
#> # A tibble: 155 × 12
#> activity_uuid_product_uuid clustered companies_id country ei_activity_name main_activity unit co2_footprint ei_geography isic_4digit tilt_sector tilt_subsector
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr>
#> 1 76269c17-78d6-420b-991a-aa38c51b45b7 tent soot_asianpiedstarling germany market for shed, large, wood, non-insulated, fire-unprotected distributor m2 447. GLO '4100' construction construction residential
#> 2 76269c17-78d6-420b-991a-aa38c51b45b7 tent soot_asianpiedstarling germany market for shed, large, wood, non-insulated, fire-unprotected distributor m2 321. GLO '4100' construction construction residential
#> 3 76269c17-78d6-420b-991a-aa38c51b45b7 table hire for parties frightening_chrysomelid spain market for shed, large, wood, non-insulated, fire-unprotected wholesaler m2 447. GLO '4100' construction construction residential
#> 4 76269c17-78d6-420b-991a-aa38c51b45b7 table hire for parties frightening_chrysomelid spain market for shed, large, wood, non-insulated, fire-unprotected wholesaler m2 321. GLO '4100' construction construction residential
#> 5 76269c17-78d6-420b-991a-aa38c51b45b7 surface finishing, galvanic hyperbrutal_flea germany market for deep drawing, steel, 10000 kN press, automode distributor kg 0.405 GLO '2591' metals other metals
#> 6 76269c17-78d6-420b-991a-aa38c51b45b7 surface finishing, galvanic hyperbrutal_flea germany market for deep drawing, steel, 10000 kN press, automode distributor kg 0.884 GLO '2591' metals other metals
#> 7 76269c17-78d6-420b-991a-aa38c51b45b7 surface finishing, galvanic hyperbrutal_flea germany market for deep drawing, steel, 10000 kN press, automode distributor kg 0.675 GLO '2591' metals other metals
#> 8 76269c17-78d6-420b-991a-aa38c51b45b7 surface engineering hyperbrutal_flea germany market for deep drawing, steel, 10000 kN press, automode distributor kg 0.405 GLO '2591' metals other metals
#> 9 76269c17-78d6-420b-991a-aa38c51b45b7 surface engineering hyperbrutal_flea germany market for deep drawing, steel, 10000 kN press, automode distributor kg 0.884 GLO '2591' metals other metals
#> 10 76269c17-78d6-420b-991a-aa38c51b45b7 surface engineering hyperbrutal_flea germany market for deep drawing, steel, 10000 kN press, automode distributor kg 0.675 GLO '2591' metals other metals
#> # ℹ 145 more rows
#> # A tibble: 3 × 1
#> activity_uuid_product_uuid
#> <chr>
#> 1 76269c17-78d6-420b-991a-aa38c51b45b7
#> 2 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb
#> 3 833caa78-30df-4374-900f-7f88ab44075b
# *uuid in companies that do NOT match *uuid in products
anti_join(companies, products) |>
print() |>
distinct(activity_uuid_product_uuid)
#> Joining with `by = join_by(activity_uuid_product_uuid, ei_activity_name, unit)`
#> # A tibble: 4 × 7
#> activity_uuid_product_uuid clustered companies_id country ei_activity_name main_activity unit
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 833caa78-30df-4374-900f-7f88ab44075b garden fittings weak_meadowlark netherlands market for shed, large, wood, non-insulated, fire-unprotected wholesaler m2
#> 2 833caa78-30df-4374-900f-7f88ab44075b garden fittings arrogant_ewe netherlands market for shed, large, wood, non-insulated, fire-unprotected wholesaler m2
#> 3 833caa78-30df-4374-900f-7f88ab44075b tent pseudoeconomical_easternglasslizard germany market for shed, large, wood, non-insulated, fire-unprotected distributor m2
#> 4 833caa78-30df-4374-900f-7f88ab44075b tent charterable_wren germany market for shed, large, wood, non-insulated, fire-unprotected distributor m2
#> # A tibble: 1 × 1
#> activity_uuid_product_uuid
#> <chr>
#> 1 833caa78-30df-4374-900f-7f88ab44075b
# *uuid in products that do NOT match *uuid in companies
anti_join(products, companies) |>
print() |>
distinct(activity_uuid_product_uuid)
#> Joining with `by = join_by(activity_uuid_product_uuid, ei_activity_name, unit)`
#> # A tibble: 8 × 8
#> activity_uuid_product_uuid co2_footprint ei_activity_name ei_geography isic_4digit tilt_sector tilt_subsector unit
#> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 833caa78-30df-4374-900f-7f88ab44075b 14.1 iron-nickel-chromium alloy production RER '2410' metals iron & steel kg
#> 2 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb 0.419 market for deep drawing, steel, 10000 kN press, automode GLO '2591' metals other metals kg
#> 3 833caa78-30df-4374-900f-7f88ab44075b 9.47 iron-nickel-chromium alloy production RER '2410' metals iron & steel kg
#> 4 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb 0.648 market for deep drawing, steel, 10000 kN press, automode GLO '2591' metals other metals kg
#> 5 833caa78-30df-4374-900f-7f88ab44075b 13.6 iron-nickel-chromium alloy production RER '2410' metals iron & steel kg
#> 6 833caa78-30df-4374-900f-7f88ab44075b 14.7 iron-nickel-chromium alloy production RER '2410' metals iron & steel kg
#> 7 833caa78-30df-4374-900f-7f88ab44075b 0.390 market for deep drawing, steel, 10000 kN press, automode GLO '2591' metals other metals kg
#> 8 833caa78-30df-4374-900f-7f88ab44075b 12.7 iron-nickel-chromium alloy production RER '2410' metals iron & steel kg
#> # A tibble: 2 × 1
#> activity_uuid_product_uuid
#> <chr>
#> 1 833caa78-30df-4374-900f-7f88ab44075b
#> 2 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb
companies <- read_csv(toy_emissions_profile_any_companies())
inputs <- read_csv(toy_emissions_profile_upstream_products_ecoinvent())
# *uuid in companies that match *uuid in inputs
left_join(companies, inputs, relationship = "many-to-many") |>
print() |>
distinct(activity_uuid_product_uuid)
#> Joining with `by = join_by(activity_uuid_product_uuid, ei_activity_name)`
#> # A tibble: 97 × 16
#> activity_uuid_product_uuid clustered companies_id country ei_activity_name main_activity unit ei_geography input_activity_uuid_product_uuid input_co2_footprint input_ei_activity_name input_isic_4digit input_reference_product_name input_tilt_sector input_tilt_subsector input_unit
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 76269c17-78d6-420b-991a-aa38c51b45b7 tent soot_asianpiedstarling germany market for shed, large, wood, non-insulated, fire-unprotected distributor m2 RoW bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df 240. shed construction, large, wood, non-insulated, fire-unprotected '4100' shed, large, wood, non-insulated, fire-unprotected construction construction residential m2
#> 2 76269c17-78d6-420b-991a-aa38c51b45b7 table hire for parties frightening_chrysomelid spain market for shed, large, wood, non-insulated, fire-unprotected wholesaler m2 RoW bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df 240. shed construction, large, wood, non-insulated, fire-unprotected '4100' shed, large, wood, non-insulated, fire-unprotected construction construction residential m2
#> 3 76269c17-78d6-420b-991a-aa38c51b45b7 surface finishing, galvanic hyperbrutal_flea germany market for deep drawing, steel, 10000 kN press, automode distributor kg <NA> <NA> NA <NA> <NA> <NA> <NA> <NA> <NA>
#> 4 76269c17-78d6-420b-991a-aa38c51b45b7 surface engineering hyperbrutal_flea germany market for deep drawing, steel, 10000 kN press, automode distributor kg <NA> <NA> NA <NA> <NA> <NA> <NA> <NA> <NA>
#> 5 76269c17-78d6-420b-991a-aa38c51b45b7 tent flexible_dolphin austria market for shed, large, wood, non-insulated, fire-unprotected wholesaler m2 RoW bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df 240. shed construction, large, wood, non-insulated, fire-unprotected '4100' shed, large, wood, non-insulated, fire-unprotected construction construction residential m2
#> 6 76269c17-78d6-420b-991a-aa38c51b45b7 tent paramilitary_racerunner germany market for shed, large, wood, non-insulated, fire-unprotected wholesaler m2 RoW bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df 240. shed construction, large, wood, non-insulated, fire-unprotected '4100' shed, large, wood, non-insulated, fire-unprotected construction construction residential m2
#> 7 76269c17-78d6-420b-991a-aa38c51b45b7 open space amenities level_meadowhawk france market for shed, large, wood, non-insulated, fire-unprotected wholesaler m2 RoW bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df 240. shed construction, large, wood, non-insulated, fire-unprotected '4100' shed, large, wood, non-insulated, fire-unprotected construction construction residential m2
#> 8 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb tent heartrending_attwatersprairiechicken germany market for shed, large, wood, non-insulated, fire-unprotected distributor m2 RoW bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df 463. shed construction, large, wood, non-insulated, fire-unprotected '4100' shed, large, wood, non-insulated, fire-unprotected construction construction residential m2
#> 9 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb tent heartrending_attwatersprairiechicken germany market for shed, large, wood, non-insulated, fire-unprotected distributor m2 RoW bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df 451. shed construction, large, wood, non-insulated, fire-unprotected '4100' shed, large, wood, non-insulated, fire-unprotected construction construction residential m2
#> 10 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb tent heartrending_attwatersprairiechicken germany market for shed, large, wood, non-insulated, fire-unprotected distributor m2 RoW bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df 447. shed construction, large, wood, non-insulated, fire-unprotected '4100' shed, large, wood, non-insulated, fire-unprotected construction construction residential m2
#> # ℹ 87 more rows
#> # A tibble: 3 × 1
#> activity_uuid_product_uuid
#> <chr>
#> 1 76269c17-78d6-420b-991a-aa38c51b45b7
#> 2 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb
#> 3 833caa78-30df-4374-900f-7f88ab44075b
# *uuid in companies that do NOT match *uuid in inputs
anti_join(companies, inputs) |>
print() |>
distinct(activity_uuid_product_uuid)
#> Joining with `by = join_by(activity_uuid_product_uuid, ei_activity_name)`
#> # A tibble: 4 × 7
#> activity_uuid_product_uuid clustered companies_id country ei_activity_name main_activity unit
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 76269c17-78d6-420b-991a-aa38c51b45b7 surface finishing, galvanic hyperbrutal_flea germany market for deep drawing, steel, 10000 kN press, automode distributor kg
#> 2 76269c17-78d6-420b-991a-aa38c51b45b7 surface engineering hyperbrutal_flea germany market for deep drawing, steel, 10000 kN press, automode distributor kg
#> 3 76269c17-78d6-420b-991a-aa38c51b45b7 deep-drawn metal part humanoid_elkhound germany market for deep drawing, steel, 10000 kN press, automode agent/ representative kg
#> 4 76269c17-78d6-420b-991a-aa38c51b45b7 drawn parts humanoid_elkhound germany market for deep drawing, steel, 10000 kN press, automode agent/ representative kg
#> # A tibble: 1 × 1
#> activity_uuid_product_uuid
#> <chr>
#> 1 76269c17-78d6-420b-991a-aa38c51b45b7
# *uuid in inputs that do NOT match *uuid in companies
anti_join(inputs, companies) |>
print() |>
distinct(activity_uuid_product_uuid)
#> Joining with `by = join_by(activity_uuid_product_uuid, ei_activity_name)`
#> # A tibble: 85 × 11
#> activity_uuid_product_uuid ei_activity_name ei_geography input_activity_uuid_product_uuid input_co2_footprint input_ei_activity_name input_isic_4digit input_reference_product_name input_tilt_sector input_tilt_subsector input_unit
#> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb market for deep drawing, steel, 10000 kN press, automode RoW 55a5ac05-ab15-5a27-9d0e-6ecf840039f1_f10b8722-4be1-43d5-b17d-c51ad0e29d29 0.456 deep drawing, steel, 10000 kN press, automode '2591' deep drawing, steel, 10000 kN press, automode metals other metals kg
#> 2 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production GLO bdc93cd8-00b4-5b3e-993e-b7fef7059e52_4e584f6f-2e71-4796-931e-bb9a273c161c 1.67 market for anode, for metal electrolysis '2790' anode, for metal electrolysis industry machinery & equipment kg
#> 3 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production RER 95fcd1bb-4dc6-516a-a3b2-30a4f0530639_3b1d249a-c924-4d6c-8e1f-647f562daa54 0.530 market for electric arc furnace dust '3821' electric arc furnace dust industry other industry kg
#> 4 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production RER daef2f9a-4108-52ae-90a7-fe64abad51bc_6e74937e-b691-4c49-9b8f-5ba44d7c081d 0.589 market for electric arc furnace slag '3821' electric arc furnace slag industry other industry kg
#> 5 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production RER 3b190359-a32e-5294-af63-983f38ce6525_759b89bd-3aa6-42ad-b767-5bb9ef5d331d 0.602 market group for electricity, medium voltage '3510' electricity, medium voltage power total power kWh
#> 6 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production GLO 2c92cdcd-29df-53ba-a209-77c7de201d14_6e316c64-0481-4832-b097-296e14c0b02f 7.32 market for ferrochromium, high-carbon, 68% Cr '2410' ferrochromium, high-carbon, 68% Cr metals iron & steel kg
#> 7 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production Europe, without Russia and Turkey 9392c694-12a6-5cd7-a421-d4866359df2c_0d3eda5a-4601-4573-9549-0701c459ab88 0.710 market for hard coal '0510' hard coal energy coal energy kg
#> 8 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production CH c18c6cc9-4a26-5c47-9ea9-8635ff2c158e_240c1a3c-1aba-4528-afc3-3f27f56583be 0.0106 market for inert waste, for final disposal '3821' inert waste, for final disposal industry other industry kg
#> 9 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production RER c4ec0b1e-2a3b-5700-871c-2adbbb29bc1d_4f312355-ac65-4635-8fb2-006dba64ce60 0.0581 market for iron scrap, sorted, pressed '3830' iron scrap, sorted, pressed industry other industry kg
#> 10 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production CH 7361f7fb-5cf2-598c-823a-a4b7e50c3d28_a9007f10-7e39-4d50-8f4a-d6d03ce3d673 1.22 market for natural gas, high pressure '3520' natural gas, high pressure energy gas energy m3
#> # ℹ 75 more rows
#> # A tibble: 2 × 1
#> activity_uuid_product_uuid
#> <chr>
#> 1 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb
#> 2 833caa78-30df-4374-900f-7f88ab44075b
In your comments I identify the following questions:
Yes. All *companies
datasets in tiltToyData have a clustered
column. You can see that in the website of tiltToyData under Examples. The new version of emissions_profile_any_companies
in PR #19 does not drop any column (if anything it will add some columns).
emission_profile_products[_ecoinvent]
... have activitiy_uuid_product_uuid
that are not existing in the emission_profile_any_company
dataset?Yes. In my reprex above search for "anti_join(products, companies)
". You'll see
# *uuid in products that do NOT match *uuid in companies
... more code
#> activity_uuid_product_uuid
#> <chr>
#> 1 833caa78-30df-4374-900f-7f88ab44075b
#> 2 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb
Yes. In my reprex above search for "anti_join(companies, products)
". You'll see
# *uuid in companies that do NOT match *uuid in products
... more code
#> activity_uuid_product_uuid
#> <chr>
#> 1 833caa78-30df-4374-900f-7f88ab44075b
You may explore the code and data for yourself:
devtools::install_github("2DegreesInvesting/tiltToyData#19")
Is this what you expect?
Dear @maurolepore, Thanks a lot for this. Sooo there are two different possibilities of matching *uuids in products that do NOT match uuid in companies AND uuid in companies that do NOT match uiid in products. And I am afraid that I don't understand one case of it. This case here is correct:
uuid in products that do NOT match uuid in companies
anti_join(products, companies) |> print() |> distinct(activity_uuid_product_uuid)
However, this case here:
uuid in companies that do NOT match uuid in products
anti_join(companies, products) |> print() |> distinct(activity_uuid_product_uuid)
that shouldn't be the case. As remeber the product list shows ALL ecoinvent data which means there should be ALL uuids available if the company produces a clustered product that was able to be matched to a uuid.
Could we change this to make teh toy data more realistic?
Maybe this could also be a test in general that we can implement? If there are UUIDs in the company dataset which are not in the product dataset then there must be something wrong as per definition the product dataset (in a real world) captures all ecoinvent data. What do you think? Would that be a good test to also implement?
Also please note that with this ticket here might change the toy Data a bit as we then have in the company data set also company_ids without any uuid. But this is something that we can discuss later as well.
In any case thanks a lot for these reprexes as they really really help to become very precise in what we want!
Thanks @AnneSchoenauer for paying close attention.
I think we need to turn your expectation into an error message in tiltIndicator -- since it's a metholologial issue.
See if the reprex here captures the problem you see.
Regarding the toy datasets here, I'll remove the *uuid
in companies that don't match *uuid
in products
or inputs
.
@kalashsinghal any idea how those unexpected *uuid
got into the toy data? Maybe it's just an artifact of the artificial nature of these datasets, but any thoughts are welcome.
Regarding the toy datasets here, I'll remove the uuid in companies that don't match uuid in products or inputs. @kalashsinghal any idea how those unexpected *uuid got into the toy data? Maybe it's just an artifact of the artificial nature of these datasets, but any thoughts are welcome.
Dear @maurolepore @AnneSchoenauer I checked this issue with real datasets and there is no such uuid in companies that don't match uuid in products or inputs. This means we have atleast a single match for each uuid in companies dataset from uuid in products or inputs.
I am not 100% sure how the toy datasets are made, I would @maurolepore recommend you to please check how such cases appeared in toy data. Please let me know if you find anything odd! Thanks!
Relates to #19
Dear @maurolepore,
As we saw in the Tech Meeting today, I think it is needed that we all understand once the whole process with the profile_ranking and why I asked several times if we still have a dataset with the "clustered" (the europage products). I also would want to make one enhancement to the toy_datasets so that they become more realistic (even though they are still toy data sets). Maybe it is already implemented but I want to make sure with this ticket.
The enhancement would be that the toy dataset for the emission_profile_products (which are the ecoinvent products) have activitiy_uuid_product_uuid that are not existing in the emission_profile_any_company dataset. So that it looks like this.
Created on 2024-01-10 with reprex v2.0.2
Alrigth - so WHY is this important and again I think it is crucial to understand this. When it comes to profile ranking the process should be a such 1) Calculate the profile ranking on the WHOLE ecoinvent product data set. 2) Calculate the risk categories on this. 3) Matcihng it with the company data set. The company dataset will have some products that were not able to be matched with ecoinvent. And therefore, some ecoinvent products will be filtered out. Again UUID4, UUID 8 and UUID9. However, the profile ranking of all the other UUIDs will be the same as they were pre-calculated before matching. In a reprex this looks like this:
Created on 2024-01-10 with reprex v2.0.2
Please note @maurolepore that this reprex doesn't use the correct profile_ranking variable. I just wanted to make the point that the profile ranking is calculated first, then the risk categories and we then do the matching with europages companies and products (clustered).
Does this make it clear?