2DegreesInvesting / tiltToyData

Toy datasets for TILT
https://2degreesinvesting.github.io/tiltToyData/
GNU General Public License v3.0

New toy datasets for emissions profile #19

maurolepore closed this 5 months ago

maurolepore commented 6 months ago

Closes #7
Closes #20
Closes 2DegreesInvesting/tiltIndicator#566
Closes #22
Relates to 2DegreesInvesting/tiltToyDataPrivate#1

Features

reprex

The deprecated data remains accessible because it may still be needed for a while. For example, tiltIndicatorAfter may need to update other toy datasets before they match the new datasets.

devtools::load_all()
#> ℹ Loading tiltToyData
library(readr, warn.conflicts = FALSE)
options(readr.show_col_types = FALSE)

new <- toy_emissions_profile_any_companies()
new
#> [1] "/home/rstudio/git/tiltToyData/inst/extdata/emissions_profile_any_companies.csv.gz"
read_csv(new)
#> # A tibble: 76 × 7
#>    activity_uuid_product_uuid    clustered companies_id country ei_activity_name
#>    <chr>                         <chr>     <chr>        <chr>   <chr>           
#>  1 76269c17-78d6-420b-991a-aa38… tent      soot_asianp… germany market for shed…
#>  2 76269c17-78d6-420b-991a-aa38… table hi… frightening… spain   market for shed…
#>  3 76269c17-78d6-420b-991a-aa38… surface … hyperbrutal… germany market for deep…
#>  4 76269c17-78d6-420b-991a-aa38… surface … hyperbrutal… germany market for deep…
#>  5 76269c17-78d6-420b-991a-aa38… tent      flexible_do… austria market for shed…
#>  6 76269c17-78d6-420b-991a-aa38… tent      paramilitar… germany market for shed…
#>  7 76269c17-78d6-420b-991a-aa38… open spa… level_meado… france  market for shed…
#>  8 bf94b5a7-b7a2-46d1-bb95-84bc… tent      heartrendin… germany market for shed…
#>  9 76269c17-78d6-420b-991a-aa38… tent      traumatopho… germany market for shed…
#> 10 76269c17-78d6-420b-991a-aa38… tent      preliterary… germany market for shed…
#> # ℹ 66 more rows
#> # ℹ 2 more variables: main_activity <chr>, unit <chr>

old <- deprecated_path("emissions_profile_any_companies.csv.gz")
old
#> [1] "/home/rstudio/git/tiltToyData/inst/extdata/deprecated/emissions_profile_any_companies.csv.gz"
read_csv(old)
#> # A tibble: 9 × 4
#>   activity_uuid_product_uuid                        clustered companies_id unit 
#>   <chr>                                             <chr>     <chr>        <chr>
#> 1 0a242b09-772a-5edf-8e82-9cb4ba52a258_ae39ee61-d4… stove     fleischerei… unit 
#> 2 be06d25c-73dc-55fb-965b-0f300453e380_98b48ff2-22… oven      fleischerei… unit 
#> 3 977d997e-c257-5033-ba39-d0edeeef4ba2_0ace02fa-ec… steel     pecheries-b… kg   
#> 4 ebb8475e-ff57-5e4e-937b-b5788186a5ca_ccee034c-8b… aged che… hoche-butte… kg   
#> 5 ebb8475e-ff57-5e4e-937b-b5788186a5ca_ccee034c-8b… aged che… vicquelin-e… kg   
#> 6 ebb8475e-ff57-5e4e-937b-b5788186a5ca_ccee034c-8b… cheese    bst-procont… kg   
#> 7 2f7b77a7-1556-5c1b-b0aa-c4534ddc8885_38d493e9-6f… cream     leider-gmbh… kg   
#> 8 2f7b77a7-1556-5c1b-b0aa-c4534ddc8885_38d493e9-6f… rubber    cheries-baq… kg   
#> 9 <NA>                                              apple     ca-coity-tr… <NA>
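
For readers without the package at hand, here is a minimal base-R sketch of the idea behind the reprex above: keep superseded files in a `deprecated/` subfolder and read them on request. The helper `deprecated_path_sketch()` and the temporary directory are hypothetical stand-ins; the actual implementation of `deprecated_path()` is not shown in this thread and may differ.

```r
# Hypothetical stand-in for the package's extdata directory.
extdata <- file.path(tempdir(), "extdata")
dir.create(file.path(extdata, "deprecated"), recursive = TRUE, showWarnings = FALSE)

# Hypothetical helper (an assumption, not the package's code):
# build the path to a file stored under "deprecated/".
deprecated_path_sketch <- function(file, root = extdata) {
  file.path(root, "deprecated", file)
}

# Write a tiny gzipped CSV so the example has something to read.
old_path <- deprecated_path_sketch("emissions_profile_any_companies.csv.gz")
con <- gzfile(old_path, "w")
write.csv(
  data.frame(clustered = c("stove", "oven"), unit = c("unit", "unit")),
  con,
  row.names = FALSE
)
close(con)

# read.csv() detects the gzip compression when given the file path.
old <- read.csv(old_path)
old
```

Keeping the old file reachable through a separate helper lets downstream packages migrate at their own pace while the main accessor already returns the new data.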

New datasets

library(readr, warn.conflicts = FALSE)
library(tiltIndicator)
devtools::load_all()
#> ℹ Loading tiltToyData

options(readr.show_col_types = FALSE, width = 500)

companies <- read_csv(toy_emissions_profile_any_companies())
companies
#> # A tibble: 76 × 7
#>    activity_uuid_product_uuid           clustered                   companies_id                         country ei_activity_name                                              main_activity unit 
#>    <chr>                                <chr>                       <chr>                                <chr>   <chr>                                                         <chr>         <chr>
#>  1 76269c17-78d6-420b-991a-aa38c51b45b7 tent                        soot_asianpiedstarling               germany market for shed, large, wood, non-insulated, fire-unprotected distributor   m2   
#>  2 76269c17-78d6-420b-991a-aa38c51b45b7 table hire for parties      frightening_chrysomelid              spain   market for shed, large, wood, non-insulated, fire-unprotected wholesaler    m2   
#>  3 76269c17-78d6-420b-991a-aa38c51b45b7 surface finishing, galvanic hyperbrutal_flea                     germany market for deep drawing, steel, 10000 kN press, automode      distributor   kg   
#>  4 76269c17-78d6-420b-991a-aa38c51b45b7 surface engineering         hyperbrutal_flea                     germany market for deep drawing, steel, 10000 kN press, automode      distributor   kg   
#>  5 76269c17-78d6-420b-991a-aa38c51b45b7 tent                        flexible_dolphin                     austria market for shed, large, wood, non-insulated, fire-unprotected wholesaler    m2   
#>  6 76269c17-78d6-420b-991a-aa38c51b45b7 tent                        paramilitary_racerunner              germany market for shed, large, wood, non-insulated, fire-unprotected wholesaler    m2   
#>  7 76269c17-78d6-420b-991a-aa38c51b45b7 open space amenities        level_meadowhawk                     france  market for shed, large, wood, non-insulated, fire-unprotected wholesaler    m2   
#>  8 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb tent                        heartrending_attwatersprairiechicken germany market for shed, large, wood, non-insulated, fire-unprotected distributor   m2   
#>  9 76269c17-78d6-420b-991a-aa38c51b45b7 tent                        traumatophobic_hanumanmonkey         germany market for shed, large, wood, non-insulated, fire-unprotected distributor   m2   
#> 10 76269c17-78d6-420b-991a-aa38c51b45b7 tent                        preliterary_toucan                   germany market for shed, large, wood, non-insulated, fire-unprotected wholesaler    m2   
#> # ℹ 66 more rows

products <- read_csv(toy_emissions_profile_products_ecoinvent())
products
#> # A tibble: 18 × 8
#>    activity_uuid_product_uuid           co2_footprint ei_activity_name                                              ei_geography isic_4digit tilt_sector  tilt_subsector           unit 
#>    <chr>                                        <dbl> <chr>                                                         <chr>        <chr>       <chr>        <chr>                    <chr>
#>  1 833caa78-30df-4374-900f-7f88ab44075b        14.1   iron-nickel-chromium alloy production                         RER          '2410'      Metals       Iron & Steel             kg   
#>  2 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb         0.419 market for deep drawing, steel, 10000 kN press, automode      GLO          '2591'      Metals       Other Metals             kg   
#>  3 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb       481.    market for shed, large, wood, non-insulated, fire-unprotected GLO          '4100'      Construction Construction Residential m2   
#>  4 833caa78-30df-4374-900f-7f88ab44075b         9.47  iron-nickel-chromium alloy production                         RER          '2410'      Metals       Iron & Steel             kg   
#>  5 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb         0.648 market for deep drawing, steel, 10000 kN press, automode      GLO          '2591'      Metals       Other Metals             kg   
#>  6 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb       276.    market for shed, large, wood, non-insulated, fire-unprotected GLO          '4100'      Construction Construction Residential m2   
#>  7 833caa78-30df-4374-900f-7f88ab44075b        13.6   iron-nickel-chromium alloy production                         RER          '2410'      Metals       Iron & Steel             kg   
#>  8 76269c17-78d6-420b-991a-aa38c51b45b7         0.405 market for deep drawing, steel, 10000 kN press, automode      GLO          '2591'      Metals       Other Metals             kg   
#>  9 76269c17-78d6-420b-991a-aa38c51b45b7       447.    market for shed, large, wood, non-insulated, fire-unprotected GLO          '4100'      Construction Construction Residential m2   
#> 10 833caa78-30df-4374-900f-7f88ab44075b        14.7   iron-nickel-chromium alloy production                         RER          '2410'      Metals       Iron & Steel             kg   
#> 11 833caa78-30df-4374-900f-7f88ab44075b         0.390 market for deep drawing, steel, 10000 kN press, automode      GLO          '2591'      Metals       Other Metals             kg   
#> 12 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb       442.    market for shed, large, wood, non-insulated, fire-unprotected GLO          '4100'      Construction Construction Residential m2   
#> 13 76269c17-78d6-420b-991a-aa38c51b45b7        14.1   iron-nickel-chromium alloy production                         RER          '2410'      Metals       Iron & Steel             kg   
#> 14 76269c17-78d6-420b-991a-aa38c51b45b7         0.884 market for deep drawing, steel, 10000 kN press, automode      GLO          '2591'      Metals       Other Metals             kg   
#> 15 76269c17-78d6-420b-991a-aa38c51b45b7       321.    market for shed, large, wood, non-insulated, fire-unprotected GLO          '4100'      Construction Construction Residential m2   
#> 16 833caa78-30df-4374-900f-7f88ab44075b        12.7   iron-nickel-chromium alloy production                         RER          '2410'      Metals       Iron & Steel             kg   
#> 17 76269c17-78d6-420b-991a-aa38c51b45b7         0.675 market for deep drawing, steel, 10000 kN press, automode      GLO          '2591'      Metals       Other Metals             kg   
#> 18 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb       435.    market for shed, large, wood, non-insulated, fire-unprotected GLO          '4100'      Construction Construction Residential m2

upstream_products <- read_csv(toy_emissions_profile_upstream_products_ecoinvent())
upstream_products
#> # A tibble: 96 × 9
#>    activity_uuid_product_uuid           ei_geography input_activity_uuid_product_uuid                                          input_co2_footprint input_isic_4digit input_reference_product_name       input_tilt_sector input_tilt_subsector input_unit
#>    <chr>                                <chr>        <chr>                                                                                   <dbl> <chr>             <chr>                              <chr>             <chr>                <chr>     
#>  1 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb RER          bdc93cd8-00b4-5b3e-993e-b7fef7059e52_4e584f6f-2e71-4796-931e-bb9a273c161c             1.70e+0 '2790'            anode, for metal electrolysis      Metals            Iron & Steel         kg        
#>  2 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb RER          fdb1f848-173f-5fe1-96a2-588171e87e30_c2c93af2-47cb-4ec7-a1bd-d3d572bca039             2.90e+8 '2815'            electric arc furnace converter     Metals            Iron & Steel         unit      
#>  3 76269c17-78d6-420b-991a-aa38c51b45b7 RER          95fcd1bb-4dc6-516a-a3b2-30a4f0530639_3b1d249a-c924-4d6c-8e1f-647f562daa54             4.25e-1 '3821'            electric arc furnace dust          Metals            Iron & Steel         kg        
#>  4 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb RER          daef2f9a-4108-52ae-90a7-fe64abad51bc_6e74937e-b691-4c49-9b8f-5ba44d7c081d             4.07e-1 '3821'            electric arc furnace slag          Metals            Iron & Steel         kg        
#>  5 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb RER          3b190359-a32e-5294-af63-983f38ce6525_759b89bd-3aa6-42ad-b767-5bb9ef5d331d             6.83e-1 '3510'            electricity, medium voltage        Metals            Iron & Steel         kWh       
#>  6 833caa78-30df-4374-900f-7f88ab44075b RER          2c92cdcd-29df-53ba-a209-77c7de201d14_6e316c64-0481-4832-b097-296e14c0b02f             1.20e+1 '2410'            ferrochromium, high-carbon, 68% Cr Metals            Iron & Steel         kg        
#>  7 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb RER          9392c694-12a6-5cd7-a421-d4866359df2c_0d3eda5a-4601-4573-9549-0701c459ab88             6.13e-1 '0510'            hard coal                          Metals            Iron & Steel         kg        
#>  8 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb RER          c18c6cc9-4a26-5c47-9ea9-8635ff2c158e_240c1a3c-1aba-4528-afc3-3f27f56583be             1.37e-2 '3821'            inert waste, for final disposal    Metals            Iron & Steel         kg        
#>  9 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb RER          c4ec0b1e-2a3b-5700-871c-2adbbb29bc1d_4f312355-ac65-4635-8fb2-006dba64ce60             7.21e-2 '3830'            iron scrap, sorted, pressed        Metals            Iron & Steel         kg        
#> 10 833caa78-30df-4374-900f-7f88ab44075b RER          7361f7fb-5cf2-598c-823a-a4b7e50c3d28_a9007f10-7e39-4d50-8f4a-d6d03ce3d673             7.09e-1 '3520'            natural gas, high pressure         Metals            Iron & Steel         m3        
#> # ℹ 86 more rows

result <- emissions_profile(companies, products)
result |> unnest_company()
#> # A tibble: 1,296 × 4
#>    companies_id           grouped_by  risk_category value
#>    <chr>                  <chr>       <chr>         <dbl>
#>  1 soot_asianpiedstarling all         high          0.333
#>  2 soot_asianpiedstarling all         medium        0.167
#>  3 soot_asianpiedstarling all         low           0.5  
#>  4 soot_asianpiedstarling isic_4digit high          0.5  
#>  5 soot_asianpiedstarling isic_4digit medium        0.167
#>  6 soot_asianpiedstarling isic_4digit low           0.333
#>  7 soot_asianpiedstarling tilt_sector high          0.333
#>  8 soot_asianpiedstarling tilt_sector medium        0.333
#>  9 soot_asianpiedstarling tilt_sector low           0.333
#> 10 soot_asianpiedstarling unit        high          0.333
#> # ℹ 1,286 more rows

result |> unnest_product()
#> # A tibble: 2,736 × 7
#>    companies_id           grouped_by  risk_category profile_ranking clustered activity_uuid_product_uuid           co2_footprint
#>    <chr>                  <chr>       <chr>                   <dbl> <chr>     <chr>                                        <dbl>
#>  1 soot_asianpiedstarling all         low                     0.111 tent      76269c17-78d6-420b-991a-aa38c51b45b7         0.405
#>  2 soot_asianpiedstarling all         high                    0.944 tent      76269c17-78d6-420b-991a-aa38c51b45b7       447.   
#>  3 soot_asianpiedstarling all         medium                  0.556 tent      76269c17-78d6-420b-991a-aa38c51b45b7        14.1  
#>  4 soot_asianpiedstarling all         low                     0.333 tent      76269c17-78d6-420b-991a-aa38c51b45b7         0.884
#>  5 soot_asianpiedstarling all         high                    0.778 tent      76269c17-78d6-420b-991a-aa38c51b45b7       321.   
#>  6 soot_asianpiedstarling all         low                     0.278 tent      76269c17-78d6-420b-991a-aa38c51b45b7         0.675
#>  7 soot_asianpiedstarling isic_4digit low                     0.333 tent      76269c17-78d6-420b-991a-aa38c51b45b7         0.405
#>  8 soot_asianpiedstarling isic_4digit high                    0.833 tent      76269c17-78d6-420b-991a-aa38c51b45b7       447.   
#>  9 soot_asianpiedstarling isic_4digit medium                  0.667 tent      76269c17-78d6-420b-991a-aa38c51b45b7        14.1  
#> 10 soot_asianpiedstarling isic_4digit high                    1     tent      76269c17-78d6-420b-991a-aa38c51b45b7         0.884
#> # ℹ 2,726 more rows

result <- emissions_profile_upstream(companies, upstream_products)
result |> unnest_company()
#> # A tibble: 1,296 × 4
#>    companies_id           grouped_by        risk_category value
#>    <chr>                  <chr>             <chr>         <dbl>
#>  1 soot_asianpiedstarling all               high          0.333
#>  2 soot_asianpiedstarling all               medium        0.167
#>  3 soot_asianpiedstarling all               low           0.5  
#>  4 soot_asianpiedstarling input_isic_4digit high          0.333
#>  5 soot_asianpiedstarling input_isic_4digit medium        0.5  
#>  6 soot_asianpiedstarling input_isic_4digit low           0.167
#>  7 soot_asianpiedstarling input_tilt_sector high          0.333
#>  8 soot_asianpiedstarling input_tilt_sector medium        0.167
#>  9 soot_asianpiedstarling input_tilt_sector low           0.5  
#> 10 soot_asianpiedstarling input_unit        high          0.333
#> # ℹ 1,286 more rows

result |> unnest_product()
#> # A tibble: 4,140 × 8
#>    companies_id           grouped_by        risk_category profile_ranking clustered activity_uuid_product_uuid           input_activity_uuid_product_uuid                                          input_co2_footprint
#>    <chr>                  <chr>             <chr>                   <dbl> <chr>     <chr>                                <chr>                                                                                   <dbl>
#>  1 soot_asianpiedstarling all               low                     0.188 tent      76269c17-78d6-420b-991a-aa38c51b45b7 95fcd1bb-4dc6-516a-a3b2-30a4f0530639_3b1d249a-c924-4d6c-8e1f-647f562daa54              0.425 
#>  2 soot_asianpiedstarling all               low                     0.115 tent      76269c17-78d6-420b-991a-aa38c51b45b7 c4ec0b1e-2a3b-5700-871c-2adbbb29bc1d_4f312355-ac65-4635-8fb2-006dba64ce60              0.0781
#>  3 soot_asianpiedstarling all               high                    0.792 tent      76269c17-78d6-420b-991a-aa38c51b45b7 2c92cdcd-29df-53ba-a209-77c7de201d14_6e316c64-0481-4832-b097-296e14c0b02f             10.4   
#>  4 soot_asianpiedstarling all               high                    0.833 tent      76269c17-78d6-420b-991a-aa38c51b45b7 0d9d1001-6635-51b9-bc24-470161f83e97_23fccced-e1e5-421d-9abe-5b59c51a862e             26.3   
#>  5 soot_asianpiedstarling all               low                     0.312 tent      76269c17-78d6-420b-991a-aa38c51b45b7 55a5ac05-ab15-5a27-9d0e-6ecf840039f1_f10b8722-4be1-43d5-b17d-c51ad0e29d29              0.597 
#>  6 soot_asianpiedstarling all               medium                  0.542 tent      76269c17-78d6-420b-991a-aa38c51b45b7 a1383009-9188-5326-916a-1e4ea2d835c4_28c2473e-1e11-4078-9a76-de9550553adc              1.11  
#>  7 soot_asianpiedstarling input_isic_4digit medium                  0.611 tent      76269c17-78d6-420b-991a-aa38c51b45b7 95fcd1bb-4dc6-516a-a3b2-30a4f0530639_3b1d249a-c924-4d6c-8e1f-647f562daa54              0.425 
#>  8 soot_asianpiedstarling input_isic_4digit high                    0.833 tent      76269c17-78d6-420b-991a-aa38c51b45b7 c4ec0b1e-2a3b-5700-871c-2adbbb29bc1d_4f312355-ac65-4635-8fb2-006dba64ce60              0.0781
#>  9 soot_asianpiedstarling input_isic_4digit medium                  0.667 tent      76269c17-78d6-420b-991a-aa38c51b45b7 2c92cdcd-29df-53ba-a209-77c7de201d14_6e316c64-0481-4832-b097-296e14c0b02f             10.4   
#> 10 soot_asianpiedstarling input_isic_4digit low                     0.333 tent      76269c17-78d6-420b-991a-aa38c51b45b7 0d9d1001-6635-51b9-bc24-470161f83e97_23fccced-e1e5-421d-9abe-5b59c51a862e             26.3   
#> # ℹ 4,130 more rows

Created on 2024-01-05 with reprex v2.0.2
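
The `grouped_by = "all"` rows printed above are consistent with a simple rule: rank each product's `co2_footprint` among all 18 products, divide by 18, then bin the result into thirds. The sketch below reproduces those numbers in base R from the rounded footprints shown in the `products` table. The thresholds (1/3 and 2/3) and the tie handling are assumptions inferred from the printed output, not a statement of how tiltIndicator implements `emissions_profile()`.

```r
# Rounded co2_footprint values, in the row order of the products table above.
co2_footprint <- c(
  14.1, 0.419, 481, 9.47, 0.648, 276, 13.6, 0.405, 447,
  14.7, 0.390, 442, 14.1, 0.884, 321, 12.7, 0.675, 435
)

# Assumption: ranking is the ascending rank divided by the number of products.
# ties.method = "min" only matters here because the printed values are rounded.
profile_ranking <- rank(co2_footprint, ties.method = "min") / length(co2_footprint)

# Assumption: thirds define the risk categories.
risk_category <- ifelse(
  profile_ranking <= 1 / 3, "low",
  ifelse(profile_ranking <= 2 / 3, "medium", "high")
)

# Rows of the products table whose activity_uuid_product_uuid is
# 76269c17-78d6-420b-991a-aa38c51b45b7 (the product of soot_asianpiedstarling).
rows <- c(8, 9, 13, 14, 15, 17)

# Product-level ranking for that company.
round(profile_ranking[rows], 3)

# Company-level value: share of the company's products in each category.
share <- table(factor(risk_category[rows], levels = c("high", "medium", "low"))) / length(rows)
round(share, 3)
```

Under these assumptions the product-level rankings come out as 0.111, 0.944, 0.556, 0.333, 0.778 and 0.278, and the company-level shares as 0.333 (high), 0.167 (medium) and 0.5 (low), matching the first rows of both outputs above.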


maurolepore commented 6 months ago

@kalashsinghal, when you review this PR, note that the new datasets have columns that the old datasets don't have. Are they all necessary?

devtools::load_all()
#> ℹ Loading tiltToyData
library(readr, warn.conflicts = FALSE)
options(readr.show_col_types = FALSE)

# companies
new <- read_csv(toy_emissions_profile_any_companies())
old <- read_csv(deprecated_path("emissions_profile_any_companies.csv.gz"))
setdiff(names(old), names(new))
#> character(0)
setdiff(names(new), names(old))
#> [1] "country"          "ei_activity_name" "main_activity"

# products
old <- read_csv(toy_emissions_profile_products())
new <- read_csv(toy_emissions_profile_products_ecoinvent())
setdiff(names(old), names(new))
#> character(0)
setdiff(names(new), names(old))
#> [1] "ei_geography"

# upstream_products
old <- read_csv(toy_emissions_profile_upstream_products())
new <- read_csv(toy_emissions_profile_upstream_products_ecoinvent())
setdiff(names(old), names(new))
#> character(0)
setdiff(names(new), names(old))
#> [1] "ei_geography"                 "input_reference_product_name"

Created on 2024-01-05 with reprex v2.0.2
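
The repeated `setdiff()` calls above could be wrapped in a small helper. This is only a sketch: `compare_columns()` is a hypothetical name, and the empty data frames below just carry the column names printed earlier in this thread for the companies datasets.

```r
# Hypothetical helper: report the columns unique to each of two data frames.
compare_columns <- function(old, new) {
  list(
    only_in_old = setdiff(names(old), names(new)),
    only_in_new = setdiff(names(new), names(old))
  )
}

# Zero-row stand-ins with the column names of the old and new companies data.
old <- data.frame(
  activity_uuid_product_uuid = character(),
  clustered = character(),
  companies_id = character(),
  unit = character()
)
new <- data.frame(
  activity_uuid_product_uuid = character(),
  clustered = character(),
  companies_id = character(),
  country = character(),
  ei_activity_name = character(),
  main_activity = character(),
  unit = character()
)

cols_diff <- compare_columns(old, new)
cols_diff
```

Because `setdiff()` is not symmetric, both directions are needed: the first shows what a migration would drop, the second what it adds.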

maurolepore commented 6 months ago

Thanks @Kalash,

The raw data I used seems to lack the columns you want. And the original link you shared seems to no longer be valid. But the updated file here has the columns. So good to go :-)

https://drive.google.com/drive/folders/1AbSGCGFVcRM3zLfPg5FdwScTRRaCbIws

Maybe we can post this link to the files in the README file of tiltIndicatorBefore?

maurolepore commented 6 months ago

I updated the datasets with those from https://drive.google.com/drive/folders/1AbSGCGFVcRM3zLfPg5FdwScTRRaCbIws. Now emissions_profile_upstream_products_ecoinvent.csv has the *activity_name columns.

Comparing columns between old and new datasets

devtools::load_all()
#> ℹ Loading tiltToyData
library(readr, warn.conflicts = FALSE)
options(readr.show_col_types = FALSE)

# companies
new <- read_csv(toy_emissions_profile_any_companies())
old <- read_csv(deprecated_path("emissions_profile_any_companies.csv.gz"))
setdiff(names(old), names(new))
#> character(0)
setdiff(names(new), names(old))
#> [1] "country"          "ei_activity_name" "main_activity"

# products
old <- read_csv(toy_emissions_profile_products())
new <- read_csv(toy_emissions_profile_products_ecoinvent())
setdiff(names(old), names(new))
#> character(0)
setdiff(names(new), names(old))
#> [1] "ei_geography"

# upstream_products
old <- read_csv(toy_emissions_profile_upstream_products())
new <- read_csv(toy_emissions_profile_upstream_products_ecoinvent())
setdiff(names(old), names(new))
#> character(0)
setdiff(names(new), names(old))
#> [1] "ei_activity_name"             "ei_geography"                
#> [3] "input_ei_activity_name"       "input_reference_product_name"
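
To confirm that the `*activity_name` columns are present without eyeballing the `setdiff()` output, a pattern match on the column names may be enough. The sketch below uses the eleven column names of the updated upstream products dataset as printed in this thread; in practice `upstream_cols` would come from `names()` on the data frame.

```r
# Column names of the updated upstream products dataset, as printed in this thread.
upstream_cols <- c(
  "activity_uuid_product_uuid", "ei_activity_name", "ei_geography",
  "input_activity_uuid_product_uuid", "input_co2_footprint",
  "input_ei_activity_name", "input_isic_4digit",
  "input_reference_product_name", "input_tilt_sector",
  "input_tilt_subsector", "input_unit"
)

# Keep only the names that end in "activity_name".
activity_name_cols <- grep("activity_name$", upstream_cols, value = TRUE)
activity_name_cols

# Fail early if the expected columns are missing.
stopifnot(length(activity_name_cols) > 0)
```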

Usage in tiltIndicator

library(readr, warn.conflicts = FALSE)
library(tiltIndicator)
devtools::load_all()
#> ℹ Loading tiltToyData

options(readr.show_col_types = FALSE, width = 500)

companies <- read_csv(toy_emissions_profile_any_companies())
companies
#> # A tibble: 76 × 7
#>    activity_uuid_product_uuid           clustered                   companies_id                         country ei_activity_name                                              main_activity unit 
#>    <chr>                                <chr>                       <chr>                                <chr>   <chr>                                                         <chr>         <chr>
#>  1 76269c17-78d6-420b-991a-aa38c51b45b7 tent                        soot_asianpiedstarling               germany market for shed, large, wood, non-insulated, fire-unprotected distributor   m2   
#>  2 76269c17-78d6-420b-991a-aa38c51b45b7 table hire for parties      frightening_chrysomelid              spain   market for shed, large, wood, non-insulated, fire-unprotected wholesaler    m2   
#>  3 76269c17-78d6-420b-991a-aa38c51b45b7 surface finishing, galvanic hyperbrutal_flea                     germany market for deep drawing, steel, 10000 kN press, automode      distributor   kg   
#>  4 76269c17-78d6-420b-991a-aa38c51b45b7 surface engineering         hyperbrutal_flea                     germany market for deep drawing, steel, 10000 kN press, automode      distributor   kg   
#>  5 76269c17-78d6-420b-991a-aa38c51b45b7 tent                        flexible_dolphin                     austria market for shed, large, wood, non-insulated, fire-unprotected wholesaler    m2   
#>  6 76269c17-78d6-420b-991a-aa38c51b45b7 tent                        paramilitary_racerunner              germany market for shed, large, wood, non-insulated, fire-unprotected wholesaler    m2   
#>  7 76269c17-78d6-420b-991a-aa38c51b45b7 open space amenities        level_meadowhawk                     france  market for shed, large, wood, non-insulated, fire-unprotected wholesaler    m2   
#>  8 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb tent                        heartrending_attwatersprairiechicken germany market for shed, large, wood, non-insulated, fire-unprotected distributor   m2   
#>  9 76269c17-78d6-420b-991a-aa38c51b45b7 tent                        traumatophobic_hanumanmonkey         germany market for shed, large, wood, non-insulated, fire-unprotected distributor   m2   
#> 10 76269c17-78d6-420b-991a-aa38c51b45b7 tent                        preliterary_toucan                   germany market for shed, large, wood, non-insulated, fire-unprotected wholesaler    m2   
#> # ℹ 66 more rows

products <- read_csv(toy_emissions_profile_products_ecoinvent())
products
#> # A tibble: 18 × 8
#>    activity_uuid_product_uuid           co2_footprint ei_activity_name                                              ei_geography isic_4digit tilt_sector  tilt_subsector           unit 
#>    <chr>                                        <dbl> <chr>                                                         <chr>        <chr>       <chr>        <chr>                    <chr>
#>  1 833caa78-30df-4374-900f-7f88ab44075b        14.1   iron-nickel-chromium alloy production                         RER          ''2410''    metals       iron & steel             kg   
#>  2 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb         0.419 market for deep drawing, steel, 10000 kN press, automode      GLO          ''2591''    metals       other metals             kg   
#>  3 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb       481.    market for shed, large, wood, non-insulated, fire-unprotected GLO          ''4100''    construction construction residential m2   
#>  4 833caa78-30df-4374-900f-7f88ab44075b         9.47  iron-nickel-chromium alloy production                         RER          ''2410''    metals       iron & steel             kg   
#>  5 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb         0.648 market for deep drawing, steel, 10000 kN press, automode      GLO          ''2591''    metals       other metals             kg   
#>  6 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb       276.    market for shed, large, wood, non-insulated, fire-unprotected GLO          ''4100''    construction construction residential m2   
#>  7 833caa78-30df-4374-900f-7f88ab44075b        13.6   iron-nickel-chromium alloy production                         RER          ''2410''    metals       iron & steel             kg   
#>  8 76269c17-78d6-420b-991a-aa38c51b45b7         0.405 market for deep drawing, steel, 10000 kN press, automode      GLO          ''2591''    metals       other metals             kg   
#>  9 76269c17-78d6-420b-991a-aa38c51b45b7       447.    market for shed, large, wood, non-insulated, fire-unprotected GLO          ''4100''    construction construction residential m2   
#> 10 833caa78-30df-4374-900f-7f88ab44075b        14.7   iron-nickel-chromium alloy production                         RER          ''2410''    metals       iron & steel             kg   
#> 11 833caa78-30df-4374-900f-7f88ab44075b         0.390 market for deep drawing, steel, 10000 kN press, automode      GLO          ''2591''    metals       other metals             kg   
#> 12 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb       442.    market for shed, large, wood, non-insulated, fire-unprotected GLO          ''4100''    construction construction residential m2   
#> 13 76269c17-78d6-420b-991a-aa38c51b45b7        14.1   iron-nickel-chromium alloy production                         RER          ''2410''    metals       iron & steel             kg   
#> 14 76269c17-78d6-420b-991a-aa38c51b45b7         0.884 market for deep drawing, steel, 10000 kN press, automode      GLO          ''2591''    metals       other metals             kg   
#> 15 76269c17-78d6-420b-991a-aa38c51b45b7       321.    market for shed, large, wood, non-insulated, fire-unprotected GLO          ''4100''    construction construction residential m2   
#> 16 833caa78-30df-4374-900f-7f88ab44075b        12.7   iron-nickel-chromium alloy production                         RER          ''2410''    metals       iron & steel             kg   
#> 17 76269c17-78d6-420b-991a-aa38c51b45b7         0.675 market for deep drawing, steel, 10000 kN press, automode      GLO          ''2591''    metals       other metals             kg   
#> 18 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb       435.    market for shed, large, wood, non-insulated, fire-unprotected GLO          ''4100''    construction construction residential m2

upstream_products <- read_csv(toy_emissions_profile_upstream_products_ecoinvent())
upstream_products
#> # A tibble: 96 × 11
#>    activity_uuid_product_uuid           ei_activity_name                                              ei_geography                      input_activity_uuid_product_uuid                                          input_co2_footprint input_ei_activity_name                                          input_isic_4digit input_reference_product_name                       input_tilt_sector input_tilt_subsector     input_unit
#>    <chr>                                <chr>                                                         <chr>                             <chr>                                                                                   <dbl> <chr>                                                           <chr>             <chr>                                              <chr>             <chr>                    <chr>     
#>  1 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb market for deep drawing, steel, 10000 kN press, automode      RoW                               55a5ac05-ab15-5a27-9d0e-6ecf840039f1_f10b8722-4be1-43d5-b17d-c51ad0e29d29             4.56e-1 deep drawing, steel, 10000 kN press, automode                   ''2591''          deep drawing, steel, 10000 kN press, automode      metals            other metals             kg        
#>  2 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb market for shed, large, wood, non-insulated, fire-unprotected RoW                               bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df             4.63e+2 shed construction, large, wood, non-insulated, fire-unprotected ''4100''          shed, large, wood, non-insulated, fire-unprotected construction      construction residential m2        
#>  3 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                         GLO                               bdc93cd8-00b4-5b3e-993e-b7fef7059e52_4e584f6f-2e71-4796-931e-bb9a273c161c             1.67e+0 market for anode, for metal electrolysis                        ''2790''          anode, for metal electrolysis                      industry          machinery & equipment    kg        
#>  4 76269c17-78d6-420b-991a-aa38c51b45b7 iron-nickel-chromium alloy production                         RER                               fdb1f848-173f-5fe1-96a2-588171e87e30_c2c93af2-47cb-4ec7-a1bd-d3d572bca039             1.45e+8 electric arc furnace converter construction                     ''2815''          electric arc furnace converter                     industry          other industry           unit      
#>  5 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                         RER                               95fcd1bb-4dc6-516a-a3b2-30a4f0530639_3b1d249a-c924-4d6c-8e1f-647f562daa54             5.30e-1 market for electric arc furnace dust                            ''3821''          electric arc furnace dust                          industry          other industry           kg        
#>  6 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                         RER                               daef2f9a-4108-52ae-90a7-fe64abad51bc_6e74937e-b691-4c49-9b8f-5ba44d7c081d             5.89e-1 market for electric arc furnace slag                            ''3821''          electric arc furnace slag                          industry          other industry           kg        
#>  7 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                         RER                               3b190359-a32e-5294-af63-983f38ce6525_759b89bd-3aa6-42ad-b767-5bb9ef5d331d             6.02e-1 market group for electricity, medium voltage                    ''3510''          electricity, medium voltage                        power             total power              kWh       
#>  8 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                         GLO                               2c92cdcd-29df-53ba-a209-77c7de201d14_6e316c64-0481-4832-b097-296e14c0b02f             7.32e+0 market for ferrochromium, high-carbon, 68% Cr                   ''2410''          ferrochromium, high-carbon, 68% Cr                 metals            iron & steel             kg        
#>  9 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                         Europe, without Russia and Turkey 9392c694-12a6-5cd7-a421-d4866359df2c_0d3eda5a-4601-4573-9549-0701c459ab88             7.10e-1 market for hard coal                                            ''0510''          hard coal                                          energy            coal energy              kg        
#> 10 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                         CH                                c18c6cc9-4a26-5c47-9ea9-8635ff2c158e_240c1a3c-1aba-4528-afc3-3f27f56583be             1.06e-2 market for inert waste, for final disposal                      ''3821''          inert waste, for final disposal                    industry          other industry           kg        
#> # ℹ 86 more rows

  result <- emissions_profile(companies, products)
  result |> unnest_company()
#> # A tibble: 1,296 × 4
#>    companies_id           grouped_by  risk_category value
#>    <chr>                  <chr>       <chr>         <dbl>
#>  1 soot_asianpiedstarling all         high          0.333
#>  2 soot_asianpiedstarling all         medium        0.167
#>  3 soot_asianpiedstarling all         low           0.5  
#>  4 soot_asianpiedstarling isic_4digit high          0.5  
#>  5 soot_asianpiedstarling isic_4digit medium        0.167
#>  6 soot_asianpiedstarling isic_4digit low           0.333
#>  7 soot_asianpiedstarling tilt_sector high          0.333
#>  8 soot_asianpiedstarling tilt_sector medium        0.333
#>  9 soot_asianpiedstarling tilt_sector low           0.333
#> 10 soot_asianpiedstarling unit        high          0.333
#> # ℹ 1,286 more rows

  result |> unnest_product()
#> # A tibble: 2,736 × 7
#>    companies_id           grouped_by  risk_category profile_ranking clustered activity_uuid_product_uuid           co2_footprint
#>    <chr>                  <chr>       <chr>                   <dbl> <chr>     <chr>                                        <dbl>
#>  1 soot_asianpiedstarling all         low                     0.111 tent      76269c17-78d6-420b-991a-aa38c51b45b7         0.405
#>  2 soot_asianpiedstarling all         high                    0.944 tent      76269c17-78d6-420b-991a-aa38c51b45b7       447.   
#>  3 soot_asianpiedstarling all         medium                  0.556 tent      76269c17-78d6-420b-991a-aa38c51b45b7        14.1  
#>  4 soot_asianpiedstarling all         low                     0.333 tent      76269c17-78d6-420b-991a-aa38c51b45b7         0.884
#>  5 soot_asianpiedstarling all         high                    0.778 tent      76269c17-78d6-420b-991a-aa38c51b45b7       321.   
#>  6 soot_asianpiedstarling all         low                     0.278 tent      76269c17-78d6-420b-991a-aa38c51b45b7         0.675
#>  7 soot_asianpiedstarling isic_4digit low                     0.333 tent      76269c17-78d6-420b-991a-aa38c51b45b7         0.405
#>  8 soot_asianpiedstarling isic_4digit high                    0.833 tent      76269c17-78d6-420b-991a-aa38c51b45b7       447.   
#>  9 soot_asianpiedstarling isic_4digit medium                  0.667 tent      76269c17-78d6-420b-991a-aa38c51b45b7        14.1  
#> 10 soot_asianpiedstarling isic_4digit high                    1     tent      76269c17-78d6-420b-991a-aa38c51b45b7         0.884
#> # ℹ 2,726 more rows

  result <- emissions_profile_upstream(companies, upstream_products)
  result |> unnest_company()
#> # A tibble: 1,296 × 4
#>    companies_id           grouped_by        risk_category value
#>    <chr>                  <chr>             <chr>         <dbl>
#>  1 soot_asianpiedstarling all               high          0.5  
#>  2 soot_asianpiedstarling all               medium        0.333
#>  3 soot_asianpiedstarling all               low           0.167
#>  4 soot_asianpiedstarling input_isic_4digit high          0    
#>  5 soot_asianpiedstarling input_isic_4digit medium        0.167
#>  6 soot_asianpiedstarling input_isic_4digit low           0.833
#>  7 soot_asianpiedstarling input_tilt_sector high          0.167
#>  8 soot_asianpiedstarling input_tilt_sector medium        0.667
#>  9 soot_asianpiedstarling input_tilt_sector low           0.167
#> 10 soot_asianpiedstarling input_unit        high          0.167
#> # ℹ 1,286 more rows

  result |> unnest_product()
#> # A tibble: 4,140 × 8
#>    companies_id           grouped_by        risk_category profile_ranking clustered activity_uuid_product_uuid           input_activity_uuid_product_uuid                                          input_co2_footprint
#>    <chr>                  <chr>             <chr>                   <dbl> <chr>     <chr>                                <chr>                                                                                   <dbl>
#>  1 soot_asianpiedstarling all               high                    0.958 tent      76269c17-78d6-420b-991a-aa38c51b45b7 fdb1f848-173f-5fe1-96a2-588171e87e30_c2c93af2-47cb-4ec7-a1bd-d3d572bca039       144872157.   
#>  2 soot_asianpiedstarling all               high                    0.740 tent      76269c17-78d6-420b-991a-aa38c51b45b7 2c92cdcd-29df-53ba-a209-77c7de201d14_6e316c64-0481-4832-b097-296e14c0b02f               6.08 
#>  3 soot_asianpiedstarling all               low                     0.219 tent      76269c17-78d6-420b-991a-aa38c51b45b7 daef2f9a-4108-52ae-90a7-fe64abad51bc_6e74937e-b691-4c49-9b8f-5ba44d7c081d               0.461
#>  4 soot_asianpiedstarling all               medium                  0.458 tent      76269c17-78d6-420b-991a-aa38c51b45b7 7361f7fb-5cf2-598c-823a-a4b7e50c3d28_a9007f10-7e39-4d50-8f4a-d6d03ce3d673               0.808
#>  5 soot_asianpiedstarling all               high                    0.885 tent      76269c17-78d6-420b-991a-aa38c51b45b7 bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df             240.   
#>  6 soot_asianpiedstarling all               medium                  0.469 tent      76269c17-78d6-420b-991a-aa38c51b45b7 7361f7fb-5cf2-598c-823a-a4b7e50c3d28_a9007f10-7e39-4d50-8f4a-d6d03ce3d673               0.834
#>  7 soot_asianpiedstarling input_isic_4digit low                     0.333 tent      76269c17-78d6-420b-991a-aa38c51b45b7 fdb1f848-173f-5fe1-96a2-588171e87e30_c2c93af2-47cb-4ec7-a1bd-d3d572bca039       144872157.   
#>  8 soot_asianpiedstarling input_isic_4digit low                     0.333 tent      76269c17-78d6-420b-991a-aa38c51b45b7 2c92cdcd-29df-53ba-a209-77c7de201d14_6e316c64-0481-4832-b097-296e14c0b02f               6.08 
#>  9 soot_asianpiedstarling input_isic_4digit medium                  0.667 tent      76269c17-78d6-420b-991a-aa38c51b45b7 daef2f9a-4108-52ae-90a7-fe64abad51bc_6e74937e-b691-4c49-9b8f-5ba44d7c081d               0.461
#> 10 soot_asianpiedstarling input_isic_4digit low                     0.167 tent      76269c17-78d6-420b-991a-aa38c51b45b7 7361f7fb-5cf2-598c-823a-a4b7e50c3d28_a9007f10-7e39-4d50-8f4a-d6d03ce3d673               0.808
#> # ℹ 4,130 more rows

Created on 2024-01-08 with reprex v2.0.2

maurolepore commented 6 months ago

@kalashsinghal

Thanks!

RE:

Please add these two columns as well with any random values because they will not be used in tiltIndicator.

I didn't randomize the values because I believe those columns are not private data in themselves. Do you think they should indeed be random? Do we need to ask Anne?

What IS random and fake in these datasets is this:

maurolepore commented 6 months ago

FIXME

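The *isic* columns have the wrong quoting: the values are wrapped in doubled single quotes (e.g. ''2410'') instead of single quotes (e.g. '2410').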

library(readr, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
devtools::load_all()
#> ℹ Loading tiltToyData

options(readr.show_col_types = FALSE, width = 500)

products <- read_csv(toy_emissions_profile_products_ecoinvent())
products |> 
  relocate(matches("isic"))
#> # A tibble: 18 × 8
#>    isic_4digit activity_uuid_product_uuid           co2_footprint ei_activity_name                                              ei_geography tilt_sector  tilt_subsector           unit 
#>    <chr>       <chr>                                        <dbl> <chr>                                                         <chr>        <chr>        <chr>                    <chr>
#>  1 ''2410''    833caa78-30df-4374-900f-7f88ab44075b        14.1   iron-nickel-chromium alloy production                         RER          metals       iron & steel             kg   
#>  2 ''2591''    bf94b5a7-b7a2-46d1-bb95-84bc560b12fb         0.419 market for deep drawing, steel, 10000 kN press, automode      GLO          metals       other metals             kg   
#>  3 ''4100''    bf94b5a7-b7a2-46d1-bb95-84bc560b12fb       481.    market for shed, large, wood, non-insulated, fire-unprotected GLO          construction construction residential m2   
#>  4 ''2410''    833caa78-30df-4374-900f-7f88ab44075b         9.47  iron-nickel-chromium alloy production                         RER          metals       iron & steel             kg   
#>  5 ''2591''    bf94b5a7-b7a2-46d1-bb95-84bc560b12fb         0.648 market for deep drawing, steel, 10000 kN press, automode      GLO          metals       other metals             kg   
#>  6 ''4100''    bf94b5a7-b7a2-46d1-bb95-84bc560b12fb       276.    market for shed, large, wood, non-insulated, fire-unprotected GLO          construction construction residential m2   
#>  7 ''2410''    833caa78-30df-4374-900f-7f88ab44075b        13.6   iron-nickel-chromium alloy production                         RER          metals       iron & steel             kg   
#>  8 ''2591''    76269c17-78d6-420b-991a-aa38c51b45b7         0.405 market for deep drawing, steel, 10000 kN press, automode      GLO          metals       other metals             kg   
#>  9 ''4100''    76269c17-78d6-420b-991a-aa38c51b45b7       447.    market for shed, large, wood, non-insulated, fire-unprotected GLO          construction construction residential m2   
#> 10 ''2410''    833caa78-30df-4374-900f-7f88ab44075b        14.7   iron-nickel-chromium alloy production                         RER          metals       iron & steel             kg   
#> 11 ''2591''    833caa78-30df-4374-900f-7f88ab44075b         0.390 market for deep drawing, steel, 10000 kN press, automode      GLO          metals       other metals             kg   
#> 12 ''4100''    bf94b5a7-b7a2-46d1-bb95-84bc560b12fb       442.    market for shed, large, wood, non-insulated, fire-unprotected GLO          construction construction residential m2   
#> 13 ''2410''    76269c17-78d6-420b-991a-aa38c51b45b7        14.1   iron-nickel-chromium alloy production                         RER          metals       iron & steel             kg   
#> 14 ''2591''    76269c17-78d6-420b-991a-aa38c51b45b7         0.884 market for deep drawing, steel, 10000 kN press, automode      GLO          metals       other metals             kg   
#> 15 ''4100''    76269c17-78d6-420b-991a-aa38c51b45b7       321.    market for shed, large, wood, non-insulated, fire-unprotected GLO          construction construction residential m2   
#> 16 ''2410''    833caa78-30df-4374-900f-7f88ab44075b        12.7   iron-nickel-chromium alloy production                         RER          metals       iron & steel             kg   
#> 17 ''2591''    76269c17-78d6-420b-991a-aa38c51b45b7         0.675 market for deep drawing, steel, 10000 kN press, automode      GLO          metals       other metals             kg   
#> 18 ''4100''    bf94b5a7-b7a2-46d1-bb95-84bc560b12fb       435.    market for shed, large, wood, non-insulated, fire-unprotected GLO          construction construction residential m2

upstream_products <- read_csv(toy_emissions_profile_upstream_products_ecoinvent())
upstream_products |> 
  relocate(matches("isic"))
#> # A tibble: 96 × 11
#>    input_isic_4digit activity_uuid_product_uuid           ei_activity_name                                              ei_geography                      input_activity_uuid_product_uuid                                          input_co2_footprint input_ei_activity_name                                          input_reference_product_name                       input_tilt_sector input_tilt_subsector     input_unit
#>    <chr>             <chr>                                <chr>                                                         <chr>                             <chr>                                                                                   <dbl> <chr>                                                           <chr>                                              <chr>             <chr>                    <chr>     
#>  1 ''2591''          bf94b5a7-b7a2-46d1-bb95-84bc560b12fb market for deep drawing, steel, 10000 kN press, automode      RoW                               55a5ac05-ab15-5a27-9d0e-6ecf840039f1_f10b8722-4be1-43d5-b17d-c51ad0e29d29             4.56e-1 deep drawing, steel, 10000 kN press, automode                   deep drawing, steel, 10000 kN press, automode      metals            other metals             kg        
#>  2 ''4100''          bf94b5a7-b7a2-46d1-bb95-84bc560b12fb market for shed, large, wood, non-insulated, fire-unprotected RoW                               bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df             4.63e+2 shed construction, large, wood, non-insulated, fire-unprotected shed, large, wood, non-insulated, fire-unprotected construction      construction residential m2        
#>  3 ''2790''          bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                         GLO                               bdc93cd8-00b4-5b3e-993e-b7fef7059e52_4e584f6f-2e71-4796-931e-bb9a273c161c             1.67e+0 market for anode, for metal electrolysis                        anode, for metal electrolysis                      industry          machinery & equipment    kg        
#>  4 ''2815''          76269c17-78d6-420b-991a-aa38c51b45b7 iron-nickel-chromium alloy production                         RER                               fdb1f848-173f-5fe1-96a2-588171e87e30_c2c93af2-47cb-4ec7-a1bd-d3d572bca039             1.45e+8 electric arc furnace converter construction                     electric arc furnace converter                     industry          other industry           unit      
#>  5 ''3821''          bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                         RER                               95fcd1bb-4dc6-516a-a3b2-30a4f0530639_3b1d249a-c924-4d6c-8e1f-647f562daa54             5.30e-1 market for electric arc furnace dust                            electric arc furnace dust                          industry          other industry           kg        
#>  6 ''3821''          bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                         RER                               daef2f9a-4108-52ae-90a7-fe64abad51bc_6e74937e-b691-4c49-9b8f-5ba44d7c081d             5.89e-1 market for electric arc furnace slag                            electric arc furnace slag                          industry          other industry           kg        
#>  7 ''3510''          bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                         RER                               3b190359-a32e-5294-af63-983f38ce6525_759b89bd-3aa6-42ad-b767-5bb9ef5d331d             6.02e-1 market group for electricity, medium voltage                    electricity, medium voltage                        power             total power              kWh       
#>  8 ''2410''          bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                         GLO                               2c92cdcd-29df-53ba-a209-77c7de201d14_6e316c64-0481-4832-b097-296e14c0b02f             7.32e+0 market for ferrochromium, high-carbon, 68% Cr                   ferrochromium, high-carbon, 68% Cr                 metals            iron & steel             kg        
#>  9 ''0510''          bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                         Europe, without Russia and Turkey 9392c694-12a6-5cd7-a421-d4866359df2c_0d3eda5a-4601-4573-9549-0701c459ab88             7.10e-1 market for hard coal                                            hard coal                                          energy            coal energy              kg        
#> 10 ''3821''          bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                         CH                                c18c6cc9-4a26-5c47-9ea9-8635ff2c158e_240c1a3c-1aba-4528-afc3-3f27f56583be             1.06e-2 market for inert waste, for final disposal                      inert waste, for final disposal                    industry          other industry           kg        
#> # ℹ 86 more rows

Created on 2024-01-09 with reprex v2.0.2
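
As an illustration only, the quoting problem shown above could be detected and normalized along these lines. This is a minimal sketch in base R on a hypothetical in-memory vector (`isic`); it is not necessarily how the toy datasets themselves were corrected.

```r
# Hypothetical values mimicking the doubled quotes in the output above;
# the last element is already correctly quoted.
isic <- c("''2410''", "''2591''", "'4100'")

# Flag values that start and end with two single quotes.
has_doubled_quotes <- grepl("^''.*''$", isic)

# Collapse every doubled single quote into one single quote.
# `fixed = TRUE` treats the pattern as a literal string, not a regex.
fixed <- gsub("''", "'", isic, fixed = TRUE)

print(has_doubled_quotes)
print(fixed)
```

A check like `any(has_doubled_quotes)` could serve as a quick guard before saving a dataset, assuming the *isic* column is read as character.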

maurolepore commented 6 months ago

FIXED

library(readr, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
devtools::load_all()
#> ℹ Loading tiltToyData

options(readr.show_col_types = FALSE, width = 1000)

products <- read_csv(toy_emissions_profile_products_ecoinvent())
products |> relocate(matches("isic"))
#> # A tibble: 18 × 8
#>    isic_4digit activity_uuid_product_uuid           co2_footprint ei_activity_name                                              ei_geography tilt_sector  tilt_subsector           unit 
#>    <chr>       <chr>                                        <dbl> <chr>                                                         <chr>        <chr>        <chr>                    <chr>
#>  1 '2410'      833caa78-30df-4374-900f-7f88ab44075b        14.1   iron-nickel-chromium alloy production                         RER          metals       iron & steel             kg   
#>  2 '2591'      bf94b5a7-b7a2-46d1-bb95-84bc560b12fb         0.419 market for deep drawing, steel, 10000 kN press, automode      GLO          metals       other metals             kg   
#>  3 '4100'      bf94b5a7-b7a2-46d1-bb95-84bc560b12fb       481.    market for shed, large, wood, non-insulated, fire-unprotected GLO          construction construction residential m2   
#>  4 '2410'      833caa78-30df-4374-900f-7f88ab44075b         9.47  iron-nickel-chromium alloy production                         RER          metals       iron & steel             kg   
#>  5 '2591'      bf94b5a7-b7a2-46d1-bb95-84bc560b12fb         0.648 market for deep drawing, steel, 10000 kN press, automode      GLO          metals       other metals             kg   
#>  6 '4100'      bf94b5a7-b7a2-46d1-bb95-84bc560b12fb       276.    market for shed, large, wood, non-insulated, fire-unprotected GLO          construction construction residential m2   
#>  7 '2410'      833caa78-30df-4374-900f-7f88ab44075b        13.6   iron-nickel-chromium alloy production                         RER          metals       iron & steel             kg   
#>  8 '2591'      76269c17-78d6-420b-991a-aa38c51b45b7         0.405 market for deep drawing, steel, 10000 kN press, automode      GLO          metals       other metals             kg   
#>  9 '4100'      76269c17-78d6-420b-991a-aa38c51b45b7       447.    market for shed, large, wood, non-insulated, fire-unprotected GLO          construction construction residential m2   
#> 10 '2410'      833caa78-30df-4374-900f-7f88ab44075b        14.7   iron-nickel-chromium alloy production                         RER          metals       iron & steel             kg   
#> 11 '2591'      833caa78-30df-4374-900f-7f88ab44075b         0.390 market for deep drawing, steel, 10000 kN press, automode      GLO          metals       other metals             kg   
#> 12 '4100'      bf94b5a7-b7a2-46d1-bb95-84bc560b12fb       442.    market for shed, large, wood, non-insulated, fire-unprotected GLO          construction construction residential m2   
#> 13 '2410'      76269c17-78d6-420b-991a-aa38c51b45b7        14.1   iron-nickel-chromium alloy production                         RER          metals       iron & steel             kg   
#> 14 '2591'      76269c17-78d6-420b-991a-aa38c51b45b7         0.884 market for deep drawing, steel, 10000 kN press, automode      GLO          metals       other metals             kg   
#> 15 '4100'      76269c17-78d6-420b-991a-aa38c51b45b7       321.    market for shed, large, wood, non-insulated, fire-unprotected GLO          construction construction residential m2   
#> 16 '2410'      833caa78-30df-4374-900f-7f88ab44075b        12.7   iron-nickel-chromium alloy production                         RER          metals       iron & steel             kg   
#> 17 '2591'      76269c17-78d6-420b-991a-aa38c51b45b7         0.675 market for deep drawing, steel, 10000 kN press, automode      GLO          metals       other metals             kg   
#> 18 '4100'      bf94b5a7-b7a2-46d1-bb95-84bc560b12fb       435.    market for shed, large, wood, non-insulated, fire-unprotected GLO          construction construction residential m2

inputs <- read_csv(toy_emissions_profile_upstream_products_ecoinvent())
inputs |> relocate(matches("isic"))
#> # A tibble: 96 × 11
#>    input_isic_4digit activity_uuid_product_uuid           ei_activity_name                                              ei_geography                      input_activity_uuid_product_uuid                                          input_co2_footprint input_ei_activity_name                                          input_reference_product_name                       input_tilt_sector input_tilt_subsector     input_unit
#>    <chr>             <chr>                                <chr>                                                         <chr>                             <chr>                                                                                   <dbl> <chr>                                                           <chr>                                              <chr>             <chr>                    <chr>     
#>  1 '2591'            bf94b5a7-b7a2-46d1-bb95-84bc560b12fb market for deep drawing, steel, 10000 kN press, automode      RoW                               55a5ac05-ab15-5a27-9d0e-6ecf840039f1_f10b8722-4be1-43d5-b17d-c51ad0e29d29             4.56e-1 deep drawing, steel, 10000 kN press, automode                   deep drawing, steel, 10000 kN press, automode      metals            other metals             kg        
#>  2 '4100'            bf94b5a7-b7a2-46d1-bb95-84bc560b12fb market for shed, large, wood, non-insulated, fire-unprotected RoW                               bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df             4.63e+2 shed construction, large, wood, non-insulated, fire-unprotected shed, large, wood, non-insulated, fire-unprotected construction      construction residential m2        
#>  3 '2790'            bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                         GLO                               bdc93cd8-00b4-5b3e-993e-b7fef7059e52_4e584f6f-2e71-4796-931e-bb9a273c161c             1.67e+0 market for anode, for metal electrolysis                        anode, for metal electrolysis                      industry          machinery & equipment    kg        
#>  4 '2815'            76269c17-78d6-420b-991a-aa38c51b45b7 iron-nickel-chromium alloy production                         RER                               fdb1f848-173f-5fe1-96a2-588171e87e30_c2c93af2-47cb-4ec7-a1bd-d3d572bca039             1.45e+8 electric arc furnace converter construction                     electric arc furnace converter                     industry          other industry           unit      
#>  5 '3821'            bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                         RER                               95fcd1bb-4dc6-516a-a3b2-30a4f0530639_3b1d249a-c924-4d6c-8e1f-647f562daa54             5.30e-1 market for electric arc furnace dust                            electric arc furnace dust                          industry          other industry           kg        
#>  6 '3821'            bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                         RER                               daef2f9a-4108-52ae-90a7-fe64abad51bc_6e74937e-b691-4c49-9b8f-5ba44d7c081d             5.89e-1 market for electric arc furnace slag                            electric arc furnace slag                          industry          other industry           kg        
#>  7 '3510'            bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                         RER                               3b190359-a32e-5294-af63-983f38ce6525_759b89bd-3aa6-42ad-b767-5bb9ef5d331d             6.02e-1 market group for electricity, medium voltage                    electricity, medium voltage                        power             total power              kWh       
#>  8 '2410'            bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                         GLO                               2c92cdcd-29df-53ba-a209-77c7de201d14_6e316c64-0481-4832-b097-296e14c0b02f             7.32e+0 market for ferrochromium, high-carbon, 68% Cr                   ferrochromium, high-carbon, 68% Cr                 metals            iron & steel             kg        
#>  9 '0510'            bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                         Europe, without Russia and Turkey 9392c694-12a6-5cd7-a421-d4866359df2c_0d3eda5a-4601-4573-9549-0701c459ab88             7.10e-1 market for hard coal                                            hard coal                                          energy            coal energy              kg        
#> 10 '3821'            bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                         CH                                c18c6cc9-4a26-5c47-9ea9-8635ff2c158e_240c1a3c-1aba-4528-afc3-3f27f56583be             1.06e-2 market for inert waste, for final disposal                      inert waste, for final disposal                    industry          other industry           kg        
#> # ℹ 86 more rows

Created on 2024-01-09 with reprex v2.0.2

maurolepore commented 6 months ago

@AnneSchoenauer, today you asked whether the new *products_ecoinvent datasets have *uuid values that don't match those in companies, and the other way around.

This reprex shows that the answer is yes. For completeness, I also show the *uuid values that do match.

@kalashsinghal, please see if this is what you expect.
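
One caveat when reading the reprex that follows: the joins do not set `by`, so dplyr matches on every shared column (the join message lists `activity_uuid_product_uuid`, `ei_activity_name` and `unit`). A *uuid can therefore show up among both the matching and the non-matching rows. The sketch below illustrates the difference with hypothetical toy tables; the table contents and the `uuid-*` values are made up and are not the real toy datasets.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical stand-ins for the companies and products tables.
companies <- tibble(
  companies_id = c("a", "b", "c"),
  activity_uuid_product_uuid = c("uuid-1", "uuid-2", "uuid-2"),
  unit = c("kg", "m2", "kg")
)
products <- tibble(
  activity_uuid_product_uuid = c("uuid-2", "uuid-3"),
  unit = c("kg", "kg")
)

# Matching on all shared columns: company "b" is unmatched even though
# its uuid exists in products, because its unit differs.
unmatched_all <- anti_join(
  companies, products,
  by = c("activity_uuid_product_uuid", "unit")
)

# Matching on the uuid alone: only company "a" is unmatched.
unmatched_uuid <- anti_join(
  companies, products,
  by = "activity_uuid_product_uuid"
)

print(unmatched_all)
print(unmatched_uuid)
```

Whether the uuid alone or the full set of shared columns is the right key depends on what the mismatch check is meant to cover; the sketch only makes the difference visible.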

library(readr, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
devtools::load_all()
#> ℹ Loading tiltToyData

options(readr.show_col_types = FALSE, width = 1000)

companies <- read_csv(toy_emissions_profile_any_companies())
products <- read_csv(toy_emissions_profile_products_ecoinvent())

# *uuid in companies that match *uuid in products
left_join(companies, products, relationship = "many-to-many") |> 
  print() |> 
  distinct(activity_uuid_product_uuid)
#> Joining with `by = join_by(activity_uuid_product_uuid, ei_activity_name, unit)`
#> # A tibble: 155 × 12
#>    activity_uuid_product_uuid           clustered                   companies_id            country ei_activity_name                                              main_activity unit  co2_footprint ei_geography isic_4digit tilt_sector  tilt_subsector          
#>    <chr>                                <chr>                       <chr>                   <chr>   <chr>                                                         <chr>         <chr>         <dbl> <chr>        <chr>       <chr>        <chr>                   
#>  1 76269c17-78d6-420b-991a-aa38c51b45b7 tent                        soot_asianpiedstarling  germany market for shed, large, wood, non-insulated, fire-unprotected distributor   m2          447.    GLO          '4100'      construction construction residential
#>  2 76269c17-78d6-420b-991a-aa38c51b45b7 tent                        soot_asianpiedstarling  germany market for shed, large, wood, non-insulated, fire-unprotected distributor   m2          321.    GLO          '4100'      construction construction residential
#>  3 76269c17-78d6-420b-991a-aa38c51b45b7 table hire for parties      frightening_chrysomelid spain   market for shed, large, wood, non-insulated, fire-unprotected wholesaler    m2          447.    GLO          '4100'      construction construction residential
#>  4 76269c17-78d6-420b-991a-aa38c51b45b7 table hire for parties      frightening_chrysomelid spain   market for shed, large, wood, non-insulated, fire-unprotected wholesaler    m2          321.    GLO          '4100'      construction construction residential
#>  5 76269c17-78d6-420b-991a-aa38c51b45b7 surface finishing, galvanic hyperbrutal_flea        germany market for deep drawing, steel, 10000 kN press, automode      distributor   kg            0.405 GLO          '2591'      metals       other metals            
#>  6 76269c17-78d6-420b-991a-aa38c51b45b7 surface finishing, galvanic hyperbrutal_flea        germany market for deep drawing, steel, 10000 kN press, automode      distributor   kg            0.884 GLO          '2591'      metals       other metals            
#>  7 76269c17-78d6-420b-991a-aa38c51b45b7 surface finishing, galvanic hyperbrutal_flea        germany market for deep drawing, steel, 10000 kN press, automode      distributor   kg            0.675 GLO          '2591'      metals       other metals            
#>  8 76269c17-78d6-420b-991a-aa38c51b45b7 surface engineering         hyperbrutal_flea        germany market for deep drawing, steel, 10000 kN press, automode      distributor   kg            0.405 GLO          '2591'      metals       other metals            
#>  9 76269c17-78d6-420b-991a-aa38c51b45b7 surface engineering         hyperbrutal_flea        germany market for deep drawing, steel, 10000 kN press, automode      distributor   kg            0.884 GLO          '2591'      metals       other metals            
#> 10 76269c17-78d6-420b-991a-aa38c51b45b7 surface engineering         hyperbrutal_flea        germany market for deep drawing, steel, 10000 kN press, automode      distributor   kg            0.675 GLO          '2591'      metals       other metals            
#> # ℹ 145 more rows
#> # A tibble: 3 × 1
#>   activity_uuid_product_uuid          
#>   <chr>                               
#> 1 76269c17-78d6-420b-991a-aa38c51b45b7
#> 2 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb
#> 3 833caa78-30df-4374-900f-7f88ab44075b

# *uuid in companies that do NOT match *uuid in products
anti_join(companies, products) |> 
  print() |> 
  distinct(activity_uuid_product_uuid)
#> Joining with `by = join_by(activity_uuid_product_uuid, ei_activity_name, unit)`
#> # A tibble: 4 × 7
#>   activity_uuid_product_uuid           clustered       companies_id                        country     ei_activity_name                                              main_activity unit 
#>   <chr>                                <chr>           <chr>                               <chr>       <chr>                                                         <chr>         <chr>
#> 1 833caa78-30df-4374-900f-7f88ab44075b garden fittings weak_meadowlark                     netherlands market for shed, large, wood, non-insulated, fire-unprotected wholesaler    m2   
#> 2 833caa78-30df-4374-900f-7f88ab44075b garden fittings arrogant_ewe                        netherlands market for shed, large, wood, non-insulated, fire-unprotected wholesaler    m2   
#> 3 833caa78-30df-4374-900f-7f88ab44075b tent            pseudoeconomical_easternglasslizard germany     market for shed, large, wood, non-insulated, fire-unprotected distributor   m2   
#> 4 833caa78-30df-4374-900f-7f88ab44075b tent            charterable_wren                    germany     market for shed, large, wood, non-insulated, fire-unprotected distributor   m2
#> # A tibble: 1 × 1
#>   activity_uuid_product_uuid          
#>   <chr>                               
#> 1 833caa78-30df-4374-900f-7f88ab44075b

# *uuid in products that do NOT match *uuid in companies
anti_join(products, companies) |> 
  print() |> 
  distinct(activity_uuid_product_uuid)
#> Joining with `by = join_by(activity_uuid_product_uuid, ei_activity_name, unit)`
#> # A tibble: 8 × 8
#>   activity_uuid_product_uuid           co2_footprint ei_activity_name                                         ei_geography isic_4digit tilt_sector tilt_subsector unit 
#>   <chr>                                        <dbl> <chr>                                                    <chr>        <chr>       <chr>       <chr>          <chr>
#> 1 833caa78-30df-4374-900f-7f88ab44075b        14.1   iron-nickel-chromium alloy production                    RER          '2410'      metals      iron & steel   kg   
#> 2 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb         0.419 market for deep drawing, steel, 10000 kN press, automode GLO          '2591'      metals      other metals   kg   
#> 3 833caa78-30df-4374-900f-7f88ab44075b         9.47  iron-nickel-chromium alloy production                    RER          '2410'      metals      iron & steel   kg   
#> 4 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb         0.648 market for deep drawing, steel, 10000 kN press, automode GLO          '2591'      metals      other metals   kg   
#> 5 833caa78-30df-4374-900f-7f88ab44075b        13.6   iron-nickel-chromium alloy production                    RER          '2410'      metals      iron & steel   kg   
#> 6 833caa78-30df-4374-900f-7f88ab44075b        14.7   iron-nickel-chromium alloy production                    RER          '2410'      metals      iron & steel   kg   
#> 7 833caa78-30df-4374-900f-7f88ab44075b         0.390 market for deep drawing, steel, 10000 kN press, automode GLO          '2591'      metals      other metals   kg   
#> 8 833caa78-30df-4374-900f-7f88ab44075b        12.7   iron-nickel-chromium alloy production                    RER          '2410'      metals      iron & steel   kg
#> # A tibble: 2 × 1
#>   activity_uuid_product_uuid          
#>   <chr>                               
#> 1 833caa78-30df-4374-900f-7f88ab44075b
#> 2 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb

companies <- read_csv(toy_emissions_profile_any_companies())
inputs <- read_csv(toy_emissions_profile_upstream_products_ecoinvent())

# *uuid in companies that match *uuid in inputs
left_join(companies, inputs, relationship = "many-to-many") |> 
  print() |> 
  distinct(activity_uuid_product_uuid)
#> Joining with `by = join_by(activity_uuid_product_uuid, ei_activity_name)`
#> # A tibble: 97 × 16
#>    activity_uuid_product_uuid           clustered                   companies_id                         country ei_activity_name                                              main_activity unit  ei_geography input_activity_uuid_product_uuid                                          input_co2_footprint input_ei_activity_name                                          input_isic_4digit input_reference_product_name                       input_tilt_sector input_tilt_subsector     input_unit
#>    <chr>                                <chr>                       <chr>                                <chr>   <chr>                                                         <chr>         <chr> <chr>        <chr>                                                                                   <dbl> <chr>                                                           <chr>             <chr>                                              <chr>             <chr>                    <chr>     
#>  1 76269c17-78d6-420b-991a-aa38c51b45b7 tent                        soot_asianpiedstarling               germany market for shed, large, wood, non-insulated, fire-unprotected distributor   m2    RoW          bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df                240. shed construction, large, wood, non-insulated, fire-unprotected '4100'            shed, large, wood, non-insulated, fire-unprotected construction      construction residential m2        
#>  2 76269c17-78d6-420b-991a-aa38c51b45b7 table hire for parties      frightening_chrysomelid              spain   market for shed, large, wood, non-insulated, fire-unprotected wholesaler    m2    RoW          bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df                240. shed construction, large, wood, non-insulated, fire-unprotected '4100'            shed, large, wood, non-insulated, fire-unprotected construction      construction residential m2        
#>  3 76269c17-78d6-420b-991a-aa38c51b45b7 surface finishing, galvanic hyperbrutal_flea                     germany market for deep drawing, steel, 10000 kN press, automode      distributor   kg    <NA>         <NA>                                                                                      NA  <NA>                                                            <NA>              <NA>                                               <NA>              <NA>                     <NA>      
#>  4 76269c17-78d6-420b-991a-aa38c51b45b7 surface engineering         hyperbrutal_flea                     germany market for deep drawing, steel, 10000 kN press, automode      distributor   kg    <NA>         <NA>                                                                                      NA  <NA>                                                            <NA>              <NA>                                               <NA>              <NA>                     <NA>      
#>  5 76269c17-78d6-420b-991a-aa38c51b45b7 tent                        flexible_dolphin                     austria market for shed, large, wood, non-insulated, fire-unprotected wholesaler    m2    RoW          bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df                240. shed construction, large, wood, non-insulated, fire-unprotected '4100'            shed, large, wood, non-insulated, fire-unprotected construction      construction residential m2        
#>  6 76269c17-78d6-420b-991a-aa38c51b45b7 tent                        paramilitary_racerunner              germany market for shed, large, wood, non-insulated, fire-unprotected wholesaler    m2    RoW          bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df                240. shed construction, large, wood, non-insulated, fire-unprotected '4100'            shed, large, wood, non-insulated, fire-unprotected construction      construction residential m2        
#>  7 76269c17-78d6-420b-991a-aa38c51b45b7 open space amenities        level_meadowhawk                     france  market for shed, large, wood, non-insulated, fire-unprotected wholesaler    m2    RoW          bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df                240. shed construction, large, wood, non-insulated, fire-unprotected '4100'            shed, large, wood, non-insulated, fire-unprotected construction      construction residential m2        
#>  8 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb tent                        heartrending_attwatersprairiechicken germany market for shed, large, wood, non-insulated, fire-unprotected distributor   m2    RoW          bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df                463. shed construction, large, wood, non-insulated, fire-unprotected '4100'            shed, large, wood, non-insulated, fire-unprotected construction      construction residential m2        
#>  9 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb tent                        heartrending_attwatersprairiechicken germany market for shed, large, wood, non-insulated, fire-unprotected distributor   m2    RoW          bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df                451. shed construction, large, wood, non-insulated, fire-unprotected '4100'            shed, large, wood, non-insulated, fire-unprotected construction      construction residential m2        
#> 10 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb tent                        heartrending_attwatersprairiechicken germany market for shed, large, wood, non-insulated, fire-unprotected distributor   m2    RoW          bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df                447. shed construction, large, wood, non-insulated, fire-unprotected '4100'            shed, large, wood, non-insulated, fire-unprotected construction      construction residential m2        
#> # ℹ 87 more rows
#> # A tibble: 3 × 1
#>   activity_uuid_product_uuid          
#>   <chr>                               
#> 1 76269c17-78d6-420b-991a-aa38c51b45b7
#> 2 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb
#> 3 833caa78-30df-4374-900f-7f88ab44075b

# *uuid in companies that do NOT match *uuid in inputs
anti_join(companies, inputs) |> 
  print() |> 
  distinct(activity_uuid_product_uuid)
#> Joining with `by = join_by(activity_uuid_product_uuid, ei_activity_name)`
#> # A tibble: 4 × 7
#>   activity_uuid_product_uuid           clustered                   companies_id      country ei_activity_name                                         main_activity         unit 
#>   <chr>                                <chr>                       <chr>             <chr>   <chr>                                                    <chr>                 <chr>
#> 1 76269c17-78d6-420b-991a-aa38c51b45b7 surface finishing, galvanic hyperbrutal_flea  germany market for deep drawing, steel, 10000 kN press, automode distributor           kg   
#> 2 76269c17-78d6-420b-991a-aa38c51b45b7 surface engineering         hyperbrutal_flea  germany market for deep drawing, steel, 10000 kN press, automode distributor           kg   
#> 3 76269c17-78d6-420b-991a-aa38c51b45b7 deep-drawn metal part       humanoid_elkhound germany market for deep drawing, steel, 10000 kN press, automode agent/ representative kg   
#> 4 76269c17-78d6-420b-991a-aa38c51b45b7 drawn parts                 humanoid_elkhound germany market for deep drawing, steel, 10000 kN press, automode agent/ representative kg
#> # A tibble: 1 × 1
#>   activity_uuid_product_uuid          
#>   <chr>                               
#> 1 76269c17-78d6-420b-991a-aa38c51b45b7

# *uuid in inputs that do NOT match *uuid in companies
anti_join(inputs, companies) |> 
  print() |> 
  distinct(activity_uuid_product_uuid)
#> Joining with `by = join_by(activity_uuid_product_uuid, ei_activity_name)`
#> # A tibble: 85 × 11
#>    activity_uuid_product_uuid           ei_activity_name                                         ei_geography                      input_activity_uuid_product_uuid                                          input_co2_footprint input_ei_activity_name                        input_isic_4digit input_reference_product_name                  input_tilt_sector input_tilt_subsector  input_unit
#>    <chr>                                <chr>                                                    <chr>                             <chr>                                                                                   <dbl> <chr>                                         <chr>             <chr>                                         <chr>             <chr>                 <chr>     
#>  1 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb market for deep drawing, steel, 10000 kN press, automode RoW                               55a5ac05-ab15-5a27-9d0e-6ecf840039f1_f10b8722-4be1-43d5-b17d-c51ad0e29d29              0.456  deep drawing, steel, 10000 kN press, automode '2591'            deep drawing, steel, 10000 kN press, automode metals            other metals          kg        
#>  2 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                    GLO                               bdc93cd8-00b4-5b3e-993e-b7fef7059e52_4e584f6f-2e71-4796-931e-bb9a273c161c              1.67   market for anode, for metal electrolysis      '2790'            anode, for metal electrolysis                 industry          machinery & equipment kg        
#>  3 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                    RER                               95fcd1bb-4dc6-516a-a3b2-30a4f0530639_3b1d249a-c924-4d6c-8e1f-647f562daa54              0.530  market for electric arc furnace dust          '3821'            electric arc furnace dust                     industry          other industry        kg        
#>  4 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                    RER                               daef2f9a-4108-52ae-90a7-fe64abad51bc_6e74937e-b691-4c49-9b8f-5ba44d7c081d              0.589  market for electric arc furnace slag          '3821'            electric arc furnace slag                     industry          other industry        kg        
#>  5 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                    RER                               3b190359-a32e-5294-af63-983f38ce6525_759b89bd-3aa6-42ad-b767-5bb9ef5d331d              0.602  market group for electricity, medium voltage  '3510'            electricity, medium voltage                   power             total power           kWh       
#>  6 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                    GLO                               2c92cdcd-29df-53ba-a209-77c7de201d14_6e316c64-0481-4832-b097-296e14c0b02f              7.32   market for ferrochromium, high-carbon, 68% Cr '2410'            ferrochromium, high-carbon, 68% Cr            metals            iron & steel          kg        
#>  7 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                    Europe, without Russia and Turkey 9392c694-12a6-5cd7-a421-d4866359df2c_0d3eda5a-4601-4573-9549-0701c459ab88              0.710  market for hard coal                          '0510'            hard coal                                     energy            coal energy           kg        
#>  8 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                    CH                                c18c6cc9-4a26-5c47-9ea9-8635ff2c158e_240c1a3c-1aba-4528-afc3-3f27f56583be              0.0106 market for inert waste, for final disposal    '3821'            inert waste, for final disposal               industry          other industry        kg        
#>  9 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                    RER                               c4ec0b1e-2a3b-5700-871c-2adbbb29bc1d_4f312355-ac65-4635-8fb2-006dba64ce60              0.0581 market for iron scrap, sorted, pressed        '3830'            iron scrap, sorted, pressed                   industry          other industry        kg        
#> 10 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb iron-nickel-chromium alloy production                    CH                                7361f7fb-5cf2-598c-823a-a4b7e50c3d28_a9007f10-7e39-4d50-8f4a-d6d03ce3d673              1.22   market for natural gas, high pressure         '3520'            natural gas, high pressure                    energy            gas energy            m3        
#> # ℹ 75 more rows
#> # A tibble: 2 × 1
#>   activity_uuid_product_uuid          
#>   <chr>                               
#> 1 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb
#> 2 833caa78-30df-4374-900f-7f88ab44075b

Created on 2024-01-09 with reprex v2.0.2
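
For a quicker overview, the checks above can be condensed into a small helper. This is only a minimal sketch in base R and not part of this PR: `compare_keys()` is a hypothetical name, and the two data frames are made-up stand-ins for the toy datasets. Note that, unlike the joins above (which match on every shared column), it compares the key column alone.

```r
# Hypothetical helper: summarise how the values of one key column overlap
# between two data frames.
compare_keys <- function(x, y, key = "activity_uuid_product_uuid") {
  kx <- unique(x[[key]])
  ky <- unique(y[[key]])
  list(
    matched = intersect(kx, ky),   # present in both
    only_in_x = setdiff(kx, ky),   # present in x but not in y
    only_in_y = setdiff(ky, kx)    # present in y but not in x
  )
}

# Made-up stand-ins for the toy datasets
companies <- data.frame(
  activity_uuid_product_uuid = c("a", "a", "b"),
  stringsAsFactors = FALSE
)
products <- data.frame(
  activity_uuid_product_uuid = c("b", "c"),
  stringsAsFactors = FALSE
)

result <- compare_keys(companies, products)
```

With real data you would pass the tibbles read from the `toy_*()` paths instead of the stand-ins.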

maurolepore commented 6 months ago

@Tilmon,

Resuming today's conversation about making sure that the toy data is public.

@kalashsinghal added a few columns that didn't exist in the old datasets (comparison between the columns in old versus new datasets). This means we'll publish some columns whose privacy we didn't discuss before. In particular, Kalash was worried about the *activity_name columns.

I look forward to documenting the details of licensed columns. But from what you said today I believe the toy datasets in this PR are OK (but see CAVEAT below) because among other features (see them all in the top comment) these datasets have:

The implementation of these features happens in the package tiltToyDataPrivate, via the function randimize_uuid() (source, application). If you explore its source code you'll see that it first replaces *uuid with fake values and then shuffles them (via sample()) to break the link between activity_uuid_product_uuid and other columns.
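
To illustrate the idea only, here is a minimal sketch of "fake, then shuffle" in base R. It is not the source of the function in tiltToyDataPrivate: `fake_and_shuffle()`, the `fake-` prefix, and the example ids are all assumptions made up for this example.

```r
# Hypothetical sketch: replace each distinct id with a fake one, then
# shuffle the fake ids across rows to break their link to other columns.
fake_and_shuffle <- function(ids) {
  distinct_ids <- unique(ids)
  # One made-up id per distinct real id
  fake <- sprintf("fake-%03d", seq_along(distinct_ids))
  replaced <- fake[match(ids, distinct_ids)]
  # sample() on a character vector returns a random permutation of it
  sample(replaced)
}

set.seed(1)
real <- c("id-x", "id-x", "id-y", "id-z")
out <- fake_and_shuffle(real)
```

The result has the same length and the same number of distinct values as the input, but no real id survives and the row order no longer corresponds to the original rows.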

The following reprex shows the result. Focus on the relationship between the columns clustered and ei_activity_name. That relationship doesn't always make sense, giving evidence that the link is broken.

library(readr, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
devtools::load_all()
#> ℹ Loading tiltToyData

options(readr.show_col_types = FALSE, width = 1000)

companies <- read_csv(toy_emissions_profile_any_companies())
products <- read_csv(toy_emissions_profile_products_ecoinvent())

left_join(companies, products, relationship = "many-to-many") |> 
  select(matches(c("uuid", "activity_name")), clustered)
#> Joining with `by = join_by(activity_uuid_product_uuid, ei_activity_name, unit)`
#> # A tibble: 155 × 3
#>    activity_uuid_product_uuid           ei_activity_name                                              clustered                  
#>    <chr>                                <chr>                                                         <chr>                      
#>  1 76269c17-78d6-420b-991a-aa38c51b45b7 market for shed, large, wood, non-insulated, fire-unprotected tent                       
#>  2 76269c17-78d6-420b-991a-aa38c51b45b7 market for shed, large, wood, non-insulated, fire-unprotected tent                       
#>  3 76269c17-78d6-420b-991a-aa38c51b45b7 market for shed, large, wood, non-insulated, fire-unprotected table hire for parties     
#>  4 76269c17-78d6-420b-991a-aa38c51b45b7 market for shed, large, wood, non-insulated, fire-unprotected table hire for parties     
#>  5 76269c17-78d6-420b-991a-aa38c51b45b7 market for deep drawing, steel, 10000 kN press, automode      surface finishing, galvanic
#>  6 76269c17-78d6-420b-991a-aa38c51b45b7 market for deep drawing, steel, 10000 kN press, automode      surface finishing, galvanic
#>  7 76269c17-78d6-420b-991a-aa38c51b45b7 market for deep drawing, steel, 10000 kN press, automode      surface finishing, galvanic
#>  8 76269c17-78d6-420b-991a-aa38c51b45b7 market for deep drawing, steel, 10000 kN press, automode      surface engineering        
#>  9 76269c17-78d6-420b-991a-aa38c51b45b7 market for deep drawing, steel, 10000 kN press, automode      surface engineering        
#> 10 76269c17-78d6-420b-991a-aa38c51b45b7 market for deep drawing, steel, 10000 kN press, automode      surface engineering        
#> # ℹ 145 more rows

inputs <- read_csv(toy_emissions_profile_upstream_products_ecoinvent())

left_join(companies, inputs, relationship = "many-to-many") |> 
  select(matches(c("uuid", "activity_name")), clustered)
#> Joining with `by = join_by(activity_uuid_product_uuid, ei_activity_name)`
#> # A tibble: 97 × 5
#>    activity_uuid_product_uuid           input_activity_uuid_product_uuid                                          ei_activity_name                                              input_ei_activity_name                                          clustered                  
#>    <chr>                                <chr>                                                                     <chr>                                                         <chr>                                                           <chr>                      
#>  1 76269c17-78d6-420b-991a-aa38c51b45b7 bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df market for shed, large, wood, non-insulated, fire-unprotected shed construction, large, wood, non-insulated, fire-unprotected tent                       
#>  2 76269c17-78d6-420b-991a-aa38c51b45b7 bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df market for shed, large, wood, non-insulated, fire-unprotected shed construction, large, wood, non-insulated, fire-unprotected table hire for parties     
#>  3 76269c17-78d6-420b-991a-aa38c51b45b7 <NA>                                                                      market for deep drawing, steel, 10000 kN press, automode      <NA>                                                            surface finishing, galvanic
#>  4 76269c17-78d6-420b-991a-aa38c51b45b7 <NA>                                                                      market for deep drawing, steel, 10000 kN press, automode      <NA>                                                            surface engineering        
#>  5 76269c17-78d6-420b-991a-aa38c51b45b7 bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df market for shed, large, wood, non-insulated, fire-unprotected shed construction, large, wood, non-insulated, fire-unprotected tent                       
#>  6 76269c17-78d6-420b-991a-aa38c51b45b7 bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df market for shed, large, wood, non-insulated, fire-unprotected shed construction, large, wood, non-insulated, fire-unprotected tent                       
#>  7 76269c17-78d6-420b-991a-aa38c51b45b7 bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df market for shed, large, wood, non-insulated, fire-unprotected shed construction, large, wood, non-insulated, fire-unprotected open space amenities       
#>  8 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df market for shed, large, wood, non-insulated, fire-unprotected shed construction, large, wood, non-insulated, fire-unprotected tent                       
#>  9 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df market for shed, large, wood, non-insulated, fire-unprotected shed construction, large, wood, non-insulated, fire-unprotected tent                       
#> 10 bf94b5a7-b7a2-46d1-bb95-84bc560b12fb bc548877-9cc6-590d-ba72-1d1d2daeb5b9_e2ccc500-255f-448c-8c88-ed25177993df market for shed, large, wood, non-insulated, fire-unprotected shed construction, large, wood, non-insulated, fire-unprotected tent                       
#> # ℹ 87 more rows

CAVEAT

The caveat is that the toy data is very small so the pool of *uuid that we shuffled is small too, meaning that it's not hard to re-arrange the broken link using just common sense.

If this worries you, we can discuss how to continue before we merge this PR. A quick alternative for now might be to remove the additional columns if @kalashsinghal thinks we can live without them for now (I don't know how exactly they will be used).

kalashsinghal commented 6 months ago

If this worries you, we can discuss how to continue before we merge this PR. A quick alternative for now might be to remove the additional columns if @kalashsinghal thinks we can live without them for now (I don't know how exactly they will be used).

@maurolepore Anne needed these additional columns in the past for her own analysis on the tiltIndicatorBefore outputs. It's not feasible to use untraceable code to add these columns and then provide the output for Anne's analysis in the future. Hence, I highly recommend keeping these columns in the final output of tiltIndicatorBefore.

Also, @maurolepore, you can use any fake data to add these columns to the toy data, irrespective of how these additional columns are linked to the licensed data. I am asking because these additional columns are not used in tiltIndicator and tiltIndicatorAfter in any way, and an external user doesn't need to see the real values of such columns. Right, @Tilmon? Please confirm this! Thanks! :)

AnneSchoenauer commented 6 months ago

@maurolepore - thanks a lot for already taking the initiative to check whether some uuids are in one dataset but not in the other. Before I saw your comment here, I created this ticket with a reprex to show you why this requirement is so important. Maybe it also helps you. You can close it if it is all clear.

For your questions to @Tilmon - wasn't this the reason why we now also have the jitter function for the co2 data? However, yes, it most likely doesn't cover all licensed data. It would be good to do a proper check here before publishing.

maurolepore commented 6 months ago

@maurolepore Anne needed these additional columns in the past for her own analysis on the tiltIndicatorBefore outputs. ... these additional columns are not used in tiltIndicator and tiltIndicatorAfter in any way

Thanks @kalashsinghal. The usage of those columns seems internal -- equivalent to a developer-oriented internal function that users don't need to know about.

@AnneSchoenauer to refresh your memory, this PR proposes to introduce the following new columns in public toy datasets (for details see also Comparing columns between old and new datasets above):

# emissions_profile_any_companies
#> [1] "country"          "ei_activity_name" "main_activity"

# emissions_profile_products_ecoinvent
#> [1] "ei_geography"

# emissions_profile_upstream_products_ecoinvent
#> [1] "ei_activity_name"             "ei_geography"                
#> [3] "input_ei_activity_name"       "input_reference_product_name"

Do you need these columns to appear in the public toy datasets that we use in all our websites to show examples to our users?

From what Kalash says, they are not useful for most users of our packages. We can keep them and ensure they don't expose private data, but it's best to make public as little as possible: adding things later is easy, whereas removing them later is hard, because we need to go through an expensive deprecation process to ensure backward compatibility.

BTW, thanks for that ticket. Sorry I missed it. I'll continue that conversation there.

AnneSchoenauer commented 5 months ago

@Tilmon what do you think? These data points are not needed for the tiltIndicator but are needed for the output files. So they are needed to produce the outputs from tiltIndicatorAfter. I hear @maurolepore, though, that it is expensive to have them in the toy dataset. @Tilmon what do you think is better from a transparency and usability perspective?

maurolepore commented 5 months ago

@AnneSchoenauer

These data points ... are needed for the output files

Okay, then this suggests a more public usage than what I understand from Kalash's comments. If users expect those columns in the output then they seem to deserve a place in the toy datasets. And tiltIndicatorAfter should have a test to ensure these columns exist in the output (cc' @kalashsinghal).
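
As a rough illustration of such a check, here is a minimal sketch in base R. Nothing in it comes from tiltIndicatorAfter: `missing_columns()` is a hypothetical helper, `out` is a made-up stand-in for the real output, and `expected` lists only two of the columns discussed above.

```r
# Hypothetical helper: names of the expected columns that are absent
# from `data`.
missing_columns <- function(data, expected) {
  setdiff(expected, names(data))
}

# Made-up stand-in for the output of tiltIndicatorAfter
out <- data.frame(
  companies_id = "a",
  ei_activity_name = "b",
  stringsAsFactors = FALSE
)
expected <- c("ei_activity_name", "ei_geography")

missing_columns(out, expected)
#> [1] "ei_geography"
```

In a package, the assertion would typically live in a testthat test that expects `missing_columns()` to return an empty character vector for the real output.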

Assuming the columns stay, then the last question we need to resolve is this:

See https://github.com/2DegreesInvesting/tiltToyData/pull/19#issuecomment-1883788387

I look forward to your answers so we can merge this PR ASAP and close the many related issues.

AnneSchoenauer commented 5 months ago

Great!

@Tilmon could you please answer this here? I think you are the one most involved with the license issues in ecoinvent.

Thanks!!

kalashsinghal commented 5 months ago

These data points are not needed for the tiltIndicator but are needed for the output files. So they are needed to produce the outputs from tiltIndicatorAfter.

@AnneSchoenauer Based on your comment here, should I ensure that all these extra columns from tiltIndicatorBefore are also present in the final output from tiltIndicatorAfter? If yes, then I will create a separate ticket for it in the tiltIndicatorAfter package. FYI: at the moment, not all of those extra columns are present in the final output.

AnneSchoenauer commented 5 months ago

@kalashsinghal yes sounds good!

Tilmon commented 5 months ago

Hi @AnneSchoenauer @kalashsinghal @maurolepore sorry for the late response and thanks for already clarifying almost everything.

Re

I look forward to documenting the details of licensed columns. But from what you said today I believe the toy datasets in this PR are OK (but see CAVEAT below) because among other features (see them all in the top comment) these datasets have:

  • Fake activity_uuid_product_uuid.
  • Random mapping between fake activity_uuid_product_uuid and other columns.

That's fine! It's important that we don't share the real co2 data, which we don't, because it's jittered. Also it's important to not share which input_activity_uuid_product_uuid belong to which activity_uuid_product_uuid, which is not the case because:

Note for future reference (will also send this in an email as discussed to the whole team): the activity_uuid_product_uuid is defined by the combination activity_name x reference_product x geography x main_activity. Meaning that if you fake the activity_uuid_product_uuid but show all other four columns users can create the activity_uuid_product_uuid by themselves. In the toy data that you referenced, not all 4 columns aside from the activity_uuid_product_uuid are shown, so all is fine.
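
The rule in this note could be checked mechanically. The following is only a sketch under assumptions: `exposes_all_components()` is a hypothetical helper, and the four column names are taken literally from the note, whereas the actual toy datasets use prefixed names (for example ei_activity_name), so the names would need adapting.

```r
# Assumed names for the four components that, per the note above, define
# an activity_uuid_product_uuid. Real datasets may name them differently.
id_components <- c(
  "activity_name", "reference_product", "geography", "main_activity"
)

# Hypothetical helper: TRUE if `data` shows all four components, in which
# case the id could be rebuilt even if the id column itself is fake.
exposes_all_components <- function(data, components = id_components) {
  all(components %in% names(data))
}

# Made-up examples: one dataset shows two components, the other all four
partial <- data.frame(activity_name = "a", geography = "b")
full <- data.frame(
  activity_name = "a", reference_product = "b",
  geography = "c", main_activity = "d"
)

safe_partial <- !exposes_all_components(partial)
safe_full <- !exposes_all_components(full)
```

Here `safe_partial` is `TRUE` and `safe_full` is `FALSE`, mirroring the reasoning that a dataset is fine as long as at least one of the four components is missing.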

@maurolepore you pointed out that the toy data is small and one could try to re-shuffle it to get licensed information but they won't be able to define the exact activity_uuid_product_uuid as too many of the other 4 variables are missing.

Thanks for your efforts to create the toydata in a compliant way with our license agreement @maurolepore and @kalashsinghal !

AnneSchoenauer commented 5 months ago

@Tilmon thanks a lot. Very clear :) You are right the information is not shown in the toydata set. But in our output (so in the very end) people will get results with the toy data set (using tiltWorkflows) which will contain all the four data points that you mentioned right?

@maurolepore do you see this as a problem?

maurolepore commented 5 months ago

@AnneSchoenauer, I think the best person to answer is @Tilmon. But if the final output will have all the columns and the shuffling doesn't break licensed links, then it seems like a problem. If so let me know which columns we could replace with fake values.

AnneSchoenauer commented 5 months ago

Yes I agree - @Tilmon we can also discuss later but I think it will be a problem indeed.

Tilmon commented 5 months ago

@AnneSchoenauer good point. I checked the tiltIndicator and tiltIndicatorAfter outputs and what I see is that in

What do you think @AnneSchoenauer @maurolepore? Does it make sense to create fake IDs for the users and keep the real IDs for ourselves?

AnneSchoenauer commented 5 months ago

Yes it does! @maurolepore do you think you could do this?

We are getting there!!!

kalashsinghal commented 5 months ago

@Tilmon I would like to add that columns ei_activity_name and ei_input_geography will also be added to the final output of emission profile upstream indicator at product level. Can they cause any issue?

maurolepore commented 5 months ago

Dear all,

Although it seems like this conversation still needs to continue to make our toy datasets perfect, I go with the saying "perfect is the enemy of the good". This PR already represents a significant improvement and if anything private is exposed, it is less than before.

The condition to merge a PR is that (1) it doesn't make things worse and (2) it does make things better. So I went ahead and merged this PR. This allows us to move on with the very many related issues that depend on this one.

In short, as the top comment says the main features of this PR are these:

We can now start from good and move to perfect here: #24