Closed. maurolepore closed this issue 5 months ago.
@Tilmon is it fine if you take this, since you know the licensing issues better? Let me know if I should give it a try instead! :)
Hi @maurolepore, thanks for this.
I see the following 2 features that are not covered yet, regarding the two upstream indicators:
- ei_input_geography should have fake values. As @kalashsinghal mentioned here, he's planning on adding the column ei_input_geography to the outputs. To avoid the risk that this lets users identify the unique input, I suggest we create fake geography values. Probably best to make it obvious that they are fake, so what about "tiltland" or "low carbon wonderland"?
- input_activity_uuid_product_uuid should be a fake value, so that users can't identify a unique link between an activity and an input (this link is licensed information). In this comment I even suggested creating an internal mapper, which is not public-facing, between real input_activity_uuid_product_uuid and fake IDs. But now I realize that if we only talk about the toy data, this is not even necessary: the toy data should look realistic, but there is no need for it to contain the actual "truth" of the real data, right? Is that clear enough / does that make sense?
Thanks!
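A minimal sketch of the suggestion above, in Python for illustration only (the package itself is R, and this is not its implementation). The function name fake_sensitive_columns and the "fake-0000" ID format are hypothetical; the two geography values are the ones proposed in the comment. It assumes each toy row simply gets an obviously fake geography and a fresh fake ID, with no mapping back to real IDs.

```python
# Hypothetical helper: overwrite the two sensitive upstream columns with
# obviously fake values. Nothing here maps back to real licensed IDs.
FAKE_GEOGRAPHIES = ["tiltland", "low carbon wonderland"]  # proposed in the thread


def fake_sensitive_columns(rows):
    """Return copies of `rows` (a list of dicts) with fake geography and input ID."""
    out = []
    for i, row in enumerate(rows):
        fake = dict(row)  # copy, so the caller's data is left untouched
        # Cycle through the fake geographies so more than one value appears.
        fake["ei_input_geography"] = FAKE_GEOGRAPHIES[i % len(FAKE_GEOGRAPHIES)]
        # Assumed ID format; any value that cannot be a real ID would do.
        fake["input_activity_uuid_product_uuid"] = f"fake-{i:04d}"
        out.append(fake)
    return out


toy_rows = [
    {"ei_input_geography": "DE", "input_activity_uuid_product_uuid": "real-1"},
    {"ei_input_geography": "FR", "input_activity_uuid_product_uuid": "real-2"},
]
faked = fake_sensitive_columns(toy_rows)
```

Because the fake IDs are generated per row, two rows that shared a real input would get different fake IDs. If joins on this column matter for the toy data, that choice would need revisiting.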
input_activity_uuid_product_uuid can be reproduced from these columns -- which therefore can't be shared together.
ei_activity_name ei_reference_name main_activity geography
@maurolepore please correct to
activity_uuid_product_uuid can be reproduced from these columns -- which therefore can't be shared together.
ei_activity_name ei_reference_name main_activity geography
and
input_activity_uuid_product_uuid can be reproduced from these columns -- which therefore can't be shared together.
ei_input_activity_name ei_input_reference_name input_main_activity input_geography
cc' @kalashsinghal
@maurolepore Renaming the columns:
For activity_uuid_product_uuid:
ei_activity_name reference_product_name main_activity product_geography
and
For input_activity_uuid_product_uuid:
input_ei_activity_name input_reference_product_name main_activity input_geography
cc' @Tilmon
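The rule discussed above (a group of columns that together reproduce an ID must not be shared together) can be sketched as a simple check. This is an illustrative Python sketch, not code from the package; the name exposed_ids is hypothetical, and the column groups are copied from the renamed lists in the comment above.

```python
# Column groups that, taken together, can reproduce a licensed ID.
# Names follow the renamed columns in the comment above.
IDENTIFYING_GROUPS = {
    "activity_uuid_product_uuid": {
        "ei_activity_name",
        "reference_product_name",
        "main_activity",
        "product_geography",
    },
    "input_activity_uuid_product_uuid": {
        "input_ei_activity_name",
        "input_reference_product_name",
        "main_activity",
        "input_geography",
    },
}


def exposed_ids(columns):
    """Return the IDs whose full identifying group appears in `columns`."""
    cols = set(columns)
    # An ID is exposed only if every column of its group is present.
    return sorted(id_ for id_, group in IDENTIFYING_GROUPS.items() if group <= cols)


shared = ["ei_activity_name", "reference_product_name", "main_activity",
          "product_geography", "co2_footprint"]
print(exposed_ids(shared))
```

A dataset that drops or fakes at least one column from each group would return an empty list here.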
Hi @kalashsinghal, I just checked tiltIndicatorAfter for profile_emissions_upstream and the variable is called: matched_reference_product
*That's the output name; I'm not sure what it's called in your input data. But hopefully it helps to identify the right column?
@Tilmon My bad. It's called reference_product_name. I have updated my comment here: https://github.com/2DegreesInvesting/tiltToyData/issues/24#issuecomment-1895546670
Hi @maurolepore, I think we can close this issue now, right? We decided that we will use the toyData the DT develops for tiltIndicatorBefore. The idea would then be to use the output of tiltIndicatorBefore, based on the toyData, for the tiltIndicator package etc.
We recently improved the toy datasets for emissions profile (#19). However, there still seem to be some details to improve (https://github.com/2DegreesInvesting/tiltToyData/pull/19#issuecomment-1889276566).
@AnneSchoenauer and @Tilmon, please share your list of required features (for inspiration see the "Features" list in this item of the changelog). I'll add your list here as a check-box tasklist and create separate issues to address each request.
From conversations in 2DegreesInvesting/tiltToyData#19 I see that we spent a lot of effort trying to ensure the privacy of licensed data. That effort is only necessary if we base our toy datasets on real data. While realism might be valuable, it seems important to weigh whether it's worth the risk of exposing private data, and worth the kind of effort we put into 2DegreesInvesting/tiltToyData#19. If we can indeed sacrifice realism in some sensitive columns, then we may simply populate them with totally fake values.
toy_emissions_profile*
Datasets:
- toy_emissions_profile_any_companies().
- toy_emissions_profile_products_ecoinvent().
- toy_emissions_profile_upstream_products_ecoinvent().
Features:
- companies_id (#19).
- activity_uuid_product_uuid (#19).
- activity_uuid_product_uuid and other columns (#19).
- *co2_footprint, jittered to the right by 50%-100% on average (#19).
- activity_uuid_product_uuid in *companies also exist in co2 (#25).
- activity_uuid_product_uuid in co2 don't exist in *companies (#25).
- activity_uuid_product_uuid in *companies also exist in co2 (#25).
toy_sector_profile*
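Two of the toy_emissions_profile* features listed above can be sketched in a few lines: the rightward jitter of *co2_footprint and the overlap of activity_uuid_product_uuid between the *companies and co2 datasets. This is an illustrative Python sketch, not the package's code. The names jitter_right and id_overlap are hypothetical, and it assumes "jittered to the right by 50%-100%" means multiplying each value by a random factor between 1.5 and 2.0.

```python
import random


def jitter_right(values, low=0.5, high=1.0, seed=1):
    """Increase each value by a random share between `low` and `high` (assumed reading)."""
    rng = random.Random(seed)  # seeded so the toy data is reproducible
    return [v * (1 + rng.uniform(low, high)) for v in values]


def id_overlap(companies_ids, co2_ids):
    """Split activity_uuid_product_uuid values by which dataset they appear in."""
    companies, co2 = set(companies_ids), set(co2_ids)
    return {
        "in_both": companies & co2,            # matched in both datasets
        "only_in_co2": co2 - companies,        # co2 rows with no company
        "only_in_companies": companies - co2,  # companies with no co2 row
    }


footprints = [10.0, 20.0]
jittered = jitter_right(footprints)
overlap = id_overlap(["a", "b"], ["b", "c"])
```

Checking that "in_both" and "only_in_co2" are both non-empty would cover the matching and non-matching cases the list mentions for #25.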