Schema class `to_ref` method poorly captures metadata of its reference class

dt-woods commented 4 months ago

When examining the metadata in JSON-LD, the "reference" metadata is richer than that provided by the to_ref method that this Python schema conveniently provides with each root entity.

For example, the JSON-LD information in a ProcessLink within a Product System includes a reference to an exchange, flow, process, and provider such as the following (copied and pasted after reading a JSON-LD file exported from openLCA v2):

{'exchange': {'internalId': 2},
 'flow': {'@type': 'Flow',
  '@id': '684e90ad-87cd-3f96-aa4b-ca45d2df37dd',
  'category': 'Technosphere Flows/22: Utilities/2212: Natural Gas Distribution',
  'flowType': 'PRODUCT_FLOW',
  'name': 'natural gas, through transmission',
  'refUnit': 'MJ'},
 'process': {'@type': 'Process',
  '@id': '9ffb3357-2438-336d-b35b-ea45d90793b8',
  'category': '22: Utilities/2211: Electric Power Generation, Transmission and Distribution/GAS',
  'flowType': 'PRODUCT_FLOW',
  'name': 'Electricity - GAS - NorthWestern Corporation',
  'processType': 'UNIT_PROCESS'},
 'provider': {'@type': 'Process',
  '@id': '5d2c8c18-5a12-39b1-8c45-376615518788',
  'category': '22: Utilities/2212: Natural Gas Distribution',
  'flowType': 'PRODUCT_FLOW',
  'name': 'natural gas extraction and processing - Green River',
  'processType': 'LCI_RESULT'}}

This makes me think that I could query a process that I want and convert it to a Ref object to meet this metadata standard (after all, that's what the documentation says to do for the 'process' attribute for a ProcessLink). When I do so, this is what happens:

>>> p_obj = f.read(o.Process, "9ffb3357-2438-336d-b35b-ea45d90793b8").to_ref()
Ref(
    id='845f2dd8-e0a4-3b7e-8a28-e55fbd1be2c4', 
    category='22: Utilities/2211: Electric Power Generation, Transmission and Distribution/GAS', 
    description=None,   # MISSING
    flow_type=None,     # MISSING
    location=None,      # MISSING
    name='Electricity - GAS - NorthWestern Corporation', 
    process_type=None,  # MISSING
    ref_unit=None,      # MISSING
    ref_type=<RefType.Process: 'Process'>,
)

Where'd all the metadata go?

For clarity:

Why is so much of the metadata removed from the Ref object?
Why is this metadata available in the JSON-LD, but not handled here?
Is this even important when it comes to importing JSON-LD to an openLCA database?

All of these questions arise from the poor handling of product system creation (see olca-ipc Issue #31).

msrocka commented 4 months ago

Why is so much of the metadata removed from the Ref object?

For import into or communication with openLCA, only the @type and @id fields are required to identify the referenced dataset; in most cases the @id field alone is enough, because often only one type is allowed (exceptions are calculation targets and process links). The other fields are only there to make these references human-readable and to display this metadata in user interfaces like the Collaboration Server.

The to_ref method is automatically generated from the schema definition for the different types and thus only includes some common metadata. The code generator could be extended to include more type-specific metadata, but I currently do not see a use case because you already have the full object when calling to_ref.
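
To illustrate the point about identifying fields, here is a minimal sketch on plain dictionaries rather than olca-schema objects. The helper minimal_ref is hypothetical (it is not part of olca-schema), and the sample values are copied from the ProcessLink provider shown earlier in this thread. It only shows that dropping the descriptive fields leaves the @type and @id pair intact.

```python
# Hypothetical helper, not part of olca-schema: it keeps only the keys
# that the comment above describes as required to identify a dataset.
IDENTITY_KEYS = ("@type", "@id")


def minimal_ref(ref: dict) -> dict:
    """Return a copy of a JSON-LD reference holding only @type and @id."""
    return {key: ref[key] for key in IDENTITY_KEYS if key in ref}


# Sample reference, taken from the 'provider' entry of the ProcessLink above.
provider = {
    "@type": "Process",
    "@id": "5d2c8c18-5a12-39b1-8c45-376615518788",
    "category": "22: Utilities/2212: Natural Gas Distribution",
    "name": "natural gas extraction and processing - Green River",
    "processType": "LCI_RESULT",
}

slim = minimal_ref(provider)
print(slim)
```

The descriptive keys (category, name, processType) are dropped from the copy, and the original dictionary is left unchanged.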

dt-woods commented 4 months ago

My use case is generating JSON-LD files to be read by openLCA. It sounds like the important info is in the auto-generated reference objects. I'm guessing openLCA can fill in the missing details once imported into the database.

dt-woods commented 4 months ago

How important is the @context tag in the JSON files?

I'm noticing that JSON files have this metadata when exported from openLCA (i.e., the value set to http://greendelta.github.io/olca-schema/context.jsonld), but it's not there when I use olca-schema to write to JSON-LD.
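
If the key turns out to matter, one possible workaround is to add it to the dictionary form of an entity before serializing. This is only a sketch under assumptions: with_context is a hypothetical helper, the entity is a hand-written dictionary rather than output from olca-schema, and this thread does not establish whether openLCA actually requires @context on import.

```python
import json

# Context URL as quoted above from files exported by openLCA.
OLCA_CONTEXT = "http://greendelta.github.io/olca-schema/context.jsonld"


def with_context(entity: dict) -> dict:
    """Return a copy of the entity with @context set as the first key."""
    rest = {key: value for key, value in entity.items() if key != "@context"}
    return {"@context": OLCA_CONTEXT, **rest}


# Hand-written stand-in for a serialized process (assumption, not real output).
process = {
    "@type": "Process",
    "@id": "9ffb3357-2438-336d-b35b-ea45d90793b8",
    "name": "Electricity - GAS - NorthWestern Corporation",
}

text = json.dumps(with_context(process), indent=2)
print(text)
```

The helper returns a new dictionary, so the input entity is not modified.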

GreenDelta / olca-schema

Schema class `to_ref` method poorly captures metadata of its reference class #8