Closed fjuniorr closed 7 months ago
@cmutel it would be great to hear if you have any preferences in this case.
For reporting this makes sense and the proposed hash is fine. I really like the current approach without normalization, as it is much clearer and easier to communicate how to use the generated mappings.
I am really reluctant to add a field to the produced mapping which is not in the source data, even if it is reproducible. Everyone using the mapping lists will need to special case the logic for that field, and it is so much easier to just test if all provided fields match. Moreover, I think some people could assume that an `id` field is definitive and only try to match against that (and not find anything).
So I would prefer to make this a configurable option with a negative default.
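To make the concern concrete, here is a minimal sketch of an "all provided fields match" test and of an opt-in switch that defaults to off. The names `all_fields_match`, `mapping_source`, and `include_generated_id` are hypothetical and are not taken from flowmapper or randonneur.

```python
def all_fields_match(source: dict, flow: dict) -> bool:
    """Return True if every field given in `source` equals the same field in `flow`."""
    return all(flow.get(key) == value for key, value in source.items())


def mapping_source(flow: dict, generated_id=None, include_generated_id=False) -> dict:
    """Build the `source` entry of a mapping (hypothetical helper).

    The generated id is only added when explicitly requested,
    i.e. the option has a negative default.
    """
    source = dict(flow)
    if include_generated_id and generated_id is not None:
        source["id"] = generated_id
    return source


# A flow exactly as it appears in the original flow list (no id field).
flow = {"name": "Methane", "unit": "kg"}

default_source = mapping_source(flow, generated_id="abc123")
with_id_source = mapping_source(flow, generated_id="abc123", include_generated_id=True)

matches_by_default = all_fields_match(default_source, flow)
matches_with_id = all_fields_match(with_id_source, flow)
```

With the default, the source contains only fields from the original data, so the plain field comparison succeeds. Once the generated `id` is included, the same comparison fails unless the consumer special-cases that field.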
I've used the `generate_flow_id` and I agree that the `id` should be left out.
I will track the option in https://github.com/fjuniorr/flowmapper/issues/29 and close this issue to focus on the new match strategies.
For reporting purposes (such as #25 and #26) it is very useful to have a unique id for a flow even if one is not present in the original flow list.
My idea is to add a new field `id` to the `Flow` class that would be populated either with the provided uuid or with a generated one.

There are at least two relevant implementation decisions.
algorithm
My initial idea would be to do something like:
At least for now I don't see the need to identify flows across flow lists that lack ids, so their properties in principle don't need to be normalized.
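As an illustration only, a non-normalizing id could be sketched as a hash over the flow's fields exactly as provided. The helper names `sketch_flow_id` and `resolve_id`, the `uuid` field name, and the choice of MD5 over a JSON serialization are all assumptions made for this example, not the project's actual implementation.

```python
import hashlib
import json


def sketch_flow_id(flow: dict) -> str:
    """Hash the flow's fields as provided (hypothetical helper)."""
    # sort_keys makes the result independent of key order.
    # Values are deliberately not lowercased, stripped, or otherwise normalized.
    payload = json.dumps(flow, sort_keys=True, ensure_ascii=False)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def resolve_id(flow: dict) -> str:
    """Use the provided uuid if there is one, otherwise a generated id.

    The "uuid" field name is an assumption for this sketch.
    """
    return flow.get("uuid") or sketch_flow_id(flow)


flow = {"name": "Carbon dioxide, fossil", "context": ["air", "urban"], "unit": "kg"}
reordered = {"unit": "kg", "context": ["air", "urban"], "name": "Carbon dioxide, fossil"}
different_case = {**flow, "name": "carbon dioxide, fossil"}
with_uuid = {**flow, "uuid": "provided-uuid"}
```

Because nothing is normalized, the same flow with reordered keys gets the same id, while a flow that differs only in letter case gets a different one. That is acceptable as long as ids are only used within a single flow list.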
Some other notable options would be (both do some form of normalization):
bw2io.utils.activity_hash
esupy.util.make_uuid
generated id in results
Take for example these two matching flows:
Still following the `randonneur` data migration format, we would not add the generated id to the `source`, otherwise `randonneur.utils.matcher` would not match (i.e. it does not exist in the source dict):

For reference, in the OpenLCA flow mapping file the `FlowMapEntry` adds the generated id if a UUID is not provided[^20231126T204229]

[^20231126T204229]: Generated with