Remove duplicated flows from flow lists before matching process

fjuniorr / flowmapper

Mappings between elementary flows

MIT License


Remove duplicated flows from flow lists before matching process #66

Closed fjuniorr closed 6 months ago

fjuniorr commented 6 months ago

With duplicated flows (especially in the source flows from SimaPro), we get spurious N:M cardinalities. One example, from sp-biosphere-1.json:

4650 source flows...
4329 target flows...
4046 mappings (86.19% of total).
Mappings cardinalities: {'N:M': 4046}
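
To illustrate how duplicates alone can produce this result, here is a minimal sketch. It is not flowmapper's actual matching code: `match_by_key`, `cardinalities`, and the choice of `(name, context)` as the matching key are all hypothetical, picked only to show the effect.

```python
from collections import Counter


def match_by_key(source_flows, target_flows):
    """Pair every source flow with every target flow sharing the same key.

    Hypothetical matcher: assumes flows are dicts matched on (name, context).
    Returns (source_index, target_index) pairs.
    """
    return [
        (i, j)
        for i, s in enumerate(source_flows)
        for j, t in enumerate(target_flows)
        if (s["name"], s["context"]) == (t["name"], t["context"])
    ]


def cardinalities(pairs):
    """Count mappings by cardinality label (1:1, 1:N, N:1, N:M)."""
    source_counts = Counter(i for i, _ in pairs)
    target_counts = Counter(j for _, j in pairs)
    labels = Counter()
    for i, j in pairs:
        source_has_many_targets = source_counts[i] > 1
        target_has_many_sources = target_counts[j] > 1
        if source_has_many_targets and target_has_many_sources:
            labels["N:M"] += 1
        elif source_has_many_targets:
            labels["1:N"] += 1
        elif target_has_many_sources:
            labels["N:1"] += 1
        else:
            labels["1:1"] += 1
    return dict(labels)


co2 = {"name": "Carbon dioxide", "context": "air"}

# The same flow listed twice on both sides: every pair becomes N:M.
duplicated = cardinalities(match_by_key([co2, dict(co2)], [co2, dict(co2)]))
# The same flow listed once on both sides: a single 1:1 mapping.
clean = cardinalities(match_by_key([co2], [co2]))

print(duplicated)  # {'N:M': 4}
print(clean)       # {'1:1': 1}
```

In this toy case a single real 1:1 mapping is reported as four N:M mappings purely because each list contains the flow twice.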

We are better off removing the duplicated flows before the matching process and reporting how many were removed in the statistics.
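
A rough sketch of what that step could look like, assuming each flow is a JSON-serializable dict and that two flows count as duplicates only when all of their fields are equal. The helper name `remove_duplicates` and the wording of the statistics line are made up for illustration, not taken from flowmapper.

```python
import json


def remove_duplicates(flows):
    """Return (unique_flows, n_removed), keeping the first occurrence of each flow.

    Assumption: flows are JSON-serializable dicts, and a duplicate is a flow
    whose fields all equal those of an earlier flow.
    """
    seen = set()
    unique = []
    for flow in flows:
        # Canonical JSON string, so key order inside the dict does not matter.
        key = json.dumps(flow, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(flow)
    return unique, len(flows) - len(unique)


flows = [
    {"name": "Carbon dioxide", "context": "air", "unit": "kg"},
    {"unit": "kg", "context": "air", "name": "Carbon dioxide"},  # same flow, different key order
    {"name": "Methane", "context": "air", "unit": "kg"},
]

unique, n_removed = remove_duplicates(flows)
print(f"{len(flows)} source flows ({n_removed} duplicates removed)...")
```

Keeping the first occurrence preserves the original order of the list, and returning the removed count makes it easy to surface in the statistics output.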

cmutel commented 6 months ago

Pretty sure this should be done to all raw input data. I don't think we lose anything, and it helps with the final manual debugging.