Duplicate matches for same object

cmutel commented 6 months ago

Note: Using branch "randonneur-transformations"

Using the attached fixtures, and running:

flowmapper map \
    --fields config/simapro-ecoinvent.py \
    --output-dir dev \
    a.json \
    b.json

Leads to the following mapping file:

[
  {
    "source": {
      "name": "Sulfate",
      "categories": [
        "Emissions to water",
        ""
      ],
      "unit": "kg"
    },
    "target": {
      "uuid": "17a7db69-d874-4ecd-af5d-48c98def445f",
      "name": "Barite",
      "context": "water/unspecified",
      "unit": "kg"
    },
    "conversion_factor": 1.0,
    "comment": "Identical synonyms"
  },
  {
    "source": {
      "name": "Sulfate",
      "categories": [
        "Emissions to water",
        ""
      ],
      "unit": "kg"
    },
    "target": {
      "uuid": "28bca51a-6cc7-46af-961a-fd2b675a1376",
      "name": "Sulfate",
      "context": "water/unspecified",
      "unit": "kg"
    },
    "conversion_factor": 1.0,
    "comment": "Identical names"
  }
]

However, a.json only had one input record. This should be matched only to the first strategy which produced a match.

The synonym match is based on a data error, "sulfate" is not a real synonym for "barite".

a.json b.json

fjuniorr commented 6 months ago

This behaviour is on main as well.

After finding a match and breaking out of the rules loop, the code does not stop processing the current source flow. It continues to compare this source flow with the remaining target flows. This is why we are getting more than one match for a single source flow.

Should we not allow 1:N mappings at all? I need to keep the for loop as is because we don't know the order of the target flows, but I can remove mappings at the end and keep only the mapping produced by the match rule with higher priority.

fjuniorr commented 6 months ago

This is fixed on main and randonneur-transformations.

cmutel commented 6 months ago

Should we not allow 1:N mappings at all?

You always get to the heart of the matter... this is a tough question, and my answer is based on my experience on how I have personally and seen others use these mappings. In those cases, a single match is desired.

In the future we could use the randonneur verb disaggregate, if needed.

fjuniorr commented 6 months ago

https://github.com/GreenDelta/olca-app/issues/102 has some useful tidbits for this discussion as well.

fjuniorr / flowmapper

Duplicate matches for same object #71