fjuniorr / flowmapper

Mappings between elementary flows
MIT License

Add `nomatch` argument to `Flowmap` for better statistics on unmatchable flows #50

Open fjuniorr opened 6 months ago

fjuniorr commented 6 months ago

[cmutel] I am wondering if we should have a list of known "no good match" objects, to get better statistics. Not sure about the data format exactly, but this would help focus on the outstanding matches which are possible.

[fjuniorr] Do you have some examples of "no good match" flows? Are they present in both the source and target flow lists? One idea would be to add a nomatch argument to Flowmap that receives a list of callables, each of which can filter flows out of the matching process.
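To make the callable idea concrete, here is a minimal sketch of how a list of rules could split flows into kept and excluded sets. It is only an illustration under assumptions: `Flow` is a stand-in dataclass with a plain string name, and `split_by_rules` is a hypothetical helper, not part of flowmapper.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Flow:
    """Stand-in for flowmapper's Flow; only a plain-string name is modelled."""
    name: str


Rule = Callable[[Flow], bool]


def split_by_rules(flows: Iterable[Flow], rules: Iterable[Rule]):
    """Hypothetical helper: a flow is excluded if any rule returns True."""
    rules = list(rules)
    kept, excluded = [], []
    for flow in flows:
        if any(rule(flow) for rule in rules):
            excluded.append(flow)
        else:
            kept.append(flow)
    return kept, excluded


flows = [Flow("water, river"), Flow("2-methyl pentane"), Flow("carbon dioxide")]
rules = [
    lambda flow: flow.name.startswith("water"),
    lambda flow: flow.name == "2-methyl pentane",
]

kept, excluded = split_by_rules(flows, rules)
print(len(kept), "kept,", len(excluded), "excluded")  # 1 kept, 2 excluded
```

Keeping the excluded flows in their own list, rather than dropping them, is what would let the statistics report how many flows were set aside on each side.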

fjuniorr commented 6 months ago

@cmutel I'm not sure this is the way to go, but I've added a first try in https://github.com/fjuniorr/flowmapper/commit/94042a7c09c38371c47222c6e2d775312dd1ce6f and we can iterate from there. If you could share some actual flows you want to exclude, I think it would help with the design.

Here's what it looks like for now:

from flowmapper.utils import read_field_mapping, read_flowlist
from flowmapper.flowmap import Flowmap
from flowmapper.flow import Flow

fields = read_field_mapping('config/simapro-ecoinvent.py')
source_flows = [Flow.from_dict(flow, fields['source']) for flow in read_flowlist('data/agribalyse-3.1.1-biosphere.json')]
target_flows = [Flow.from_dict(flow, fields['target']) for flow in read_flowlist('data/ecoinvent-3.7-biosphere.json')]

nomatch_water = lambda flow: flow.name.value.startswith('water')
nomatch_methyl = lambda flow: flow.name == "2-methyl pentane"

flowmap = Flowmap(source_flows, target_flows, nomatch_rules=[nomatch_water, nomatch_methyl])

flowmap.mappings
flowmap.statistics()
4724 source flows (943 excluded)...
4302 target flows (27 excluded)...
3013 mappings (63.02% of total).

fjuniorr commented 6 months ago

@cmutel for 0.1, do you think nomatch needs to be available from the CLI as well?

cmutel commented 6 months ago

> @cmutel for 0.1, do you think nomatch needs to be available from the CLI as well?

No, not from the CLI. Could be part of the config file though.

fjuniorr commented 6 months ago

I think we are on the same page. I was thinking of being able to pass a data migration file (used only for matching) from the CLI, so that any flows matching it would be excluded. Something like:

flowmapper map --nomatch-source match.json
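As a rough sketch of what such a file could drive, the snippet below turns a JSON list of field/value entries into exclusion rules. The file format is purely an assumption (it has not been decided in this thread), flows are modelled as plain dicts, and `rules_from_file` is a hypothetical helper rather than anything in flowmapper.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def rules_from_file(path: str) -> list[Callable[[dict], bool]]:
    """Hypothetical helper: build one rule per entry in the JSON file.

    Assumed format: a list of objects; an entry matches a flow when every
    field it lists has the same value in the flow.
    """
    entries = json.loads(Path(path).read_text())

    def make_rule(entry: dict) -> Callable[[dict], bool]:
        return lambda flow: all(flow.get(key) == value for key, value in entry.items())

    return [make_rule(entry) for entry in entries]


# Write a throwaway match.json so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "match.json"
    path.write_text(json.dumps([{"name": "2-methyl pentane"}]))
    rules = rules_from_file(str(path))

flows = [
    {"name": "2-methyl pentane", "unit": "kg"},
    {"name": "carbon dioxide", "unit": "kg"},
]
kept = [flow for flow in flows if not any(rule(flow) for rule in rules)]
print([flow["name"] for flow in kept])  # ['carbon dioxide']
```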

This kind of ties back to https://github.com/brightway-lca/randonneur/issues/10.

However, after reading https://github.com/fjuniorr/flowmapper/issues/73, maybe delete and create are all we really need to make the matching output transparent, and we could drop nomatch entirely?