Closed by cmutel 6 months ago
When there is no context in `source` and `target`, we still need equal context for a match, correct? But when you add a context to either `source` or `target`, I should disregard flowmapper's current context mapping and check the values provided?
P.S. So far this is the only entry that has a `categories` key (is this missing a `name` in `target`?):
{
  "target": {
    "categories": [
      "Emissions to air",
      "low. pop., long-term"
    ]
  },
  "source": {
    "name": "Cesium-134",
    "categories": [
      "Emissions to air",
      "low. pop."
    ],
    "unit": "kBq"
  }
}
The cesium example is correct.
These are data transformations, i.e. they should be applied to the `Flow` object in order to get a match.
The model I had in mind is: as long as every element in the `source` matches, you apply the transformation in `target`. Therefore we don't need a context in cases where the transformation should be applied for every possible context.
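A minimal sketch of that model, assuming flows and transformations are plain dicts. `matches` and `apply_transformation` are hypothetical helpers written for illustration; they are not flowmapper or randonneur functions.

```python
from copy import deepcopy


def matches(flow: dict, source: dict) -> bool:
    """Return True if every key given in `source` has the same value in `flow`."""
    return all(flow.get(key) == value for key, value in source.items())


def apply_transformation(flow: dict, transformation: dict) -> dict:
    """Return an updated copy of `flow` if the rule matches, else `flow` as-is."""
    if not matches(flow, transformation["source"]):
        return flow
    updated = deepcopy(flow)
    # Only the keys listed in `target` are overwritten; everything else is kept.
    updated.update(deepcopy(transformation["target"]))
    return updated


# The cesium entry from this thread: it only applies in one specific context.
cesium_rule = {
    "source": {
        "name": "Cesium-134",
        "categories": ["Emissions to air", "low. pop."],
        "unit": "kBq",
    },
    "target": {"categories": ["Emissions to air", "low. pop., long-term"]},
}

# A rule without a context in `source` (hypothetical): applies in every context.
name_only_rule = {
    "source": {"name": "Cesium-134"},
    "target": {"name": "Cesium-134 (renamed)"},
}

flow = {
    "name": "Cesium-134",
    "categories": ["Emissions to air", "low. pop."],
    "unit": "kBq",
}
other_context = {"name": "Cesium-134", "categories": ["Emissions to water"], "unit": "kBq"}

transformed = apply_transformation(flow, cesium_rule)
untouched = apply_transformation(other_context, cesium_rule)
renamed = apply_transformation(other_context, name_only_rule)
```

Because `target` only lists `categories`, the name and unit of the cesium flow are carried over unchanged, which is why that entry does not need a `name` in `target`.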
> These are data transformations, ie they should be applied to the Flow object in order to get a match.
I had not fully understood this until now, but it does make a lot of sense.
I've added a first working version in https://github.com/fjuniorr/flowmapper/pull/70/commits/c6481f0f24ebfd0e3804754af24ca5c16f75dcae, but there are some changes that need to be made:
- `Flow`, not the `Unit`)
- `flowmapper map` CLI command as a flag `--transformations`. From within Python we need to do the transformations manually[^1].

[^1]: Something like:
```python
from flowmapper.flow import Flow
from flowmapper.flowmap import Flowmap
from flowmapper.utils import read_field_mapping, read_flowlist, read_migration_file
from randonneur import migrate_datasets

# Load the field mapping and the raw source flows
fields = read_field_mapping('config/simapro-ecoinvent.py')
source_flows = read_flowlist('data/agribalyse-3.1.1-biosphere.json')

# Apply the transformations to the raw dicts before building Flow objects
migration_spec = read_migration_file('config/sp-formatted.json')
migrate_datasets(migration_spec, source_flows)

# Build Flow objects for both lists and match them
source_flows = [Flow.from_dict(flow, fields['source']) for flow in source_flows]
target_flows = [Flow.from_dict(flow, fields['target']) for flow in read_flowlist('data/ecoinvent-3.7-biosphere.json')]
flowmap = Flowmap(source_flows, target_flows)
flowmap.statistics()
```
@cmutel could you confirm my understanding that if we have a source flow such as
{
  "name": "Transformation, to water courses, artificial",
  "unit": "m2",
  "categories": [
    "Resources",
    "land"
  ]
}
and the following transformation to be applied:
{
  "source": {
    "name": "Transformation, to water courses, artificial"
  },
  "target": {
    "name": "Transformation, to river, artificial"
  }
}
The output of `Flowmap.to_randonneur` should still be (i.e. the `name` continues to be "Transformation, to water courses, artificial" and not "Transformation, to river, artificial"):
{
  "source": {
    "name": "Transformation, to water courses, artificial",
    "categories": [
      "Resources",
      "land"
    ],
    "unit": "m2"
  },
  "target": {
    "uuid": "090e9aa9-a9a9-4878-9634-3ad0ba7fbc91",
    "name": "Transformation, to river, artificial",
    "context": "natural resource/land",
    "unit": "m2"
  },
  "conversion_factor": 1.0,
  "comment": "Minor land name differences"
},
I think this will be a somewhat bigger change, which is why I'm making sure.
Yes, exactly.
We have a `Flow` object with the original data stored in `raw`. We apply transformations (not mapping) for normalizing the raw data, and for things which don't fit into our normal mapping functions. For example, we will have cases which need to be mapped manually. But the resulting output is a mapping, not a transformation, from the original source data to the original target data.
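The distinction can be sketched as follows. `SketchFlow` and `mapping_entry` are hypothetical stand-ins written for illustration; they are not flowmapper's actual `Flow` class or `Flowmap.to_randonneur`, and matching on the name alone is a simplification.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class SketchFlow:
    """Hypothetical stand-in for a Flow: `raw` is never modified."""

    raw: dict
    data: dict = field(init=False)

    def __post_init__(self):
        # Work on a copy so the original data stays available for output.
        self.data = deepcopy(self.raw)

    def transform(self, rule: dict) -> None:
        """Apply `rule["target"]` to the working copy if `rule["source"]` matches."""
        if all(self.data.get(k) == v for k, v in rule["source"].items()):
            self.data.update(deepcopy(rule["target"]))


def mapping_entry(source: SketchFlow, target: SketchFlow, comment: str) -> dict:
    """Build a mapping record from the original (raw) data of both flows."""
    return {
        "source": source.raw,
        "target": target.raw,
        "conversion_factor": 1.0,
        "comment": comment,
    }


source = SketchFlow({
    "name": "Transformation, to water courses, artificial",
    "categories": ["Resources", "land"],
    "unit": "m2",
})
target = SketchFlow({
    "uuid": "090e9aa9-a9a9-4878-9634-3ad0ba7fbc91",
    "name": "Transformation, to river, artificial",
    "context": "natural resource/land",
    "unit": "m2",
})
rule = {
    "source": {"name": "Transformation, to water courses, artificial"},
    "target": {"name": "Transformation, to river, artificial"},
}

source.transform(rule)
# The transformed names now agree, so the two flows can be matched...
is_match = source.data["name"] == target.data["name"]
# ...but the mapping that gets written still uses the original source name.
entry = mapping_entry(source, target, "Minor land name differences")
```

Keeping `raw` separate from the working copy is what lets the transformation drive the match while the output still refers to the names found in the original source file.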
I'm changing the source flows with randonneur before they are initialized, which means I lose access to the original flows and can't write the proper output files (I only realized how big a problem this is while writing).
I still need to add a couple more tests and clean up the code, but this is working as expected after https://github.com/fjuniorr/flowmapper/pull/70/commits/6b955a29a718e6a6e5128af474e449e9228679e1 in https://github.com/fjuniorr/flowmapper/pull/70. A call from the CLI with multiple data migration files looks like:
flowmapper map data/agribalyse-3.1.1-biosphere.json data/ecoinvent-3.7-biosphere.json \
--fields config/simapro-ecoinvent.py \
-t config/transformations.json \
-t config/sp-formatted.json
Two questions @cmutel:
1. Can you evaluate how much of a problem the other hard-coded constants (`CONTEXT_MAPPING`, `UNITS_NORMALIZATION` and `ECOINVENT_UUID_39_310_MAPPING`) are? Do you see us moving away from them as well in favor of randonneur data migration files?
2. Can I already remove all the dicts with name differences mappings and we eventually catch up with what is missing in the data migration files?
Sure, I can pick up on the things I missed from source control or the original files I sent you.
> Can you evaluate how much of a problem the other hard-coded constants (`CONTEXT_MAPPING`, `UNITS_NORMALIZATION` and `ECOINVENT_UUID_39_310_MAPPING`) are? Do you see us moving away from them as well in favor of randonneur data migration files?
- `CONTEXT_MAPPING` is specific to SimaPro, so this should be configurable. But I think we can leave this as a builtin.
- `UNITS_NORMALIZATION` is pretty generic. Leave for now.
- `ECOINVENT_UUID_39_310_MAPPING` is very specific - we will need one of these for 3.6, 3.7, 3.8, etc. Should be configurable.

@fjuniorr Here is a more complete mapping file constructed manually.
The values given in https://github.com/fjuniorr/flowmapper/blob/main/flowmapper/constants.py are not universal, nor are they provided in a form which could allow for more specific matches (i.e. change name but only in a specific context or with a specific unit). These matches are also specific to going from SimaPro to ecoinvent, but these are not the only two systems we use.
sp-formatted.json
The attached is a different approach - one that should have tooling already used to its format and data model, and I would like to switch from the hard-coded values and systems to an option in the `flowmapper-ci` config and a data file like the one attached.
I will iterate on this file once it is added. It can currently replace the `MINOR_LAND_NAME_DIFFERENCES_MAPPING`, and should eventually replace `RANDOM_NAME_DIFFERENCES_MAPPING`, `NAME_DIFFERENCES_WITH_UNIT_CONVERSION_MAPPING`, and `MISSING_FOSSIL_AND_BIOGENIC_CARBON_MAPPING`.
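As a rough illustration of what such a switch involves, the sketch below turns a flat name-to-name dict into `source`/`target` records like the ones quoted elsewhere in this thread. `flat_mapping_to_records` is a hypothetical helper, the one-entry dict stands in for a constant such as `MINOR_LAND_NAME_DIFFERENCES_MAPPING`, and the exact layout of the attached file is not assumed here.

```python
def flat_mapping_to_records(mapping: dict, **constraints) -> list:
    """Turn {old_name: new_name} into a list of source/target records.

    Extra keyword arguments (e.g. unit="m2") are added to every `source`,
    so the rule only applies to flows that also match those fields.
    """
    return [
        {"source": {"name": old, **constraints}, "target": {"name": new}}
        for old, new in mapping.items()
    ]


# Stand-in for a hard-coded constant; a single pair taken from this thread.
flat = {
    "Transformation, to water courses, artificial": "Transformation, to river, artificial",
}

# Same rename, but restricted to flows measured in m2.
records = flat_mapping_to_records(flat, unit="m2")
```

A flat dict can only say "rename A to B everywhere", whereas each record can carry additional fields in `source` (a unit, a context) to make the match more specific.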