IAMconsortium / common-definitions

Repository for definitions and mappings in model comparison projects
Creative Commons Zero v1.0 Universal

Backwards-compatibility of variable names #46

Open khaeru opened 6 months ago

khaeru commented 6 months ago

At the SWG meeting on 2023-12-06, Masa Sugiyama and others raised the question of how to support backward-compatibility if it becomes necessary to change a variable name.

This issue is to discuss/collect ideas.

khaeru commented 6 months ago

My suggestion:

A minimal working example (MWE) using SDMX:

```py
import sdmx
import sdmx.model.v21 as m

# Create a Code whose ID is a current variable name
c = m.Code(id="Final Energy|Foo|Bar")

# Create an annotation containing old/superseded variable names
ann = m.Annotation(
    id="iamc-variable-old",
    text="\n".join(
        ["Final Energy|Bar|Foo", "Final Energy|Foo Bar"],
    )
)
c.annotations.append(ann)

# Write to file
cl = m.Codelist(id="VARIABLE", name="IAMC variable name")
cl.append(c)
msg = sdmx.message.StructureMessage()
msg.add(cl)

with open("example.xml", "wb") as f:
    f.write(sdmx.to_xml(msg, pretty_print=True))
```

This gives output like:

```xml
…
Final Energy|Bar|Foo
Final Energy|Foo Bar
```

And can be read and used like:

```py
# Read the file, retrieve the codelist
>>> msg = sdmx.read_sdmx("example.xml")
>>> cl = msg.codelist["VARIABLE"]

# Retrieve a specific variable name
>>> c = cl["Final Energy|Foo|Bar"]
>>> c

# Retrieve the list of old names from the annotation
>>> c.eval_annotation("iamc-variable-old").split("\n")
['Final Energy|Bar|Foo', 'Final Energy|Foo Bar']
```

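Once the old names are stored this way, a consumer would probably want the reverse lookup: given a name from an old submission, find the current one. The following is only a rough, library-independent sketch of that step. `OLD_NAMES`, `build_alias_map` and `to_current` are hypothetical names, and the plain dict stands in for the codelist, with each value holding the newline-separated annotation text from the MWE above.

```python
# Hypothetical stand-in for the codelist: current name -> annotation text,
# where the text holds superseded names separated by newlines (as in the MWE).
OLD_NAMES = {
    "Final Energy|Foo|Bar": "Final Energy|Bar|Foo\nFinal Energy|Foo Bar",
    "Final Energy|Baz": "",  # a variable that was never renamed
}


def build_alias_map(annotations):
    """Map every superseded name to the current name that replaced it."""
    aliases = {}
    for current, text in annotations.items():
        # Skip empty strings produced by an empty annotation text
        for old in filter(None, text.split("\n")):
            if old in aliases and aliases[old] != current:
                raise ValueError(f"{old!r} is claimed by two current names")
            aliases[old] = current
    return aliases


def to_current(name, annotations, aliases):
    """Return the current name for `name`, whether it is current or superseded."""
    if name in annotations:
        return name
    return aliases[name]  # raises KeyError for names that were never defined


ALIASES = build_alias_map(OLD_NAMES)
print(to_current("Final Energy|Foo Bar", OLD_NAMES, ALIASES))
```

Raising on an old name that is claimed by two current names is a design choice in this sketch; whether such ambiguity should be allowed is something we would need to agree on.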
christophbertram commented 6 months ago

Do I understand correctly that we can, in principle, add as many entries as we want? The old ENGAGE and NAVIGATE templates only seem to have the entries "description" and "unit", but you say we could also add extra entries for storing the 'old' name. Similarly, we could create extra entries to denote maximum and minimum allowed per-capita values, and aliases to other data structures (e.g. the iTEM transport variable names or similar).

khaeru commented 6 months ago

@christophbertram I say we should agree on as many common annotations as we need, and that doing so is a feature of the SDMX standard (and supported by tools that implement it). What I don't know is whether the nomenclature tool that @phackstock and @danielhuppmann have developed supports access and use of such annotations: I only know we can put such entries in YAML files such as appear in this repo and they will be tolerated by nomenclature, i.e. it won't error when trying to read the files.

Per full-resolution keys: yes, exactly. I hope we can provide a proof-of-concept when linking the iTEM structure info to this repo.

Per "minimum and maximum allowed values per capita"—I think that is actually data, not structure. You can imagine an IAMC-structured table (or with fewer or more dimensions, e.g. possibly without YEAR or REGION) in which the numbers are not "actual observed historical values" nor "model-projection values" but "expected {minimum,maximum} per capita values". One could imagine having different sets of such values for different purposes, even when the same variable names are used.
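To make the "bounds are data" point concrete, here is a rough sketch in plain Python. Everything in it is an assumption for illustration: the project names, the `BOUNDS` layout, the `out_of_bounds` helper, and all numbers are made up, and only VARIABLE and REGION are used as dimensions.

```python
# Hypothetical bounds tables: each entry is keyed by IAMC-like dimensions
# (here only VARIABLE and REGION) and holds (minimum, maximum) per-capita
# values. All numbers are made up.
BOUNDS = {
    "project-a": {("Final Energy|Foo|Bar", "World"): (0.0, 100.0)},
    "project-b": {("Final Energy|Foo|Bar", "World"): (10.0, 50.0)},
}


def out_of_bounds(rows, bounds):
    """Return the rows whose value lies outside the bounds for their key.

    Rows with no matching bounds entry are not checked.
    """
    failed = []
    for variable, region, value in rows:
        limits = bounds.get((variable, region))
        if limits is not None and not (limits[0] <= value <= limits[1]):
            failed.append((variable, region, value))
    return failed


# The same data row passes one set of bounds and fails the other
DATA = [("Final Energy|Foo|Bar", "World", 75.0)]
for project, bounds in BOUNDS.items():
    print(project, out_of_bounds(DATA, bounds))
```

The variable name is identical in both tables; only the bounds data differ, so nothing about the variable's definition has to change between purposes.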

danielhuppmann commented 6 months ago

Thanks for raising this issue, see a few comments below. Let's please try to keep issues and discussions narrow and start new issues where possible.

Cross-reference to legacy variables/regions or other standards: this is already implemented in a simple example here, see https://github.com/IAMconsortium/common-definitions/blob/3f530e2a37a649dfc011cbf5e6696d1da2e2cdae/definitions/variable/energy/final-energy.yaml#L115 and the value can be accessed from the nomenclature.DataStructureDefinition as

dsd.variable["Final Energy|Carbon Removal|Direct Air Capture|Electricity"].navigate

If you have specific suggestions for feature-support in nomenclature, e.g. as a "known" attribute with dedicated documentation, please start an issue there.

Validation of values should indeed be handled as a separate use-case and will be implemented similarly to the required-data feature in nomenclature, see here. This PR https://github.com/IAMconsortium/pyam/pull/804 is a step towards support for that feature. The main reason for keeping this separate is that different projects may want to use different reference data or validation thresholds.

FlorianLeblancDr commented 4 months ago

I think this is partly addressed by Daniel's commit from yesterday, #PR61