Explicit list of subcategories as "components" attribute

IAMconsortium / common-definitions

Repository for definitions and mappings in model comparison projects

Creative Commons Zero v1.0 Universal


Explicit list of subcategories as "components" attribute #29

Open danielhuppmann opened 9 months ago

danielhuppmann commented 9 months ago

As a follow-up to #16, we should add explicit "components" attributes to secondary-energy variables (e.g., which energy carriers are part of non-biomass renewables) to facilitate automated validation and consistency checks.

fyi @orichters @phackstock

orichters commented 9 months ago

Not only Secondary Energy. This is what we use to check REMIND submissions: AR6, NAVIGATE
Note that we should implement it so that multiple summation groups can exist for the same variable, for example:
- Population = Population|Female + Population|Male
- Population = Population|Rural + Population|Urban
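
A minimal sketch of how several summation groups per variable could be checked. The data values, the `summation_groups` structure and the `check_groups` helper are hypothetical illustrations, not part of common-definitions or nomenclature:

```python
import math

# Hypothetical data: variable name -> value for one model/scenario/region/year.
values = {
    "Population": 100.0,
    "Population|Female": 51.0,
    "Population|Male": 49.0,
    "Population|Rural": 40.0,
    "Population|Urban": 60.0,
}

# Hypothetical structure: each variable maps to a list of summation groups,
# and every group is expected to add up to the variable on its own.
summation_groups = {
    "Population": [
        ["Population|Female", "Population|Male"],
        ["Population|Rural", "Population|Urban"],
    ],
}


def check_groups(values, summation_groups, rel_tol=1e-9):
    """Return (variable, group) pairs whose components do not sum to the total."""
    failures = []
    for variable, groups in summation_groups.items():
        for group in groups:
            # Assumption: a group with unreported entries is skipped, not failed.
            if variable not in values or any(c not in values for c in group):
                continue
            total = sum(values[c] for c in group)
            if not math.isclose(total, values[variable], rel_tol=rel_tol):
                failures.append((variable, group))
    return failures


failures = check_groups(values, summation_groups)
```

Each group is validated independently, so a submission that reports only one of the two breakdowns can still pass.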

khaeru commented 6 months ago

This is also common practice by organizations that provide SDMX—for instance, from the IMF:

>>> import sdmx
>>> IMF = sdmx.Client("IMF")
>>> msg = IMF.codelist("CL_AREA")
>>> cl = msg.codelist["CL_AREA"]
>>> cl
<Codelist IMF:CL_AREA(1.15) (901 items): Area code list>
>>> c = cl["A2A3"]
>>> c
<Code A2A3: North and Central American countries (CDIS)>
>>> c.description
en: A2A3 = BZ + CA + CR + SV + GT + HN + MX + NI + PA + US + A2A39

It seems common in the wild that this is a line in the description, usually the last line; but I think it would be easier to handle and parse if it were a separate annotation.

Per @orichters' example, since it's very common to have spaces in IAMC variable names, some form of quoting should be allowed or required.
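
To illustrate what parsing such a line involves, here is a rough sketch that splits a `TOTAL = A + B + ...` string and accepts optional double quotes around names. The quoting convention and the `parse_summation` helper are assumptions made for illustration only; neither SDMX nor the IMF codelist defines them:

```python
import re

# Split on '+' only when it is followed by an even number of double quotes,
# i.e. when the '+' is not inside a quoted name.
_PLUS_OUTSIDE_QUOTES = re.compile(r'\+(?=(?:[^"]*"[^"]*")*[^"]*$)')


def _unquote(token):
    """Strip surrounding whitespace and one pair of double quotes, if present."""
    token = token.strip()
    if len(token) >= 2 and token[0] == token[-1] == '"':
        return token[1:-1]
    return token


def parse_summation(line):
    """Parse 'TOTAL = A + B' into (total, [components]).

    Assumption: the name left of '=' does not itself contain '='.
    """
    total, sep, rhs = line.partition("=")
    if not sep:
        raise ValueError(f"no '=' found in {line!r}")
    components = [_unquote(t) for t in _PLUS_OUTSIDE_QUOTES.split(rhs)]
    return _unquote(total), components


area, members = parse_summation(
    "A2A3 = BZ + CA + CR + SV + GT + HN + MX + NI + PA + US + A2A39"
)
quoted_total, quoted_parts = parse_summation(
    '"Population" = "Population|Female" + "Population|Male"'
)
```

Even this small parser needs a regular expression and an escaping rule, which is the kind of overhead a separate, structured annotation would avoid.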

orichters commented 6 months ago

Note that you may also end up with more complicated "summations".

Emissions|Kyoto Gases = Emissions|F-Gases + 0.265 * Emissions|N2O + 28 * Emissions|CH4 + Emissions|CO2

Might be worth considering when setting up the structure.
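
As a sketch of what such a weighted summation could look like as data rather than as a formula string, components can be stored as a mapping from variable name to weight. The weights are taken from the formula above; the input values and the `weighted_sum` helper are hypothetical, and unit handling is ignored:

```python
# Weights from the Kyoto Gases formula above; plain components get weight 1.
kyoto_weights = {
    "Emissions|F-Gases": 1.0,
    "Emissions|N2O": 0.265,
    "Emissions|CH4": 28.0,
    "Emissions|CO2": 1.0,
}

# Hypothetical values for a single model/scenario/region/year.
values = {
    "Emissions|F-Gases": 10.0,
    "Emissions|N2O": 100.0,
    "Emissions|CH4": 5.0,
    "Emissions|CO2": 1000.0,
}


def weighted_sum(values, weights):
    """Sum weight * value over all components; raises KeyError if one is missing."""
    return sum(weight * values[name] for name, weight in weights.items())


kyoto_gases = weighted_sum(values, kyoto_weights)
```

A plain list of components is then the special case in which every weight is 1.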

danielhuppmann commented 6 months ago

Thank you for your comments.

To keep the codebase simple, I strongly suggest that we stay close to standard YAML syntax and avoid custom parsing where possible. Having a variable

Population:
    components: [Population|Female, Population|Male]

or (for longer lists)

Population:
    components: 
        - Population|Female
        - Population|Male

is just as readable as a string separated by special characters.

Also, this way, the arguments can be directly passed to the pyam methods that will do the processing internally, e.g., IamDataFrame.aggregate().

For more complex operations beyond sum, min, max or weighted average, I suggest having a dedicated Processor subclass in the nomenclature package - after all, the Kyoto GHG aggregation will require configuration such as which emissions are required, which GWP to use, etc. Let's please discuss this as a separate (new) issue in the nomenclature repository.