IAMconsortium / common-definitions

Repository for definitions and mappings in model comparison projects
Creative Commons Zero v1.0 Universal

Units/methods/operations-specifications in the variable name #55

Closed danielhuppmann closed 3 months ago

danielhuppmann commented 4 months ago

In previous projects, the variable names sometimes included specifications of units, methods or operations, e.g.

This is an abuse of the hierarchical structure: it makes variable names difficult to read (for humans) and complicates automated processing (for machines).

I therefore suggest changing the convention to include a unit/method/operations identifier in brackets instead of appending it as a sub-sub-[sub-]category (and removing the identifier altogether where it is not necessary).
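A minimal sketch of what this renaming could look like in Python. The set of identifiers and the function name `to_bracket_convention` are assumptions made for illustration (the terms are taken from this thread); this is not an existing tool or an agreed list.

```python
# Hypothetical set of identifiers, collected from the terms used in this thread.
IDENTIFIERS = {"Volume", "Value", "Index", "Share", "Rate"}


def to_bracket_convention(variable: str) -> str:
    """Move a trailing identifier out of the hierarchy and into brackets."""
    parent, sep, last = variable.rpartition("|")
    # Only rename if there is a parent level and the last level is an identifier.
    if sep and last in IDENTIFIERS:
        return f"{parent} ({last})"
    return variable


print(to_bracket_convention("Unemployment|Rate"))  # Unemployment (Rate)
print(to_bracket_convention("Trade|Oil"))          # Trade|Oil
```

Variables whose last level is not in the assumed identifier set are returned unchanged, so the sketch leaves the regular hierarchy alone.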

According to this proposal, the variables should then read as follows:

@IAMconsortium/common-definitions-coordination

FlorianLeblancDr commented 4 months ago

I like your proposal; at least this is better than using the hierarchical structure. A few comments:

Volume: Should we consider that by default a variable should be understood as a volume? Most variables in the database are currently volumes, and this would avoid renaming if "Index" or "Share" is introduced later.

Index: Definitely useful when models do not agree on the unit (for example, the activity of freight is measured in ton/km or in pseudo-$ depending on the model).

Share: I wonder whether having "Share" variables is misleading, as it lets each modelling team decide what the denominator is. Isn't it equivalent to request the variables needed to compute the shares?

For example, I have the employment variable. Instead of requesting employment as a share of something (something being specified in the definition), I could request only (for example) the total active population and let the share be computed elsewhere?

Example:

- Unemployment:
    definition: Number of unemployed inhabitants (based on ILO classification)
    unit: million
- Unemployment|Rate:
    definition: Fraction of unemployed inhabitants (based on ILO classification)
    unit: '%'

To be replaced by:

- Unemployment:
    definition: Number of unemployed inhabitants (based on ILO classification)
    unit: million
- Population|Active:
    definition: Active population.
    unit: million

Maybe this last point on "Share" deserves a separate issue.

danielhuppmann commented 4 months ago

Thanks @FlorianLeblancDr for the response.

  1. Yes, we should definitely review whether these identifiers are relevant and we should remove them where possible.
  2. About the shares: In this issue, I'm more concerned with the naming convention than with where/how the values are computed. From the perspective of a user of the Scenario Explorer, both "Unemployment" (in millions) and "Unemployment (Rate)" (in %) are useful, so we should provide both.

orichters commented 4 months ago

I think "Volume" is mostly used for trade, and there it is needed to distinguish it from monetary values. And here it is not obvious whether Trade|Oil is in energy, mass, volume or monetary terms.

Same with

- Consumption|{Deciles}:
    definition: Share of total consumption accruing to the {Deciles}

From the name, it is not clear at all whether Consumption|D1 should be the monetary value or the share. So some harmonized structure is indeed helpful. A Parent|Whatever (Share) would then always be calculated by Parent|Whatever/Parent?
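The rule suggested here, Parent|Whatever (Share) = Parent|Whatever / Parent, could be sketched as follows. The function name `share_of_parent` and the numbers are hypothetical and only illustrate the calculation.

```python
def share_of_parent(data: dict[str, float], variable: str) -> float:
    """Return data[variable] divided by the value of its direct parent."""
    parent, sep, _ = variable.rpartition("|")
    if not sep:
        raise ValueError(f"'{variable}' has no parent level")
    return data[variable] / data[parent]


# Hypothetical values, both in the same monetary unit.
data = {"Consumption": 200.0, "Consumption|D1": 10.0}

share = share_of_parent(data, "Consumption|D1")
print(f"Consumption|D1 (Share) = {share:.0%}")
```

With a fixed rule like this, the denominator is always the direct parent, so it is no longer left to each modelling team to choose it.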

Regarding Index, it might be worth adding the reference year to the variable name, so Final Energy (Index 2020) or so.

I'm unsure whether calculating stuff in the database is good. On the one hand, it makes sure that all variables are consistently calculated (but that could be guaranteed by checks). But I also like to have the full set of variables in the file that I submit. That makes checks on our side easier and also avoids having to differentiate in the template between variables that should be supplied and others that shouldn't.

At PIK, we use the structure Variable Name (Unit) in several places. Hope all our scripts survive variable names with parentheses :)

danielhuppmann commented 4 months ago

> At PIK, we use the structure Variable Name (Unit) in several places. Hope all our scripts survive variable names with parentheses :)

We could use square brackets so that things don't break on your side, so it would be Variable [Method] (Unit).
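As an illustration of how a script might split a label of the form Variable [Method] (Unit), here is a rough sketch using a regular expression. The pattern and the function name `parse_label` are assumptions for this example, not part of any existing tool.

```python
import re

# Variable name, then an optional [Method], then an optional (Unit).
LABEL_PATTERN = re.compile(
    r"^(?P<variable>[^\[\]()]+?)"
    r"(?:\s*\[(?P<method>[^\]]+)\])?"
    r"(?:\s*\((?P<unit>[^)]+)\))?$"
)


def parse_label(label: str) -> tuple:
    """Split 'Variable [Method] (Unit)' into (variable, method, unit)."""
    match = LABEL_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"Cannot parse label: {label!r}")
    # Missing optional parts are returned as None.
    return match.group("variable"), match.group("method"), match.group("unit")


print(parse_label("Unemployment [Rate] (%)"))
print(parse_label("Final Energy (EJ/yr)"))
```

Because the method and the unit use different brackets, they can be told apart even when only one of them is present. This sketch does not handle units that themselves contain parentheses, such as Index (2005 = 1).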

danielhuppmann commented 4 months ago

> Regarding Index, it might be worth adding the reference year to the variable name, so Final Energy (Index 2020) or so.

This info is already contained in the unit.

danielhuppmann commented 4 months ago

I agree that having Volume or Value in trade/export/etc. variables makes sense for intuition.

The "Consumption|{Deciles}" example should be renamed as part of the common-definitions clean-up for consistency.

orichters commented 4 months ago

> > Regarding Index, it might be worth adding the reference year to the variable name, so Final Energy (Index 2020) or so.
>
> This info is already contained in the unit.

That is right. As I remember it, the problem was this: in NGFS, we had some variables such as Price|Agriculture|Corn|Index with unit Index (2005 = 1) and others such as Price|Final Energy|Residential and Commercial|Commercial|Gases|Index with unit Index (2020 = 1). We then thought about harmonizing the base year while keeping the old variables for backwards compatibility, and @phackstock immediately noted that this was not possible because the old and new variables would have the same name, which would mess up everything. That is why I thought adding the year to the variable name might be an advantage.
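The name clash described here can be illustrated with a small check for variable names that appear with more than one unit. The function name `find_unit_conflicts` and the second corn-price entry are hypothetical; the latter stands in for a harmonized variable kept alongside the old one.

```python
from collections import defaultdict


def find_unit_conflicts(entries):
    """Return the variable names that are listed with more than one unit."""
    units_by_name = defaultdict(set)
    for name, unit in entries:
        units_by_name[name].add(unit)
    return {name: units for name, units in units_by_name.items() if len(units) > 1}


entries = [
    ("Price|Agriculture|Corn|Index", "Index (2005 = 1)"),
    # Hypothetical harmonized variable with the same name but a new base year.
    ("Price|Agriculture|Corn|Index", "Index (2020 = 1)"),
    ("Price|Final Energy|Residential and Commercial|Commercial|Gases|Index",
     "Index (2020 = 1)"),
]

for name, units in find_unit_conflicts(entries).items():
    print(f"{name}: {sorted(units)}")
```

If the reference year were part of the variable name, the two corn-price entries would have distinct names and the check would return nothing.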

volker-krey commented 4 months ago

I like the proposal to implement this structure for shares/volumes/values/indices.

danielhuppmann commented 4 months ago

Added a PR to clarify this approach in the IAMC-variable naming conventions of the ECE docs, see https://github.com/iiasa/ece-docs/pull/7