Common region completeness check

phackstock commented 2 years ago

When writing tests for #99 I accidentally discovered a possible bug (or at the very least unexpected behavior) in the current implementation of region aggregation.

Issue outline

The bug occurs as follows:

Suppose we have a model mapping:

model: m_a
common_regions:
  - common_region_A:
    - region_A
    - region_B

If we now upload data that only contains region_A and run the aggregation, it will happily spit out an aggregated value for common_region_A. As region_B is missing from the input, however, the values for common_region_A will be identical to the ones for region_A.

Proposed fix

As we perform region aggregation we could check that for every common region we have data for all constituent regions. One open question would be what to do if we are missing a constituent region as in the example above, should we raise an error or write a warning to the log? My preference would be for error as the result described above is unexpected and could lead to wrong conclusions.

danielhuppmann commented 2 years ago

For the pyam aggregate()-method, the pyam docs (see here) make clear that missing values (components) is interpreted as 0. The aggregate_region() follows the same logic (but without an explicit note in the docs). So I would see that as (somewhat) expected behavior.

phackstock commented 2 years ago

Alright, I see, still maybe a bit dangerous I think. Should we leave it as is, add notes to the nomenclature docs, write logging info, ...?

danielhuppmann commented 2 years ago

Let's revisit the discussion after merging #99

phackstock commented 2 years ago

We could revisit this now. Maybe as part of a bigger documentation improvement effort.

danielhuppmann commented 2 years ago

I think https://github.com/IAMconsortium/pyam/pull/637 would be helpful for implementation... To avoid too much filtering. Hope I get to review this later today...

phackstock commented 2 years ago

Ah yes that would be nice.

IAMconsortium / nomenclature

Common region completeness check #105

Issue outline

Proposed fix