IAMconsortium / pyam

Analysis & visualization of energy & climate scenarios
https://pyam-iamc.readthedocs.io/
Apache License 2.0

Confirm expected behaviour of aggregate with nested hierarchy #737

Open willu47 opened 1 year ago

willu47 commented 1 year ago

Hi, following a question I asked in the openmod session today, please could you confirm the expected behaviour of the .aggregate function when presented with missing levels in the data hierarchy. For example, the following test fails because the two coal sub-categories Primary Energy|Fossil|Coal|Lignite and Primary Energy|Fossil|Coal|Brown are ignored.

Am I missing something?

import pandas as pd
from pyam import IamDataFrame, IAMC_IDX

LONG_IDX = IAMC_IDX + ["year"]

PRICE_NESTED_DF = pd.DataFrame(
    [
        ["model_a", "scen_a", "World", "Primary Energy|Fossil|Coal|Lignite", "EJ/yr", 2010, 10.0],
        ["model_a", "scen_a", "World", "Primary Energy|Fossil|Coal|Brown", "EJ/yr", 2010, 30.0],
        ["model_a", "scen_a", "World", "Primary Energy|Fossil|Gas", "EJ/yr", 2010, 45.0],
    ],
    columns=LONG_IDX + ["value"],
)

def test_nested_aggregate():
    actual = IamDataFrame(PRICE_NESTED_DF).aggregate(variable='Primary Energy|Fossil').data
    data = [
        ["model_a", "scen_a", "World", "Primary Energy|Fossil", "EJ/yr", 2010, 85.0]
    ]
    expected = pd.DataFrame(data, columns=LONG_IDX + ["value"])
    print(actual)
    print(expected)
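    # assert_frame_equal returns None and raises on a mismatch,
    # so wrapping it in `assert` would always fail
    pd.testing.assert_frame_equal(actual, expected)

willu47 commented 1 year ago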

If I add recursive=True argument to the aggregate() function I get this:

     model scenario region                    variable   unit  year  value
0  model_a   scen_a  World       Primary Energy|Fossil  EJ/yr  2010   85.0
1  model_a   scen_a  World  Primary Energy|Fossil|Coal  EJ/yr  2010   40.0

willu47 commented 1 year ago

(
    IamDataFrame(PRICE_NESTED_DF)
    .aggregate(variable='Primary Energy|Fossil', recursive=True)
    .aggregate(variable='Primary Energy|Fossil')
)

returns

     model scenario region               variable   unit  year  value
0  model_a   scen_a  World  Primary Energy|Fossil  EJ/yr  2010   40.0

danielhuppmann commented 1 year ago

Thanks @willu47 - indeed, I'd say that this is behaving as expected.
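To make the outputs above easier to follow, here is a minimal plain-Python sketch of the logic. It assumes that aggregate() without explicit components sums only the variables exactly one level below the given variable, and that recursive=True fills in missing intermediate levels bottom-up and returns only the newly computed variables. The helpers `aggregate` and `aggregate_recursive` and the dict-based `DATA` are hypothetical stand-ins for illustration, not pyam's implementation.

```python
SEP = "|"

# Values from the example above, keyed by variable name
DATA = {
    "Primary Energy|Fossil|Coal|Lignite": 10.0,
    "Primary Energy|Fossil|Coal|Brown": 30.0,
    "Primary Energy|Fossil|Gas": 45.0,
}


def depth(variable):
    """Number of hierarchy separators in a variable name."""
    return variable.count(SEP)


def aggregate(data, parent):
    """Sum only the components exactly one level below `parent` (assumed behaviour)."""
    prefix = parent + SEP
    children = [
        v for v in data if v.startswith(prefix) and depth(v) == depth(parent) + 1
    ]
    return {parent: sum(data[v] for v in children)} if children else {}


def aggregate_recursive(data, parent):
    """Fill in missing intermediate levels bottom-up; return only the new values."""
    work = dict(data)
    new = {}
    prefix = parent + SEP
    below = [v for v in work if v.startswith(prefix)]
    if not below:
        return new
    deepest = max(depth(v) for v in below)
    # Walk from the deepest level up to the level directly below `parent`
    for d in range(deepest, depth(parent), -1):
        parents = {
            v.rsplit(SEP, 1)[0]
            for v in work
            if v.startswith(prefix) and depth(v) == d
        }
        for p in sorted(parents):
            if p not in work:
                value = aggregate(work, p)[p]
                work[p] = value
                new[p] = value
    return new


fossil = "Primary Energy|Fossil"
one_level = aggregate(DATA, fossil)            # only Gas is one level below Fossil
recursive = aggregate_recursive(DATA, fossil)  # Coal is built first, then Fossil
chained = aggregate(recursive, fossil)         # only Coal is left below Fossil
```

Under these assumptions, the chained call returns 40.0 because the recursive result contains only Primary Energy|Fossil and Primary Energy|Fossil|Coal, so Coal is the only component one level below Fossil.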

Question back to you: which other behavior would you find more intuitive? Or how could we improve the docs?

Sidenote:

df.aggregate("<variable>", append=True)

has the same behavior as

df.append(df.aggregate("<variable>"))

but the first option has better performance.

danielhuppmann commented 1 year ago

And FYI: pyam has a testing module with a function pyam.testing.assert_iamframe_equal (see the docs). This is probably more appropriate for your use case because you don't have to worry about the order of the columns and rows, and it is faster because it operates on an indexed pd.Series.
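As a rough illustration of why an index-based comparison is less fragile than a row-by-row one, here is a plain-Python sketch. The helper `to_indexed` and the row dicts are hypothetical and only mimic the idea of comparing values by their index; this is not how pyam.testing.assert_iamframe_equal is implemented.

```python
INDEX_KEYS = ("model", "scenario", "region", "variable", "unit", "year")


def to_indexed(rows):
    """Key each value by its identifying dimensions, discarding row order."""
    return {tuple(row[k] for k in INDEX_KEYS): row["value"] for row in rows}


def make_row(variable, value):
    """Build one long-format row for the example scenario."""
    return {
        "model": "model_a",
        "scenario": "scen_a",
        "region": "World",
        "variable": variable,
        "unit": "EJ/yr",
        "year": 2010,
        "value": value,
    }


# Same content, different row order
actual_rows = [
    make_row("Primary Energy|Fossil", 85.0),
    make_row("Primary Energy|Fossil|Coal", 40.0),
]
expected_rows = list(reversed(actual_rows))

rows_equal = actual_rows == expected_rows                             # order-sensitive
indexed_equal = to_indexed(actual_rows) == to_indexed(expected_rows)  # order-insensitive
```

Here the list comparison reports a mismatch purely because of row order, while the comparison keyed by index treats the two sets of rows as equal.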