IAMconsortium / pyam

Analysis & visualization of energy & climate scenarios
https://pyam-iamc.readthedocs.io/
Apache License 2.0

Concatenating and/or appending between subannual and yearly data #626

Open phackstock opened 2 years ago

phackstock commented 2 years ago

Problem description

When trying to combine two or more IamDataFrames using pyam.concat or IamDataFrame.append(), we get an error if some of the frames have a subannual column and others do not. Here's a minimal example to reproduce the error:

from pyam import IamDataFrame, IAMC_IDX, concat
import pandas as pd

iam_frame = IamDataFrame(pd.DataFrame(
    [["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 1, 6.0]],
    columns=IAMC_IDX + [2005, 2010],
))
iam_frame_subannual = IamDataFrame(pd.DataFrame(
    [["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", "summer", 1, 6.0]],
    columns=IAMC_IDX + ["subannual", 2005, 2010],
))
# Each of the following three calls raises ValueError: Incompatible timeseries data index dimensions
concat([iam_frame, iam_frame_subannual])
iam_frame.append(iam_frame_subannual)
iam_frame_subannual.append(iam_frame)

Proposed solution

Before append or concat performs its task, all IamDataFrames involved are checked for a "subannual" column. There are two possible outcomes of this check:

  1. All or none of the data frames have a "subannual" column. In this case no further action is required.
  2. Some data frames have a "subannual" column while others do not. In this case we add a "subannual" column with the value "year" to the frames that lack it and then proceed with concatenating or appending.

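The two outcomes above can be sketched at the pandas level. This is a minimal sketch under stated assumptions, not pyam's implementation: it works on wide-format IAMC tables as plain pandas DataFrames whose first columns are the IAMC index, `IAMC_IDX` is redefined locally to mirror pyam's constant, and `harmonize_subannual` and `concat_with_subannual` are hypothetical helper names.

```python
import pandas as pd

# Mirrors pyam's IAMC_IDX constant (redefined so the sketch is self-contained)
IAMC_IDX = ["model", "scenario", "region", "variable", "unit"]


def harmonize_subannual(frames, default="year"):
    """Hypothetical helper: add a 'subannual' column filled with `default`
    to frames lacking it, but only if at least one frame already has it."""
    has_subannual = ["subannual" in df.columns for df in frames]
    if all(has_subannual) or not any(has_subannual):
        # Outcome 1: all or none of the frames have the column, nothing to do
        return list(frames)
    result = []
    for df, has in zip(frames, has_subannual):
        if not has:
            # Outcome 2: copy so the caller's frame is not modified, then
            # insert the column right after the IAMC index columns
            df = df.copy()
            df.insert(len(IAMC_IDX), "subannual", default)
        result.append(df)
    return result


def concat_with_subannual(frames):
    """Hypothetical helper: harmonize the 'subannual' column, then concatenate."""
    return pd.concat(harmonize_subannual(frames), ignore_index=True)


yearly = pd.DataFrame(
    [["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 1, 6.0]],
    columns=IAMC_IDX + [2005, 2010],
)
subannual = pd.DataFrame(
    [["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", "summer", 1, 6.0]],
    columns=IAMC_IDX + ["subannual", 2005, 2010],
)
combined = concat_with_subannual([yearly, subannual])
print(combined["subannual"].tolist())  # ['year', 'summer']
```

Inserting the column directly after the index columns keeps the column order identical across frames, so `pd.concat` does not have to reconcile differently ordered columns. The combined table could then be passed to `IamDataFrame(...)`, as in the reproduction example.
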
danielhuppmann commented 2 years ago

Note that I implemented a similar solution in #598 for appending/merging IamDataFrame instances with both yearly data (as integers) and continuous-time resolution (as datetime). I also did some refactoring and restructured the test suite to have concat and append behave in a similar manner.

phackstock commented 2 years ago

Ah, very good. I'll have a look and take some inspiration from that.

willu47 commented 2 years ago

@EmiFej and I also came across the error. The error message ("Incompatible timeseries data index dimensions") is misleading: the error is thrown whenever one of the dataframes being appended has extra columns that the other lacks.
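To illustrate the point about the error message, here is a hypothetical pre-check (not part of pyam) that names the mismatching extra columns instead of reporting generic index dimensions. It assumes wide-format pandas DataFrames in which time columns are integer years and every other non-IAMC column counts as an extra column; `extra_columns` and `check_extra_columns` are illustrative names.

```python
import pandas as pd

# Mirrors pyam's IAMC_IDX constant (redefined so the sketch is self-contained)
IAMC_IDX = ["model", "scenario", "region", "variable", "unit"]


def extra_columns(df):
    """Hypothetical helper: columns that are neither IAMC index columns nor
    integer years (assumes wide IAMC format with integer year columns)."""
    return {
        c for c in df.columns
        if c not in IAMC_IDX and not pd.api.types.is_integer(c)
    }


def check_extra_columns(frames):
    """Raise a ValueError naming the extra columns if the frames differ."""
    reference = extra_columns(frames[0])
    for i, df in enumerate(frames[1:], start=1):
        cols = extra_columns(df)
        if cols != reference:
            raise ValueError(
                f"Incompatible extra columns: frame 0 has {sorted(reference)}, "
                f"frame {i} has {sorted(cols)}"
            )


yearly = pd.DataFrame(
    [["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 1, 6.0]],
    columns=IAMC_IDX + [2005, 2010],
)
subannual = pd.DataFrame(
    [["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", "summer", 1, 6.0]],
    columns=IAMC_IDX + ["subannual", 2005, 2010],
)
try:
    check_extra_columns([yearly, subannual])
except ValueError as err:
    # Incompatible extra columns: frame 0 has [], frame 1 has ['subannual']
    print(err)
```
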