IAMconsortium / pyam

Analysis & visualization of energy & climate scenarios
https://pyam-iamc.readthedocs.io/
Apache License 2.0

`interpolate(inplace=True)` doesn't work when run after `filter` #806

gorkemgungormetu closed this issue 6 months ago

gorkemgungormetu commented 6 months ago

Referring to https://github.com/IAMconsortium/pyam/issues/240, I managed to work around a similar problem with interpolate, as shown below.

df.filter(model="MENR*", variable='Primary Energy*').timeseries()
model scenario region variable unit 2020 2030 2040 2050
MENR (2022) CO2 Turkey Primary Energy-Coal EJ/yr 1.699841 2.005477 0.376812
MENR (2022) CO2 Turkey Primary Energy-Gas EJ/yr 1.666346 1.997104 1.226732
MENR (2022) CO2 Turkey Primary Energy-Nuclear EJ/yr NaN 0.334944 3.068924
MENR (2022) CO2 Turkey Primary Energy-Oil EJ/yr 1.766830 2.294366 0.586152
MENR (2022) CO2 Turkey Primary Energy-Renewables EJ/yr 1.029953 1.699841 5.233500

Interpolating the whole dataframe didn't work because the rest of the dataset has missing data for the year 2050. So I needed to interpolate this subset and merge it back into the original dataframe, which I managed in four lines.

df_inter = df.filter(model="MENR*", variable='Primary Energy*')
df_inter.interpolate(2040, inplace=True)
df = df.filter(model="MENR*", variable='Primary Energy*', keep=False)
df = pyam.concat([df, df_inter])

danielhuppmann commented 6 months ago

Thank you for starting this issue, but can you clarify why the "direct approach" did not work as expected?

df.interpolate(2040, inplace=True)

gorkemgungormetu commented 6 months ago

Because my dataframe includes additional rows with missing values in the year 2050. I expected that df.filter(model="MENR*", variable='Primary Energy*').interpolate(2040, inplace=True) would apply the method to that subgroup within df, similar to df.convert_unit('Mtoe/yr', to='EJ/yr', inplace=True), but it didn't write the interpolated data back into df.

danielhuppmann commented 6 months ago

Ok, I see - there are two issues.

First, you are doing a chained operation. If you spell this out explicitly, it should be clear why inplace does not have the expected effect.

x = df.filter(model="MENR*", variable='Primary Energy*')
x.interpolate(2040, inplace=True)

So the inplace works on x, not df.
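The same effect can be reproduced without pyam. The sketch below uses a hypothetical `ToyFrame` class (not pyam's `IamDataFrame`, and with made-up values) whose `filter` returns a new object, which is the behavior assumed in the explanation above. The `fill` method is likewise an illustrative stand-in for any method that takes `inplace=True`.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFrame:
    """Hypothetical stand-in for a data container; not pyam's IamDataFrame."""

    rows: dict = field(default_factory=dict)  # variable name -> {year: value}

    def _copy_rows(self):
        # Copy the inner dicts too, so copies never share data with the original.
        return {name: dict(series) for name, series in self.rows.items()}

    def filter(self, prefix):
        # Return a NEW object that holds copies of the matching rows.
        matching = {k: v for k, v in self._copy_rows().items() if k.startswith(prefix)}
        return ToyFrame(matching)

    def fill(self, year, value, inplace=False):
        # Modify this object if inplace=True, otherwise return a modified copy.
        target = self if inplace else ToyFrame(self._copy_rows())
        for series in target.rows.values():
            series.setdefault(year, value)
        return None if inplace else target


# Arbitrary illustrative values, not taken from the issue.
df = ToyFrame({"Primary Energy-Coal": {2030: 1.0, 2050: 3.0}, "Emissions": {2030: 5.0}})

# Chained call: the in-place change lands on the temporary returned by filter().
df.filter("Primary Energy").fill(2040, 2.0, inplace=True)
chained_changed_df = 2040 in df.rows["Primary Energy-Coal"]

# Spelled out: x is the object that gets modified; df is still untouched.
x = df.filter("Primary Energy")
x.fill(2040, 2.0, inplace=True)
x_changed = 2040 in x.rows["Primary Energy-Coal"]
```

In the chained form the modified temporary is discarded as soon as the statement finishes, so nothing observable changes.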

Second, yes, interpolate() raises an error if a timeseries does not have values before and after time. This is in line with the principle of "fail loud", but there may be better solutions or options.
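To illustrate the "fail loud" behavior described above, here is a minimal sketch of linear interpolation on a single timeseries. It is not pyam's implementation: the function name `interpolate_linear` and the representation of a timeseries as a `{year: value}` dict (with a missing year simply absent) are assumptions made for this example.

```python
def interpolate_linear(series, time):
    """Linearly interpolate `series` (a {year: value} dict) at `time`.

    Raises ValueError if there is no known value both before and after `time`.
    """
    if time in series:
        return series[time]

    before = [t for t in series if t < time]
    after = [t for t in series if t > time]
    if not before or not after:
        # Fail loud instead of silently extrapolating or returning NaN.
        raise ValueError(f"cannot interpolate at {time}: no values on both sides")

    # Use the nearest known points on either side of `time`.
    t0, t1 = max(before), min(after)
    weight = (time - t0) / (t1 - t0)
    return series[t0] + weight * (series[t1] - series[t0])


# Arbitrary illustrative values, not taken from the issue.
bracketed = {2030: 1.0, 2050: 3.0}
midpoint = interpolate_linear(bracketed, 2040)  # halfway between 1.0 and 3.0
```

Calling `interpolate_linear({2020: 1.0, 2030: 2.0}, 2040)` raises `ValueError`, because that series has no value after 2040. This is the situation reported for the rows with missing 2050 data.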

I guess that your solution is indeed the best short-term strategy. I'll start a new, targeted issue.