`interpolate(inplace=True)` doesn't work running after `filter`

gorkemgungormetu commented 6 months ago

Referring to https://github.com/IAMconsortium/pyam/issues/240, I managed to work around a similar problem with interpolate, as shown below.

df.filter(model="MENR*", variable='Primary Energy*').timeseries()

model	scenario	region	variable	unit	2020	2030	2040	2050
MENR (2022)	CO2	Turkey	Primary Energy-Coal	EJ/yr	1.699841	2.005477		0.376812
MENR (2022)	CO2	Turkey	Primary Energy-Gas	EJ/yr	1.666346	1.997104		1.226732
MENR (2022)	CO2	Turkey	Primary Energy-Nuclear	EJ/yr	NaN	0.334944		3.068924
MENR (2022)	CO2	Turkey	Primary Energy-Oil	EJ/yr	1.766830	2.294366		0.586152
MENR (2022)	CO2	Turkey	Primary Energy-Renewables	EJ/yr	1.029953	1.699841	5.233500

Interpolating the whole dataframe didn't work because the rest of the dataset has missing data for the year 2050. So I had to interpolate this subset and merge it back into the original dataframe, which took four lines.

df_inter = df.filter(model="MENR*", variable='Primary Energy*')
df_inter.interpolate(2040, inplace=True)
df = df.filter(model="MENR*", variable='Primary Energy*', keep=False)
df = pyam.concat([df, df_inter])

danielhuppmann commented 6 months ago

Thank you for starting this issue, but can you clarify why the "direct approach" did not work as expected?

df.interpolate(2040, inplace=True)

gorkemgungormetu commented 6 months ago

Because my dataframe includes additional rows with missing values in the year 2050. I expected that df.filter(model="MENR*", variable='Primary Energy*').interpolate(2040, inplace=True) would apply the method to that subset within df, similar to how df.convert_unit('Mtoe/yr', to='EJ/yr', inplace=True) works, but the interpolated data did not end up in the dataframe df.

danielhuppmann commented 6 months ago

Ok, I see - there are two issues.

First, you are doing a chained operation. If you spell this out explicitly, it should be clear why inplace does not have the expected effect.

x = df.filter(model="MENR*", variable='Primary Energy*')
x.interpolate(2040, inplace=True)

So the inplace works on x, not df.
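The same effect can be reproduced without pyam. The sketch below uses a toy container, assuming only that filter() returns a new object holding copies of the matching rows; the class ToyFrame and its method set_value are hypothetical stand-ins for illustration, not pyam's implementation.

```python
import copy


class ToyFrame:
    """Hypothetical stand-in for a data container (NOT pyam's IamDataFrame)."""

    def __init__(self, rows):
        # rows maps a series name to a {year: value} dict
        self.rows = rows

    def filter(self, prefix):
        # Assumption: filtering returns a NEW object with copies of the matching rows
        matching = {k: dict(v) for k, v in self.rows.items() if k.startswith(prefix)}
        return ToyFrame(matching)

    def set_value(self, year, value, inplace=False):
        # With inplace=True, modify this object; otherwise modify and return a copy
        target = self if inplace else ToyFrame(copy.deepcopy(self.rows))
        for series in target.rows.values():
            series[year] = value
        return None if inplace else target


df = ToyFrame({
    "Primary Energy-Coal": {2030: 2.0, 2050: 0.4},
    "Emissions": {2030: 1.0},
})

# Chained call: the in-place change lands on the temporary object
# returned by filter(), which is discarded right away.
df.filter("Primary Energy").set_value(2040, 1.2, inplace=True)
chained_changed_df = 2040 in df.rows["Primary Energy-Coal"]

# Explicit form: x keeps the change, df is still untouched.
x = df.filter("Primary Energy")
x.set_value(2040, 1.2, inplace=True)
```

Here chained_changed_df is False, while x holds the new 2040 value. Getting the change back into df needs an explicit merge step, which is what the four-line workaround does.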

Second, yes, interpolate() raises an error if a timeseries does not have values before and after time. This is in line with the principle of "fail loud", but there may be better solutions or options.
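To make the "fail loud" behaviour concrete, here is a minimal sketch of linear interpolation on a single timeseries that raises an error when there is no value on both sides of the requested year. The function interpolate_linear is a hypothetical helper written for this illustration and is not how pyam implements interpolate(); the coal values come from the table above, while the "incomplete" series uses made-up values.

```python
def interpolate_linear(series, time):
    """Linearly interpolate a {year: value} mapping at `time`.

    Hypothetical helper for illustration only. Raises ValueError if the
    series has no value both before and after `time` ("fail loud").
    """
    if time in series:
        return series[time]
    before = [t for t in series if t < time]
    after = [t for t in series if t > time]
    if not before or not after:
        raise ValueError(f"no values before and after {time}")
    t0, t1 = max(before), min(after)  # nearest neighbours on each side
    v0, v1 = series[t0], series[t1]
    return v0 + (v1 - v0) * (time - t0) / (t1 - t0)


# Primary Energy-Coal from the table above: 2040 is missing, 2030 and 2050 exist
coal = {2020: 1.699841, 2030: 2.005477, 2050: 0.376812}
value_2040 = interpolate_linear(coal, 2040)

# A series without a 2050 value (made-up numbers) cannot be interpolated at 2040
incomplete = {2020: 1.0, 2030: 2.0}
try:
    interpolate_linear(incomplete, 2040)
    raised = False
except ValueError:
    raised = True
```

In this sketch a single series without a 2050 value is enough to raise the error, which matches the reason given above for interpolating only the filtered subset.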

I guess that your solution is indeed the best short-term strategy. I'll start a new, targeted issue.

IAMconsortium / pyam

`interpolate(inplace=True)` doesn't work running after `filter` #806