IAMconsortium / pyam

Analysis & visualization of energy & climate scenarios
https://pyam-iamc.readthedocs.io/
Apache License 2.0

Non-intuitive need to "refresh dataframe" after operating on .data #397

Closed Rlamboll closed 4 years ago

Rlamboll commented 4 years ago

As mentioned in #396, when operating directly on the .data object, derived objects are not updated. This is resolved by "refreshing" the dataframe via pyam.IamDataFrame(df.data), like so:

import pyam
import pandas as pd

Make arbitrary dataframe

_mc = "model_c"
_sa = "scen_a"
_sb = "scen_b"
_eco2 = "Emissions|CO2"
_gtc = "Gt C/yr"
_ech4 = "Emissions|CH4"
_mtch4 = "Mt CH4/yr"
_msrvu = ["model", "scenario", "region", "variable", "unit"]
simple_df = pd.DataFrame(
    [
        [_mc, _sa, "World", _eco2, _gtc, 0, 1000, 5000],
        [_mc, _sb, "World", _eco2, _gtc, 1, 1000, 5000],
        [_mc, _sa, "World", _ech4, _mtch4, 0, 300, 500],
        [_mc, _sb, "World", _ech4, _mtch4, 1, 300, 500],
    ],
    columns=_msrvu + [2010, 2030, 2050],
)
simple_df = pyam.IamDataFrame(simple_df)

Operate on data

simple_df.data.loc[0, "scenario"] = "not a"

Show that scenarios() isn't updated until the df is refreshed:

print(simple_df.scenarios())
remade_df = pyam.IamDataFrame(simple_df.data)
print(remade_df.scenarios())

Printed lines 1 and 2 are not equal.

It's not clear to me that this is a bug, and it might be annoying or slow to make every call to scenarios() etc. check for consistency before returning its values.
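For illustration only, a consistency check of this kind might look roughly like the sketch below. It uses plain pandas rather than pyam; the function name `scenarios_checked` and the idea of passing in a cached scenario list are assumptions made for this example, not anything pyam provides.

```python
import pandas as pd


def scenarios_checked(data, cached_scenarios):
    """Hypothetical helper: return the cached list only if it still matches the data."""
    # This scans the whole scenario column on every call, which is the
    # overhead being weighed above.
    current = set(data["scenario"].unique())
    if current != set(cached_scenarios):
        raise ValueError("data was modified; cached scenarios are stale")
    return list(cached_scenarios)


data = pd.DataFrame(
    {"scenario": ["scen_a", "scen_b", "scen_a"], "value": [0, 1, 2]}
)
cached = ["scen_a", "scen_b"]

ok = scenarios_checked(data, cached)  # data and cache agree

data.loc[0, "scenario"] = "not a"  # direct edit, the cache is now stale
try:
    scenarios_checked(data, cached)
    stale_detected = False
except ValueError:
    stale_detected = True
```

The check itself is simple, but it has to run on every access, so its cost grows with the size of the data.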

danielhuppmann commented 4 years ago

Sorry @Rlamboll, this is a user error rather than a bug - you aren't supposed to work on data in this way - you are making it inconsistent with the meta dataframe (from which scenarios() is derived).
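The mechanism can be sketched with a toy class in plain pandas. This is not pyam's implementation; `ToyFrame` and its attributes are hypothetical names chosen to mirror the description above, where the scenario list is read from a meta table that is built once from the data.

```python
import pandas as pd


class ToyFrame:
    """Hypothetical container holding data plus an index derived from it."""

    def __init__(self, data):
        self.data = data.copy()
        # Derived once at construction time and never recomputed afterwards.
        self.meta = (
            self.data[["model", "scenario"]]
            .drop_duplicates()
            .reset_index(drop=True)
        )

    def scenarios(self):
        # Reads the derived table, not the (possibly edited) data.
        return sorted(self.meta["scenario"].unique())


raw = pd.DataFrame(
    {
        "model": ["model_c", "model_c"],
        "scenario": ["scen_a", "scen_b"],
        "value": [1.0, 2.0],
    }
)
toy = ToyFrame(raw)
toy.data.loc[0, "scenario"] = "not a"  # edit the data directly

stale = toy.scenarios()  # still reflects the original data
fresh = ToyFrame(toy.data).scenarios()  # rebuilt, so it sees the edit
```

Rebuilding the object is what re-derives the meta table, which is why the "refresh" in the original report appears to fix the mismatch.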

I guess we should refactor this to _data (same with meta) to make that more obvious...

Rlamboll commented 4 years ago

It's not an error; as stated, it works perfectly, and this issue is mostly an explanation of why I wanted the "refresh" function described in #396. Please don't refactor data; that will break everything in Silicone.