IAMconsortium / pyam

Analysis & visualization of energy & climate scenarios
https://pyam-iamc.readthedocs.io/
Apache License 2.0

Filter breaks when columns contain mixtures of str and int/float #423

Open Rlamboll opened 3 years ago

Rlamboll commented 3 years ago

IamDataFrame.filter() breaks when it is used on a column that contains a mixture of strings and numbers. Oddly, it has no problem when the value being asked for is an int, but it does object to being asked for a string, as demonstrated below:

```python
import pyam
import pandas as pd

_mc = "model_c"
_sa = "scen_a"
_sb = "scen_b"
_eco2 = "Emissions|CO2"
_gtc = "Gt C/yr"
_ech4 = "Emissions|CH4"
_mtch4 = "Mt CH4/yr"
_msrvu = ["model", "scenario", "region", "variable", "unit"]

simple_df = pd.DataFrame(
    [
        # The first row uses the int 1 as its scenario name, so the
        # "scenario" column ends up holding both int and str values.
        [_mc, 1, "World", _eco2, _gtc, 0, 1000, 5000],
        [_mc, _sb, "World", _eco2, _gtc, 1, 1000, 5000],
        [_mc, _sa, "World", _ech4, _mtch4, 0, 300, 500],
        [_mc, _sb, "World", _ech4, _mtch4, 1, 300, 500],
    ],
    columns=_msrvu + [2010, 2030, 2050],
)
simple_df = pyam.IamDataFrame(simple_df)

# Filter on each scenario name in turn, printing its type first.
for scen in simple_df.scenarios():
    print(type(scen))
    simple_df.filter(scenario=scen).data
```

This is a particular problem because the SR1.5 database has such a combination ("GENeSYS-MOD 1.0" uses a scenario name of 1.0, by default a float).

danielhuppmann commented 3 years ago

Thanks for raising this issue, @Rlamboll. I'm currently working on a backend refactoring to pandas.Series; maybe some of the related changes will fix this as a side benefit...

Rlamboll commented 3 years ago

OK, I think the simple solution is to always ensure that "model" and "scenario" are strings. I can't think of cases where that would be problematic, even if ints/floats are fed in.
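
Until something like that is in place, a user-side workaround could look roughly like the sketch below. It assumes the data is still a plain pandas DataFrame at the point where it would be handed to pyam.IamDataFrame. The helper name cast_index_columns_to_str is hypothetical and not part of pyam, and this is not necessarily how pyam itself would implement the fix.

```python
import pandas as pd


def cast_index_columns_to_str(df, columns=("model", "scenario")):
    """Return a copy of `df` with the given columns cast to str.

    Hypothetical helper for illustration only; it is not part of pyam.
    """
    out = df.copy()
    for col in columns:
        out[col] = out[col].astype(str)
    return out


# Small frame with a mixed int/str "scenario" column, as in the report above.
raw = pd.DataFrame(
    [
        ["model_c", 1, "World", "Emissions|CO2", "Gt C/yr", 1000],
        ["model_c", "scen_b", "World", "Emissions|CO2", "Gt C/yr", 1000],
    ],
    columns=["model", "scenario", "region", "variable", "unit", 2010],
)

clean = cast_index_columns_to_str(raw)

# Every scenario name is now a str, so lookups by string are unambiguous.
scenario_types = {type(s) for s in clean["scenario"]}
print(scenario_types)
```

The cleaned frame can then be passed to pyam.IamDataFrame as usual. Note that a float scenario name such as 1.0 becomes the string "1.0" rather than "1", so any later filter has to ask for that exact string.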