After some more dabbling with the current codebase, I suggest having the filter_*() functions return an IamDataFrame directly, along the following lines:
    from pyam import IamDataFrame

    def filter_*(*args, **kwargs):  # stands for each of the filter_*() functions
        data = ...  # processing from osemosys to dataframe
        return IamDataFrame(data, **kwargs)
In my opinion, this would have several benefits:
Better performance, because pyam internally uses an indexed pd.Series instead of a pd.DataFrame
Earlier catching of errors (e.g., duplicates as in #17)
pyam was already refactored to use pd.concat internally instead of pd.DataFrame.append (even pyam.IamDataFrame.append uses pd.concat()), so this would solve #14
You could use pyam's aggregation methods directly (e.g., see aggregate()), so
    aggregated = aggregate(data)
    if not aggregated.empty:
        iamc = make_iamc(aggregated, config['model'], config['scenario'], input['iamc_variable'], unit)
        blob.append(iamc)
could be simplified to a direct pyam call, and you get validation (guarding against duplicates, ...) and better performance (because of the indexed series) free of charge.
Happy to discuss here or in person at the ECEMF meeting!