IAMconsortium / pyam

Analysis & visualization of energy & climate scenarios
https://pyam-iamc.readthedocs.io/
Apache License 2.0

Remove automated sorting of data #812

Open danielhuppmann opened 4 months ago

danielhuppmann commented 4 months ago

The pyam package currently sorts the _data series and meta dataframe by their index automatically. This makes it easy to ensure consistency and simplifies assert-frame-equal and some operations like interpolation. But it can have unintended consequences in cases where the ordering is forgotten, e.g., #811.
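To illustrate why a sorted index makes equality checks easier, here is a minimal plain-pandas sketch (not pyam's internal code). It assumes long-format data similar to the _data series, and uses pandas' `assert_series_equal` as a stand-in for pyam's own equality assertion: two series with identical content but different row order compare as unequal until both are sorted.

```python
import pandas as pd

# Hypothetical long-format data in the style of an IamDataFrame's _data series:
# one value per (model, scenario, region, variable, unit, year).
names = ["model", "scenario", "region", "variable", "unit", "year"]
rows = [
    ("model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 2010),
    ("model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 2005),
]
data = pd.Series([6.0, 1.0], index=pd.MultiIndex.from_tuples(rows, names=names))
reordered = data.iloc[::-1]  # same content, opposite row order


def series_equal(a: pd.Series, b: pd.Series) -> bool:
    """Return True if pandas considers both series equal (row order included)."""
    try:
        pd.testing.assert_series_equal(a, b)
        return True
    except AssertionError:
        return False


print(series_equal(data, reordered))  # False: the row order differs
print(series_equal(data.sort_index(), reordered.sort_index()))  # True
```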

Also, the repeated ordering is probably not very resource-efficient for large IamDataFrame instances.

For pyam 3.0, I suggest dropping the automated sorting on initialization and in rename/aggregation/etc. methods, and instead providing a sort() method that can be called explicitly. We could also add a keyword argument to all relevant methods to control whether to sort, but that may not be worth it in terms of the effort-vs.-benefit trade-off.
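The proposal could look roughly like the sketch below. `ToyFrame` is a hypothetical stand-in for an IamDataFrame, not pyam's actual class, and its `sort()` method only illustrates the idea of an explicit opt-in. It keeps data in the given order on initialization and skips sorting when the index is already monotonic, which also addresses the concern about repeated sorting of large instances.

```python
import pandas as pd


class ToyFrame:
    """Hypothetical stand-in for an IamDataFrame; not pyam's actual class."""

    def __init__(self, data: pd.Series):
        # Proposed behaviour: keep the data in the order it was given,
        # without sorting on initialization.
        self._data = data

    def sort(self) -> "ToyFrame":
        """Return a new instance with the data sorted by index (explicit opt-in)."""
        # Skip the (potentially costly) sort if the index is already ordered.
        if self._data.index.is_monotonic_increasing:
            return ToyFrame(self._data)
        return ToyFrame(self._data.sort_index())


idx = pd.MultiIndex.from_tuples(
    [("World", 2010), ("World", 2005)], names=["region", "year"]
)
df = ToyFrame(pd.Series([6.0, 1.0], index=idx))
print(df._data.index.get_level_values("year").tolist())  # [2010, 2005]
print(df.sort()._data.index.get_level_values("year").tolist())  # [2005, 2010]
```

Returning a new instance rather than sorting in place mirrors how pandas' `sort_index()` behaves by default; whether pyam's method should do the same is an open design question.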

@phackstock @gidden @znicholls, any thoughts?

phackstock commented 4 months ago

I like the idea of making sorting optional. I cannot really think of a use case off the top of my head where I care about or depend on the order of the data. For assert-frame-equal, we would then also introduce a keyword argument to switch whether or not order is considered when checking for equality.
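Such a switch could be sketched as follows. The function `assert_data_equal` and its `check_order` keyword are hypothetical names for illustration, not existing pyam arguments. When order should be ignored, both sides are sorted into a canonical order before comparing. For DataFrames, pandas' `assert_frame_equal` already offers a similar option, `check_like=True`, which ignores the order of index and columns.

```python
import pandas as pd


def assert_data_equal(
    left: pd.Series, right: pd.Series, check_order: bool = True
) -> None:
    """Hypothetical equality check with a switch for whether row order matters.

    `check_order` is an illustrative keyword, not an existing pyam argument.
    """
    if not check_order:
        # Bring both sides into a canonical (sorted) order before comparing.
        left, right = left.sort_index(), right.sort_index()
    pd.testing.assert_series_equal(left, right)


a = pd.Series([6.0, 1.0], index=pd.Index([2010, 2005], name="year"))
b = a.iloc[::-1]  # same values, reversed row order

assert_data_equal(a, b, check_order=False)  # passes: order is ignored

try:
    assert_data_equal(a, b)  # default: order is checked
except AssertionError:
    print("order differs")
```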

danielhuppmann commented 4 months ago

Reminder: not sorting the time column may cause confusion when working with the wide timeseries format (e.g., when writing to xlsx).
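A minimal pandas sketch of the issue, assuming the year columns of a wide table simply follow the order in which unsorted data arrived: sorting the columns explicitly with `sort_index(axis=1)` before export restores chronological order. The table below is made-up illustrative data.

```python
import pandas as pd

# Hypothetical wide-format timeseries whose year columns follow the order
# in which the (unsorted) data happened to arrive.
wide = pd.DataFrame(
    [[9.0, 1.0, 6.0]],
    index=pd.Index(["Primary Energy"], name="variable"),
    columns=pd.Index([2020, 2005, 2010], name="year"),
)

# Sort the time columns explicitly before exporting, e.g. with
# wide_sorted.to_excel(...) (which needs an Excel engine such as openpyxl).
wide_sorted = wide.sort_index(axis=1)
print(wide_sorted.columns.tolist())  # [2005, 2010, 2020]
print(wide_sorted.iloc[0].tolist())  # [1.0, 6.0, 9.0]
```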