danielhuppmann commented 2 years ago

Description

We often want to know which elements exist on a dimension after filtering, e.g., which variables exist for a specific region (see https://github.com/IAMconsortium/nomenclature/pull/99). This can be done as follows:

vars = df.filter(region="Region A").variable

However, this is inefficient because it creates a full (downselected) copy of the (timeseries) data and meta tables just to read one index dimension.

Proposed Solution

A new class IamSlice, derived from the pd.MultiIndex of the internal _data pd.Series. An IamSlice is returned by a new method slice(), which takes the same arguments as filter().

Expected usage

vars = df.slice(region="Region A").variable
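To illustrate why this avoids the copy, here is a minimal pandas-only sketch, not the pyam implementation. The toy series stands in for the internal _data, its index layout is an assumption, and slice_index is a hypothetical helper: it selects matching index entries and never touches the values.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the internal `_data` series (index layout is an assumption).
index = pd.MultiIndex.from_tuples(
    [
        ("model_a", "scen_a", "Region A", "Primary Energy", 2010),
        ("model_a", "scen_a", "Region A", "Emissions|CO2", 2010),
        ("model_a", "scen_a", "Region B", "Primary Energy", 2010),
    ],
    names=["model", "scenario", "region", "variable", "year"],
)
data = pd.Series([1.0, 2.0, 3.0], index=index)


def slice_index(series, **filters):
    """Hypothetical helper: return only the matching index entries.

    Supports exact matches on index levels; the values are never copied.
    """
    keep = np.ones(len(series), dtype=bool)
    for level, value in filters.items():
        keep &= series.index.get_level_values(level) == value
    return series.index[keep]


sliced = slice_index(data, region="Region A")
variables = sliced.get_level_values("variable").unique()
print(sorted(variables))
```

Only exact matching is shown here; filter() supports more (e.g., wildcards), which this sketch does not attempt to reproduce.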

gidden commented 2 years ago

+1, great idea!

coroa commented 2 years ago

I like the idea in general, but I wonder whether an underlying boolean mask, perhaps with a method to extract the indices, would be more composable. I would also argue for accepting a slice either as the first positional argument to df.filter or in df.__getitem__, so that:

df[df.slice(region="...")] == df.filter(region="...")
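A rough sketch of what the boolean-mask variant could look like, using plain pandas. ToyFrame is a hypothetical stand-in for the real class, and none of this is meant as pyam's actual API; it only shows that masks compose with `&` and that indexing with a mask matches filtering.

```python
import pandas as pd


class ToyFrame:
    """Hypothetical stand-in for an IamDataFrame; not pyam's implementation."""

    def __init__(self, data):
        self._data = data

    def slice(self, **filters):
        # Build a boolean mask aligned with the index of the data series.
        mask = pd.Series(True, index=self._data.index)
        for level, value in filters.items():
            mask = mask & (self._data.index.get_level_values(level) == value)
        return mask

    def filter(self, **filters):
        return ToyFrame(self._data[self.slice(**filters)])

    def __getitem__(self, mask):
        # Accept a mask produced by slice() and return the downselected frame.
        return ToyFrame(self._data[mask])


index = pd.MultiIndex.from_tuples(
    [
        ("Region A", "Primary Energy"),
        ("Region A", "Emissions|CO2"),
        ("Region B", "Primary Energy"),
    ],
    names=["region", "variable"],
)
frame = ToyFrame(pd.Series([1.0, 2.0, 3.0], index=index))

# Indexing with a slice gives the same data as filtering directly.
by_slice = frame[frame.slice(region="Region A")]
by_filter = frame.filter(region="Region A")

# Masks compose with the usual boolean operators.
combined = frame.slice(region="Region A") & frame.slice(variable="Primary Energy")
```

With a mask, combining conditions is just boolean algebra, and the data copy happens only when the mask is finally applied.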

danielhuppmann commented 2 years ago

I like your idea about a boolean mask, but I would not know how to implement it...

On the second idea about allowing df[df.slice()], that sounds great and easily doable...

danielhuppmann commented 2 years ago

Closed via #637

IAMconsortium / pyam

Adding a `IamSlice` feature #630
