IAMconsortium / pyam

Analysis & visualization of energy & climate scenarios
https://pyam-iamc.readthedocs.io/
Apache License 2.0
226 stars 118 forks source link

Add an `Iamslice` #657

Closed danielhuppmann closed 2 years ago

danielhuppmann commented 2 years ago

Please confirm that this PR has done the following:

Description of PR

This PR implements an IamSlice based on the initial work by @coroa in #637. I rebased the branch and added basic documentation.

One question: in the initial work, __repr__ combined a summary similar to an IamDataFrame and a representation of the pd.Series. In the usual test case filtered for `scenario="scen_b", this yields.

<class 'pyam.slice.IamSlice'>
Index dimensions and data coordinates:
   model    : model_a (1)
   scenario : scen_b (1)
   region   : World (1)
   variable : Primary Energy (1)
   unit     : EJ/yr (1)
   year     : 2005, 2010 (2)

model    scenario  region  variable             unit   year
model_a  scen_a    World   Primary Energy       EJ/yr  2005    False
                                                       2010    False
                           Primary Energy|Coal  EJ/yr  2005    False
                                                       2010    False
         scen_b    World   Primary Energy       EJ/yr  2005     True
                                                       2010     True
dtype: bool

I find that a bit confusing, because the list of variables shows only "Primary Energy" but the series also shows "Primary Energy|Coal" (as False, but still). I therefore removed the second part, so that __repr__ only shows the summary overview.

If you think that having the slice as pd.Series, we could add a method as_series().

codecov[bot] commented 2 years ago

Codecov Report

Merging #657 (9063787) into main (764a85e) will decrease coverage by 0.0%. The diff coverage is 95.0%.

@@           Coverage Diff           @@
##            main    #657     +/-   ##
=======================================
- Coverage   94.5%   94.5%   -0.1%     
=======================================
  Files         57      59      +2     
  Lines       5662    5742     +80     
=======================================
+ Hits        5356    5431     +75     
- Misses       306     311      +5     
Impacted Files Coverage Δ
pyam/slice.py 91.4% <91.4%> (ø)
pyam/core.py 94.5% <91.6%> (-0.1%) :arrow_down:
pyam/__init__.py 68.4% <100.0%> (+0.8%) :arrow_up:
tests/conftest.py 100.0% <100.0%> (ø)
tests/test_filter.py 100.0% <100.0%> (ø)
tests/test_slice.py 100.0% <100.0%> (ø)
tests/test_time.py 100.0% <100.0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 764a85e...9063787. Read the comment docs.

coroa commented 2 years ago

One question: in the initial work, repr combined a summary similar to an IamDataFrame and a representation of the pd.Series. In the usual test case filtered for `scenario="scen_b", this yields.

Well it is a pd.Series, ie. isinstance(..., pd.Series) will be True, so there is no point to a separate method (pd.Series(a_slice) will work fine).

My main argument for showing the Series representation is that it being a Series opens up several interesting computations, that people might want to know about, which don't make so much sense, if you don't know it is a boolean series.

The most interesting one is that you can do boolean operations, ie if you want from your example

s_scen_a = test_df.slice(scenario="scen_a")
s_coal = test_df.slice(variable="*|Coal")

s_scen_a & ~ s_coal

will give you

<class 'pyam.slice.IamSlice'>
Index dimensions and data coordinates:
   model    : model_a (1)
   scenario : scen_a (1)
   region   : World (1)
   variable : Primary Energy (1)
   unit     : EJ/yr (1)
   year     : 2005, 2010 (2)

model    scenario  region  variable             unit   year
model_a  scen_a    World   Primary Energy       EJ/yr  2005    True
                                                       2010    True
                           Primary Energy|Coal  EJ/yr  2005    False
                                                       2010    False
         scen_b    World   Primary Energy       EJ/yr  2005    False
                                                       2010    False
dtype: bool
danielhuppmann commented 2 years ago

I agree with the implementation and behavior. I'm just worried that implementing __repr__ as combining the IamDataFrame-style overview plus the Series-print can be confusing.

Options:

  1. Leave as is, ignore my concern because this will be an expert-only feature anyway
  2. Only show the Series-representation
  3. Only show the IamDataFrame-style-overview
coroa commented 2 years ago

I guess the overview is what is going to be the most helpful. So, i'd vote rather for showing the overview, ie option 2 (although, i like seeing both). ie. let's keep it as it is here now. Then we can merge!

What this PR is missing and what is leftover as a new PR would be to refactor the second half of IamDataFrame.filter into a new method (could be called subset for instance) or generic helper and use that from __getitem__ and filter; so that idf[idf.slice(**kwargs)] is equivalent to idf.filter(**kwargs).

(Currently __getitem__ is dropping meta and does not do the year<->time swapping correctly)