IAMconsortium / pyam

Analysis & visualization of energy & climate scenarios
https://pyam-iamc.readthedocs.io/
Apache License 2.0

df.subtract along year axis #656

Open byersiiasa opened 2 years ago

byersiiasa commented 2 years ago

I tested the .subtract() function with axis='year', but I guess I misunderstood: it is not supposed to work, because the result would need to be assigned to a 'year'. For example:

variable = 'Emissions|CO2'
df.filter(variable=variable).subtract(a='2100', b='2020', name='Emissions|CO2 2100-2020', axis='year')

@gidden suggested adding at least an error message advising that operations along the 'year' axis are not supported.

Error message:
ValueError                                Traceback (most recent call last)
pandas\_libs\lib.pyx in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string "net change in Land Cover"

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_176/2230843147.py in <module>
      5 name = f'net change in Land Cover'# Affor-Refor ({lastyear}-{baseyear} {unit})'
      6 vari = 'Land Cover|Forest|Afforestation and Reforestation'
----> 7 dfar6.filter(variable=vari).subtract(a=2100, b=2020, name=name, axis='year', ignore_units=True)

c:\github\pyam\pyam\core.py in subtract(self, a, b, name, axis, fillna, ignore_units, append)
   1928             self.append(_value, inplace=True)
   1929         else:
-> 1930             return IamDataFrame(_value, meta=self.meta)
   1931 
   1932     def multiply(

c:\github\pyam\pyam\core.py in __init__(self, data, meta, index, **kwargs)
    143                 setattr(self, attr, value)
    144         else:
--> 145             self._init(data, meta, index=index, **kwargs)
    146 
    147     def _init(self, data, meta=None, index=DEFAULT_META_INDEX, **kwargs):

c:\github\pyam\pyam\core.py in _init(self, data, meta, index, **kwargs)
    159         # cast data from pandas
    160         if isinstance(data, pd.DataFrame) or isinstance(data, pd.Series):
--> 161             _data = format_data(data.copy(), index=index, **kwargs)
    162         # read data from ixmp Platform instance
    163         elif has_ix and isinstance(data, ixmp.TimeSeries):

c:\github\pyam\pyam\utils.py in format_data(df, index, **kwargs)
    336 
    337     # format the time-column
--> 338     df = format_time_col(df, time_col)
    339 
    340     # cast to pd.Series, check for duplicates

c:\github\pyam\pyam\utils.py in format_time_col(data, time_col)
    357     """Format time_col to int (year) or datetime"""
    358     if time_col == "year":
--> 359         data["year"] = to_int(pd.to_numeric(data["year"]))
    360     elif time_col == "time":
    361         data["time"] = pd.to_datetime(data["time"])

~\Anaconda3\envs\py38\lib\site-packages\pandas\core\tools\numeric.py in to_numeric(arg, errors, downcast)
    152         coerce_numeric = errors not in ("ignore", "raise")
    153         try:
--> 154             values = lib.maybe_convert_numeric(
    155                 values, set(), coerce_numeric=coerce_numeric
    156             )

pandas\_libs\lib.pyx in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string "net change in Land Cover" at position 0

danielhuppmann commented 2 years ago

Thanks for reporting this issue, @byersiiasa!

Played around for a few minutes, I think there are three issues:

A suggested alternative using the subtract method and then adding the result to meta:


df.set_meta(
    meta=df.filter(variable="Emissions|CO2").subtract(a=2100, b=2020, name="0", axis="year", append=False)._data,
    name="Emissions|CO2 2100-2020",
)

gidden commented 2 years ago

Indeed - at the moment the numerical operations assume explicitly that they will not be applied to the time axis (e.g., year). The result of such a calculation would have a time dimensionality of 0, and thus be considered metadata in our current data model.
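
To illustrate the dimensionality point, here is a minimal sketch in plain pandas rather than pyam's own API. The model names, scenario names, and values are made up for illustration. It shows that subtracting one year from another leaves a result indexed only by model and scenario, which is the shape of a meta indicator rather than of timeseries data.

```python
import pandas as pd

# Hypothetical wide-format timeseries: one row per (model, scenario),
# one column per year. All names and values are illustrative only.
index = pd.MultiIndex.from_tuples(
    [("model_a", "scen_1"), ("model_a", "scen_2")],
    names=["model", "scenario"],
)
wide = pd.DataFrame({2020: [10.0, 12.0], 2100: [4.0, 15.0]}, index=index)

# Subtracting one year from another removes the year dimension entirely.
change = (wide[2100] - wide[2020]).rename("Emissions|CO2 2100-2020")

# The remaining index levels are model and scenario, with no year level.
print(list(change.index.names))
print(change)
```

Because no year level remains, there is no natural value to put in the "year" column of the result, which is why the operation asks for one through the name argument.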

I think it's an open question as to whether we would want to support this directly in the operation interface (e.g., if axis is year, then add to meta) or have a separate interface that supports that.

I think the implementation you provided, @danielhuppmann, is great for that; now it's just a question of what we want to support in a 'first class' manner.

danielhuppmann commented 2 years ago

For clarification, the numerical operations work fine on the "year" dimension, i.e.

df.subtract(a=1, b=2, name="3", axis="year")

except for the "bug" that the returned index-value on the time domain has to be specified as an integer-castable string instead of an integer.
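
The traceback above shows the "year" column being passed through pd.to_numeric, which would explain why only integer-castable names get through. A small sketch of that behaviour, using pandas on its own and not pyam (the variable names are illustrative):

```python
import pandas as pd

# An integer-castable label such as "3" parses cleanly as a year.
years = pd.to_numeric(pd.Series(["3"], dtype=object))
print(years.tolist())

# A descriptive label cannot be parsed as a number, which matches the
# ValueError reported in the traceback.
try:
    pd.to_numeric(pd.Series(["net change in Land Cover"], dtype=object))
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)
```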

However, in many cases, the result will not really make sense as "timeseries"-type data. (Usually, diff() might be more useful here.)
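
As a rough illustration of why a period-over-period difference keeps the timeseries shape, here is a sketch using pandas' DataFrame.diff on made-up wide-format data. It is not pyam's diff() method, whose interface may differ.

```python
import pandas as pd

# Hypothetical wide-format data: one row per scenario, one column per year.
wide = pd.DataFrame(
    {2020: [10.0, 12.0], 2050: [7.0, 14.0], 2100: [4.0, 15.0]},
    index=pd.Index(["scen_1", "scen_2"], name="scenario"),
)

# Change relative to the previous period. The first year has no
# predecessor, so its all-NaN column is dropped.
step_change = wide.diff(axis=1).dropna(axis=1, how="all")
print(step_change)
```

Each value is still attached to a year, so the result remains timeseries data instead of collapsing to a single number per scenario.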

So imho, the actual question is: should we explicitly raise an error when doing data operations along the time domain?
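
If the answer is yes, the check could be as small as the sketch below. Both TIME_DOMAIN_AXES and check_operation_axis are hypothetical names invented for this example; neither is taken from the pyam code base, and the error wording is only a suggestion.

```python
# Hypothetical names: neither of these is taken from pyam.
TIME_DOMAIN_AXES = ("year", "time")


def check_operation_axis(axis):
    """Raise a descriptive error if `axis` belongs to the time domain."""
    if axis in TIME_DOMAIN_AXES:
        raise ValueError(
            f"Operations along the '{axis}' axis are not supported; "
            "consider adding the result as a meta indicator instead."
        )
    return axis


# Non-time axes pass through unchanged.
print(check_operation_axis("variable"))

# A time-domain axis fails early with a readable message.
try:
    check_operation_axis("year")
except ValueError as err:
    print(err)
```

Calling such a check at the top of each data operation would replace the opaque parsing error with a message that names the actual problem.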

I would not overload the data-ops methods with an option to add the result to meta instead of data; instead, the method set_meta_from_data would be a better prototype...