PyPSA / linopy

Linear optimization with N-D labeled arrays in Python
https://linopy.readthedocs.io
MIT License

groupby sum not allowing multidimensional grouping per default #299

Closed aurelije closed 19 hours ago

aurelije commented 1 month ago

Hi,

I was surprised by the behavior of sum on a LinearExpressionGroupby object. I want to check whether this is expected or a bug. If it is the correct behavior, I would like to have it explained, as I couldn't figure it out from the documentation.

Here is what is going on. I have a variable where one of the dimensions is time-based (per date), and I need to set constraints at the weekly level. So I tried making weekly groups along the time dimension:

# I know the argument could be shortened to the string 'time.week', but that emits a bunch of warnings
grouped_per_week = vars.groupby(vars.coords['time'].dt.isocalendar().week) 

The next thing I wanted to do was to sum each group over all dimensions except the group dimension and one other dimension, so I naturally did this:


grouped_per_week.sum(['dim1', 'dim2', 'dim4'])

What I expected was to get a LinearExpression with group and dim3 as dimensions and the correct sums. But it doesn't work like that. What did work is this:


grouped_per_week.sum().sum(['dim1', 'dim2', 'dim4'])

Notice the extra sum: the first one returns a LinearExpression, and the second sum over the remaining dimensions then gives the correct result.
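To make the expected result concrete, here is a minimal sketch in plain Python rather than linopy. The toy data, the dimension values, and the helper names are all hypothetical; the only point is that summing within each ISO week and over dim1 in one step should match the two-step workaround, leaving week and dim3 as the remaining dimensions.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical toy data indexed by (time, dim1, dim3); not taken from the issue.
start = date(2024, 1, 1)  # a Monday, so the 14 days cover ISO weeks 1 and 2
times = [start + timedelta(days=i) for i in range(14)]
cells = [(t, d1, d3) for t in times for d1 in ("a", "b") for d3 in ("x", "y")]
values = {cell: float(i + 1) for i, cell in enumerate(cells)}


def iso_week(day):
    """ISO week number of a date (second field of isocalendar())."""
    return day.isocalendar()[1]


def weekly_sum(data):
    """Step 1 of the workaround: sum over time within each ISO week."""
    out = defaultdict(float)
    for (t, d1, d3), v in data.items():
        out[(iso_week(t), d1, d3)] += v
    return dict(out)


def sum_over_dim1(weekly):
    """Step 2 of the workaround: sum the weekly totals over dim1."""
    out = defaultdict(float)
    for (week, _d1, d3), v in weekly.items():
        out[(week, d3)] += v
    return dict(out)


def weekly_sum_keep_dim3(data):
    """The one-step result that was expected: keep only (week, dim3)."""
    out = defaultdict(float)
    for (t, _d1, d3), v in data.items():
        out[(iso_week(t), d3)] += v
    return dict(out)


two_step = sum_over_dim1(weekly_sum(values))
one_step = weekly_sum_keep_dim3(values)
print(one_step == two_step)
```

One caveat with this kind of grouping: an ISO week number alone does not identify the year, so data spanning several years would merge same-numbered weeks unless the year is part of the group key.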

FabianHofmann commented 1 month ago

hey @aurelije , thanks as always for reporting technical issues. Can you try


grouped_per_week.sum(['dim1', 'dim2', 'dim4'], use_fallback=True)

aurelije commented 1 month ago

@FabianHofmann that gives me:

TypeError: LinearExpressionGroupby.sum() got multiple values for argument 'use_fallback'

If I pass the list of dimensions as the dim keyword parameter, I get:

TypeError: LinearExpressionGroupby.sum.<locals>.func() got an unexpected keyword argument 'dim'

I get the same with the dims keyword parameter.
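Both messages are ordinary Python argument-binding errors, and they can be reproduced without linopy. The class below is a hypothetical stand-in whose signature is only inferred from the error text (use_fallback as the first parameter after self, extra keywords forwarded to an inner func); it is not linopy's actual implementation.

```python
class GroupbyLike:
    """Hypothetical stand-in, not linopy's LinearExpressionGroupby."""

    def sum(self, use_fallback=False, **kwargs):
        # Assumed shape: use_fallback is the first positional slot, and any
        # extra keywords are forwarded to an inner helper that takes none.
        def func(ds):
            return ds

        return func("group", **kwargs)


def error_message(call):
    """Run call() and return the TypeError text, or None if it succeeds."""
    try:
        call()
    except TypeError as err:
        return str(err)
    return None


g = GroupbyLike()

# The list binds positionally to use_fallback, so the keyword collides with it.
positional_error = error_message(
    lambda: g.sum(["dim1", "dim2", "dim4"], use_fallback=True)
)

# dim= lands in **kwargs and is forwarded to func(), which does not accept it.
keyword_error = error_message(lambda: g.sum(dim=["dim1", "dim2", "dim4"]))

print(positional_error)
print(keyword_error)
```

Under that assumed signature, a list passed positionally can never be read as dimensions, which would explain why the first call fails before any summing happens.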

FabianHofmann commented 1 month ago

Gotcha, I fear that the dims argument in sum is not supported right now (even though it just maps to the xarray function when the fallback is disabled). If I see correctly, the only way to control the dims is to include them explicitly in the groupby argument, i.e. use a DataArray with all dims in ['dim1', 'dim2', 'dim4'].

aurelije commented 1 month ago

@FabianHofmann so for now the only workaround is to use sum twice?

FabianHofmann commented 1 month ago

no, not necessarily. see for example:

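from linopy import Model
import pandas as pd
import xarray as xr  # needed for xr.DataArray below

m = Model()
# z is two-dimensional with shape (2, 3); the transposed DataFrame sets the upper bounds
z = m.add_variables(0, pd.DataFrame([[1, 2], [3, 4], [5, 6]]).T, name="z")

expr = 1 * z
# the group labels cover both dimensions of z
groups = xr.DataArray([[1, 1, 2], [1, 3, 3]], coords=z.coords)
grouped = expr.groupby(groups).sum(use_fallback=True)

FabianHofmann commented 19 hours ago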

closing this, please reopen if needed