Setting records of multidimensional Set with empty DataFrame fails

grecht commented 4 months ago

Hi, I have a situation where, depending on the input data, a multidimensional set might be empty. In that case the records DataFrame is empty, and I would like to pass it as is rather than check for emptiness first. This seems to work for one-dimensional sets, but not for multidimensional sets:

import pandas as pd
import gamspy as gp

ct = gp.Container()

df_a = pd.DataFrame(columns=["a"])
A = ct.addSet("A", records=df_a)  # works

B = ct.addSet("B")

df_ab = pd.DataFrame(columns=["a", "b"])
AB = ct.addSet("AB", domain=[A, B], records=df_ab)  # does not work

grecht commented 4 months ago

I found some related problems with empty one-dimensional sets. I tried a couple of things (in the example, each working/not working group is separated by an empty line):

import pandas as pd
import gamspy as gp

ct = gp.Container()

# works
A = ct.addSet("A", records=pd.Index([]))

# does not work
A = ct.addSet("A", records=set())

A = ct.addSet("A", records=[])

A = ct.addSet("A", records=pd.Index([]))
B = ct.addSet("B", domain=A, records=pd.Index([]))

A = ct.addSet("A", records=pd.Index([]))
B = ct.addSet("B", domain=A, records=[])

It's unexpected that it works with an empty pandas Index, but not with an empty Python set. If the GAMSPy set is a subset of another set, it works with neither of the two options.

mabdullahsoyturk commented 4 months ago

{} is an empty dictionary in Python, not a set. If you want to provide an empty set, you should use set() instead. This already solves the problem for some of the examples you gave. We will work on the other examples.

grecht commented 4 months ago

Oh, that's right. I guess it was a little late in the day to think up examples. I corrected them; however, the examples also fail with [] or set().

boxblox commented 3 months ago

Hi @grecht, thanks for interacting via GitHub. I am incorporating a fix for your first example with the empty dataframes; thanks for pointing this out. I suspect it will be available in a couple of weeks, when our next GAMS minor version is released.

Your other examples need to be taken a bit slower. Before getting into those, it's helpful to know that our APIs expect a standard format when setting records. One of the checks we make is on the shape of the data: the number of columns in the corresponding dataframe must equal the symbol dimension. Several of your examples do not pass this check, hence the failures.

Internally we use the pd.DataFrame() constructor to create the dataframe from non-native data formats, then we probe the df.shape to get an idea of the data "dimensionality". Thus we are somewhat bound to the behavior of pandas in interpreting the different empty collections you propose.

For example...

A = ct.addSet("A", records=set())

... does not work because pd.DataFrame(set()) returns an empty dataframe with zero columns, but we always expect 1 column with a domain set (domain sets in GAMS are implicitly defined over * aka the universe set).

Similar issues exist when passing [].

The example...

A = ct.addSet("A", records=pd.Index([]))

... works because pd.DataFrame(pd.Index([])) will return a dataframe with 1 column (named 0), and this matches the dimensionality defined for A.
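
The difference between an empty list and an empty Index can be reproduced with pandas alone, without GAMSPy. This is a minimal sketch of the shape probe described above; the variable names are illustrative and are not GAMSPy internals:

```python
import pandas as pd

# An empty list gives pandas nothing to infer columns from: shape is (0, 0).
empty_list_shape = pd.DataFrame([]).shape

# An empty Index is treated as a single column of values: shape is (0, 1).
empty_index_frame = pd.DataFrame(pd.Index([]))
empty_index_shape = empty_index_frame.shape

# An empty DataFrame built with explicit column labels keeps its column count.
two_column_shape = pd.DataFrame(columns=["a", "b"]).shape

print(empty_list_shape)   # (0, 0)
print(empty_index_shape)  # (0, 1)
print(two_column_shape)   # (0, 2)
```

Only the second shape matches a one-dimensional set, and only the third matches a two-dimensional one.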

The example...

m = gp.Container()
A = m.addSet("A", records=pd.Index([]))
B = m.addSet("B", domain=A, records=[])

... works for A, but fails for B for the same reason as above: [] produces a dataframe with zero columns, while B has dimension 1.
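
One possible workaround on the caller's side is to normalize empty input into a dataframe with the expected number of columns before passing it as records. This is a rough sketch, assuming the column-count check described above is the only obstacle; `as_records_frame` is a hypothetical helper, not part of GAMSPy, and it has not been tested against addSet:

```python
import pandas as pd


def as_records_frame(records, dimension):
    """Hypothetical helper: build a DataFrame from `records` and, if the
    result is empty with the wrong column count, replace it with an empty
    frame that has exactly `dimension` columns."""
    df = pd.DataFrame(records)
    if df.empty and df.shape[1] != dimension:
        # Columns are labelled 0..dimension-1; the labels are placeholders.
        df = pd.DataFrame(columns=range(dimension))
    return df


fixed_empty = as_records_frame([], 2)              # empty input, padded to 2 columns
untouched = as_records_frame(["i1", "i2"], 1)      # non-empty input is left alone
already_ok = as_records_frame(pd.Index([]), 1)     # empty, but already 1 column
```

With this, `as_records_frame([], 1)` would hand over a dataframe of shape (0, 1) instead of (0, 0).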

Hope this helps for now!

GAMS-dev / gamspy, issue #8