grb accessor should work on Series and Index as well

Gurobi / gurobipy-pandas

Convenience wrapper for building optimization models from pandas data

https://gurobipy-pandas.readthedocs.io/en/stable/

Apache License 2.0

83 stars 15 forks source link

grb accessor should work on Series and Index as well #39

Closed Dr-Irv closed 1 year ago

Dr-Irv commented 1 year ago

I'd want to be able to do Series.grb and Index.grb, as in the following:

>>> s=pd.Series(range(5), index=range(5))
>>> s
0    0
1    1
2    2
3    3
4    4
dtype: int64
>>> y=s.grb.pd_add_vars(m, "y")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda3\envs\gurobi\lib\site-packages\gurobipy_pandas\accessors.py", line 259, in __getattr__
    data=[v.getAttr(attr) for v in self._obj],
  File "C:\Anaconda3\envs\gurobi\lib\site-packages\gurobipy_pandas\accessors.py", line 259, in <listcomp>
    data=[v.getAttr(attr) for v in self._obj],
AttributeError: 'int' object has no attribute 'getAttr'
>>> ind=s.index
>>> z=ind.grb.pd_add_vars(m, "z")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda3\envs\gurobi\lib\site-packages\gurobipy_pandas\accessors.py", line 432, in pd_add_vars
    x = model.addVars(indices, lb=lb, ub=ub, vtype=vtype, name=name)
  File "src\gurobipy\model.pxi", line 2889, in gurobipy.Model.addVars
TypeError: must be real number, not str

simonbowly commented 1 year ago

The index accessor should already work for this. Note that as with gurobipy, the name is not the first (or in this case, second, after model) positional argument, so to use it this way name must be a kwarg:

>>> s = pd.Series(range(5), index=range(5))
>>> myvars = s.index.grb.pd_add_vars(m, name="x")
>>> m.update()
>>> myvars
0    <gurobi.Var x[0]>
1    <gurobi.Var x[1]>
2    <gurobi.Var x[2]>
3    <gurobi.Var x[3]>
4    <gurobi.Var x[4]>
Name: x, dtype: object

For the series accessor, would you expect s.grb.pd_add_vars(m, name="x") to return the same series as s.index.grb.pd_add_vars(m, name="x")? Should it return a dataframe which joins the variables to the original series data? Would the series data be used in any way?

I have not implemented this in the series accessor yet since there I could not see any consistent way to use the series data (unlike the dataframe accessor, where you may have multiple named columns which specify objective, bounds, etc), so it seemed to make more sense for the user to explicitly point to the index and build a new series from that.

Dr-Irv commented 1 year ago

For the series accessor, would you expect s.grb.pd_add_vars(m, name="x") to return the same series as s.index.grb.pd_add_vars(m, name="x")?

Yes.

Should it return a dataframe which joins the variables to the original series data?

I'd vote just a Series. You could make this an additional keyword argument.

Would the series data be used in any way?

I don't think so, although maybe you'd like to say "use the series for LB" or "use the series for UB" or "use the series for the objective" via some keyword argument.

Dr-Irv commented 1 year ago

I did my first model, and there are inconsistencies in the API that are a bit annoying.

If I have data in a DataFrame and create variables, it returns a DataFrame with the appended column of the variables. I think it should return the Series, and then it is my choice as to whether to append the variables to the DataFrame that was passed in, or keep it in a separate Series. Here's an example:
```
Make = rmDF.grb.pd_add_vars(model, "Make")["Make"]
```
In the above, I had to extract the series. I think it is more natural to have pd_add_vars just return a Series. If I want to append it (assuming a return result of a Series), I could probably do
```
rmDF = pd.concat([rmDF, rmDF.grb.pd_add_vars(model, "Make")], axis=1)
```

When I made variables from an Index, I wrote this and it worked:

Inv = invindex.grb.pd_add_vars(model, name="Inv")

I find the inconsistency in the API with respect to when to use a keyword argument for the name to be annoying. In the first example, where I created a series of variables from a DataFrame, the name argument was positional. In the second example, from an Index, I had to use a keyword argument to specify the name. It should be one way or the other (either always positional or always keyword).

rrandall1471 commented 1 year ago

I'm going to slightly agree with @Dr-Irv on this one. In some ways I like that it returns the full pd.DataFrame, but it was definitely not what I expected, as I expected the exact same behavior as Irv. However, it allows me to chain variable creation if they are all built using the same index, which often happens when building models:

df = df.grb.pd_add_vars(m, "x").grb.pd.add_vars(m, "y")

and you get a pd.DataFrame back with both variables added to it. I do think Irv's thought of returning a pd.Series is probably more pandonic and would allow you to do this instead of what I said above:

df = df.assign(x=lambda df: df.grb.pd_add_vars(m, "x"),
               y=lambda df: df.grb.pd_add_vars(m, "y"))

Dr-Irv commented 1 year ago

You may want to consider making pd_add_vars() and pd_add_constrs() top level functions, as opposed to hanging off the accessors. So instead of:

Make = rmDF.grb.pd_add_vars(model, "Make")["Make"]

you could have:

import gurobipy_pandas as grbp
Make = grbp.add_vars(model, rmDF, "Make")

That would allow you to easily add type hints for pd_add_vars() and pd_add_constrs()

You could even consider extending the gurobipy API (although that might not be possible as open source):

Make = model.pd_add_vars(rmDF, "Make")

One advantage of this is that the pd_add_vars() and pd_add_constrs() could take a DataFrame, Series, or Index as an argument, and always return a Series.

You could then just use the .grb accessor to access attributes, and the accessor would only apply to Series. You wouldn't be able to type-hint those (unless you only provide them on Series of gurobi objects with a subclass of Series).

simonbowly commented 1 year ago

Thanks both!

Implemented Series.grb.pd_add_vars with 0be5900 (just calls down to the index)
Keyword-only arguments make sense to me to avoid confusion, opened #45 for that
Trying to capture some of this discussion around top-level functions, series vs. dataframes, etc in #46, as it seems two competing API styles are emerging