Gurobi / gurobipy-pandas

Convenience wrapper for building optimization models from pandas data
https://gurobipy-pandas.readthedocs.io/en/stable/
Apache License 2.0

Add a method to fetch attributes for multiple columns #60

Open simonbowly opened 9 months ago

simonbowly commented 9 months ago

Proposed by @jacksimpsoncartesian. Say we have a dataframe including multiple series of variables:

>>> import pandas as pd
>>> import gurobipy as gp
>>> import gurobipy_pandas  # registers the .gppd accessors
>>> env = gp.Env()
>>> model = gp.Model(env=env)
>>> df = (
...     pd.DataFrame({
...         "obj": [4, 3, 2, 1],
...         "ub": [1, 2, 3, 4],
...     })
...     .gppd.add_vars(model, ub="ub", name="x")
...     .gppd.add_vars(model, obj="obj", name="y")
... )
>>> model.update()
>>> df
   obj  ub                  x                  y
0    4   1  <gurobi.Var x[0]>  <gurobi.Var y[0]>
1    3   2  <gurobi.Var x[1]>  <gurobi.Var y[1]>
2    2   3  <gurobi.Var x[2]>  <gurobi.Var y[2]>
3    1   4  <gurobi.Var x[3]>  <gurobi.Var y[3]>

We can already extract solution values series-wise once the model has been optimized, so the canonical way to append this new data as columns is to use .assign():

>>> model.optimize()
>>> results = (
...     df.assign(x_X=lambda df: df['x'].gppd.X)
...     .assign(y_Y=lambda df: df['y'].gppd.X)
...     .assign(x_RC=lambda df: df['x'].gppd.RC)
... )
>>>
>>> results
   obj  ub                              x                              y  x_X  y_Y  x_RC
0    4   1  <gurobi.Var x[0] (value 0.0)>  <gurobi.Var y[0] (value 0.0)>  0.0  0.0   0.0
1    3   2  <gurobi.Var x[1] (value 0.0)>  <gurobi.Var y[1] (value 0.0)>  0.0  0.0   0.0
2    2   3  <gurobi.Var x[2] (value 0.0)>  <gurobi.Var y[2] (value 0.0)>  0.0  0.0   0.0
3    1   4  <gurobi.Var x[3] (value 0.0)>  <gurobi.Var y[3] (value 0.0)>  0.0  0.0   0.0

A useful feature could be a method or accessor to extract multiple attributes for multiple columns in a single call. For example:
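
As a purely illustrative sketch (the helper name get_attrs and its signature are hypothetical, not existing library API), the proposed behaviour could be built on the Series-level get_attr that already exists:

>>> def get_attrs(frame, columns):
...     # Hypothetical helper: `columns` maps a column of Var objects to the
...     # list of Gurobi attribute names to fetch for that column.
...     return pd.DataFrame({
...         f"{col}_{attr}": frame[col].gppd.get_attr(attr)
...         for col, attrs in columns.items()
...         for attr in attrs
...     })
...
>>> get_attrs(df, {"x": ["X", "RC"], "y": ["X"]})
   x_X  x_RC  y_X
0  0.0   0.0  0.0
1  0.0   0.0  0.0
2  0.0   0.0  0.0
3  0.0   0.0  0.0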

Some open questions for how to design this: should the variable columns be replaced with their values or kept alongside new value columns, how should the new columns be named, and how should mixed column types and attribute combinations be handled?

jacksimpsoncartesian commented 9 months ago

Thanks so much Simon, I haven't used the assign approach before so I'll test that out tomorrow.

I think you're right that returning the dataframe with the variable columns replaced by their values would be the cleanest. However, if you think some users may want to retain the variable columns, it isn't a big deal if it instead creates new columns whose names flag that they hold the values of the variable columns.

Would it be possible to do something similar when extracting the duals for constraint columns in the dataframe? I'm often extracting a large number of different duals from these models and inserting them back into the dataframes.

Thanks again!

Dr-Irv commented 9 months ago

IMHO, the design of the application using gurobipy-pandas should separate data, decision variables, constraints, and solutions. What they share in common are indices.

So if you had a DataFrame with columns x, y and z, each holding decision variables, and they shared the same index, then you'd put the solution in a different DataFrame with the same index, and it could use the same column names. For different attributes, you could append the name of the attribute to the column name.

So I'm not supportive of this example:

>>> env = gp.Env()
>>> model = gp.Model()
>>> df = (
...     pd.DataFrame({
...         "obj": [4, 3, 2, 1],
...         "ub": [1, 2, 3, 4],
...     })
...     .gppd.add_vars(model, ub="ub", name="x")
...     .gppd.add_vars(model, obj="obj", name="y")
... )
>>> model.update()
>>> df
   obj  ub                  x                  y
0    4   1  <gurobi.Var x[0]>  <gurobi.Var y[0]>
1    3   2  <gurobi.Var x[1]>  <gurobi.Var y[1]>
2    2   3  <gurobi.Var x[2]>  <gurobi.Var y[2]>
3    1   4  <gurobi.Var x[3]>  <gurobi.Var y[3]>

I prefer this style:

>>> import gurobipy as grb
>>> import gurobipy_pandas as gppd
>>> import pandas as pd
>>> data_df = pd.DataFrame({"obj": [4,3,2,1], "ub": [1,2,3,4]})
>>> data_df
   obj  ub
0    4   1
1    3   2
2    2   3
3    1   4
>>> m=grb.Model()
### REMOVED LICENSE STUFF
>>> vars_df = pd.DataFrame({"x": gppd.api.add_vars(m, data_df, name="x", ub="ub"), 
...      "y": gppd.api.add_vars(m, data_df, name="y", obj="obj")})
>>> m.optimize()
Gurobi Optimizer version 10.0.0 build v10.0.0rc2 (win64)

CPU model: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz, instruction set [SSE2|AVX|AVX2]
Thread count: 8 physical cores, 16 logical processors, using up to 16 threads

### REMOVED LICENSE INFO THAT HAS MY EMAIL ADDRESS
Optimize a model with 0 rows, 8 columns and 0 nonzeros
Model fingerprint: 0x42a99770
Coefficient statistics:
  Matrix range     [0e+00, 0e+00]
  Objective range  [1e+00, 4e+00]
  Bounds range     [1e+00, 4e+00]
  RHS range        [0e+00, 0e+00]
Presolve removed 0 rows and 8 columns
Presolve time: 0.00s
Presolve: All rows and columns removed
Iteration    Objective       Primal Inf.    Dual Inf.      Time
       0    0.0000000e+00   0.000000e+00   0.000000e+00      0s

Solved in 0 iterations and 0.01 seconds (0.00 work units)
Optimal objective  0.000000000e+00
>>> sol_df = pd.DataFrame({f"{v}_{attr}" : vars_df[f"{v}"].gppd.get_attr(attr) 
...    for v in vars_df.columns for attr in ["x", "RC"]})
>>> sol_df
   x_x  x_RC  y_x  y_RC
0  0.0   0.0  0.0   4.0
1  0.0   0.0  0.0   3.0
2  0.0   0.0  0.0   2.0
3  0.0   0.0  0.0   1.0
>>>

So there are now data_df, vars_df and sol_df, keeping the data, variables and solution all separated.

simonbowly commented 9 months ago

Would it be possible to do something similar when extracting the duals for constraint columns in the dataframe? I'm often extracting a large number of different duals from these models and inserting them back into the dataframes.

Sure, this is also just an attribute, so it can be extracted the same way from a series of constraints. See for example the "slack" attribute in the Projects-Teams example.
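
Continuing the example from the opening comment, a minimal sketch (assuming import gurobipy_pandas as gppd and the top-level add_constrs(model, lhs, sense, rhs, name=...) helper) would be:

>>> import gurobipy_pandas as gppd
>>> # Row-wise constraints x[i] + y[i] >= 1, returned as a Series of Constr objects
>>> constrs = gppd.add_constrs(model, df["x"] + df["y"], ">=", 1.0, name="c")
>>> model.optimize()
>>> df = df.assign(c_dual=constrs.gppd.get_attr("Pi"))  # one dual value per row

Pi is the standard Gurobi attribute for the dual value of a linear constraint, so it goes through the same Series accessor as X and RC.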

simonbowly commented 9 months ago

Sure, I know your opinion on that @Dr-Irv, but the question here is really whether to take the existing functionality on Series:

>>> pd.DataFrame(dict(
...     x_x=vars_df["x"].gppd.X,
...     x_RC=vars_df["x"].gppd.RC,
...     y_x=vars_df["y"].gppd.X,
...     y_RC=vars_df["y"].gppd.RC,
... ))
   x_x  x_RC  y_x  y_RC
0  0.0   0.0  0.0   4.0
1  0.0   0.0  0.0   3.0
2  0.0   0.0  0.0   2.0
3  0.0   0.0  0.0   1.0

and wrap it up in an accessor on the DataFrame which handles (potentially) mixed column types and attributes.
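
For illustration only, a rough sketch of such a wrapper using pandas' register_dataframe_accessor hook; the accessor name grb and the method name get_attrs are placeholders, not library API, and the body simply delegates to the existing Series accessor:

import pandas as pd

# Sketch only: accessor/method names are hypothetical.
@pd.api.extensions.register_dataframe_accessor("grb")
class _AttrFetcher:
    def __init__(self, df):
        self._df = df

    def get_attrs(self, spec):
        # spec maps a column of Var/Constr objects to the attribute names to
        # fetch for it, e.g. {"x": ["X", "RC"], "y": ["X"]}.  Columns may hold
        # different object types, since get_attr is applied per column.
        return pd.DataFrame({
            f"{col}_{attr}": self._df[col].gppd.get_attr(attr)
            for col, attrs in spec.items()
            for attr in attrs
        })

# usage, e.g.: vars_df.grb.get_attrs({"x": ["X", "RC"], "y": ["X", "RC"]})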

simonbowly commented 9 months ago

I think the Series accessors combined with existing pandas methods may be enough for now. Both approaches are quite concise, and adding a new method which needs to handle various combinations of columns with different attributes could wind up with hard-to-explain behaviour. So I will leave this one alone for now.

jacksimpsoncartesian commented 8 months ago

Thanks so much Simon, I really appreciate you taking the time to review this. I'm happy to use the existing methods to update my code; that should save me a lot of code, which is what I was after!