Add a top level function pd_add_constr(model, lhs_series, sense, rhs_series)

Dr-Irv commented 1 year ago

I wrote a model that had constraints that look like this:

pairday_df = pd.concat([h, allvars.daily_hours, pairday], axis=1)
pairday_df.grb.pd_add_constrs(m, "h <= 2* daily_hours * pairday", name="PairDayUp")
pairday_df.grb.pd_add_constrs(m, "2 * pairday <= h", name="PairDayLow")

But if would have been really cool if I could have written the constraints like this:

pd_add_constrs(m, h, "<=", 2 * allvars["daily_hours"] * pairday, name="PairDayUp")
pd_add_constrs(m, 2 * pairday, "<=", h, name="PairDayLow")

In this example, the LHS and RHS are just Series, so you really don't need the pairday_df at all (especially if you switch to the idea of making pd_add_constrs() a function rather than use it through the accessor). You could check if the two Series were aligned and raise an Exception if not.

rrandall1471 commented 1 year ago

I agree with @Dr-Irv here, that would be a very nice function to have for more simple constraints where you don't need to create a new pd.DataFrame just to use two columns to make the expression out of when you could just do it directly in the function from the pd.Series.

simonbowly commented 1 year ago

@Dr-Irv would you mind sharing a more complete snippet of the above example including where/how the variables are created? For instance I would have written this as follows (assuming daily_hours is data and h, pairday are variables):

daily_hours = pd.DataFrame(columns=["daily_hours"], data=...)
pairday_df = (
    daily_hours
    .grb.pd_add_vars(m, name="h")
    .grb.pd_add_vars(m, name="pairday")
)

At that point there's no concat needed, everything was already munged into the same dataframe and aligned on the same index so constraint building just works:

pairday_df.grb.pd_add_constrs(m, "h <= 2* daily_hours * pairday", name="PairDayUp")
pairday_df.grb.pd_add_constrs(m, "2 * pairday <= h", name="PairDayLow")

So I'm curious why the variables are separated out as series in the first place?

simonbowly commented 1 year ago

Discussed a bit further with @rluce. For the example above (hopefully I am not misrepresenting it) we think it makes more sense to create variables attached to the original dataframe and use the accessors, and we'll focus on that approach in the documentation and examples.

However for some cases a top-level pd_add_constrs function makes sense, such as aggregations across a multi-index that need to be aligned with an existing series:

pd_add_constrs(model, df.groupby("Job")["Assign"], GRB.EQUAL, requirements)

or as a convenience function to avoid direct use of the gurobipy model and the associated API confusion that comes with it:

pd_add_constrs(model, series_var.sum(), GRB.EQUAL, 1.0)

Dr-Irv commented 1 year ago

Discussed a bit further with @rluce. For the example above (hopefully I am not misrepresenting it) we think it makes more sense to create variables attached to the original dataframe and use the accessors, and we'll focus on that approach in the documentation and examples.

I really disagree here. If the DataFrame contains data, you want the variables in a separate object. I don't want to mix my data with my decision variables. That's why I put them in a Series.

Dr-Irv commented 1 year ago

So I'm curious why the variables are separated out as series in the first place?

From an object oriented design standpoint, we avoid mixing DataFrames that have data and DataFrames (or Series) that are decision variables.

This is akin to how modeling languages work. You declare your data. You declare your decision variables. The decision variables are indexed on sets from the data, but you don't create objects that mix the two.

One technique we use is to leverage python dataclass patterns. We could have

import dataclasses as dc

@dc.dataclass
class MyData:
    table1:  pd.DataFrame
    table2:  pd.DataFrame

@dc.dataclass
class MyDecisionVariables:
    var1:  pd.Series
    var2:  pd.Series

I am NOT a fan of the mixing of the data objects and the decision variable objects, which is what you have chosen to do.

I really believe that a cleaner design would have pd_add_constr() and pd_add_vars() at the top level, which would give you the benefit of type checking for those two functions, and then the accessors are only used to get access to the gurobi attributes for the corresponding gurobi objects stored in a Series (i.e., don't use accessors for Index or DataFrame)

rrandall1471 commented 1 year ago

I agree with @Dr-Irv. There are several reasons to do this, but as Irv said at a minimum it is good programming practice for separation of data and model objects.

Even when the var has the same index as a DF, its not necessarily clean to just add a new column to the DF as that may cause confusion down the road in you have to remember which DF has each var in it, instead of just the Series that matches the var's name.

In the model I rewrote to use your new pandas API, it has around 20 unique variable types, with 7-10 different index sets. Some of these don't even match an index in the input data, also there are many different data inputs that share an index but come from different sources so then which DF do you use to build the var set? And then I'd have to remember which of those DFs has the var column in it.

Also, it's about maintaining code, if you set it up the way Irv said above it makes it much easier for a new person to come in and see what's going on. If I can't remember which data DF I assigned that variable to, there would no chance for a new person to come in and understand it, whereas if we make a Data specific class and a Var specific class that separation makes it easier for new teammates to come in and understand where everything is at.

simonbowly commented 1 year ago

Ok, so this seems to have changed from what I initially understood as a "nice to have for simple cases" global function to the dominant way you would prefer to write models?

@rrandall1471 you mentioned:

In the model I rewrote to use your new pandas API, it has around 20 unique variable types, with 7-10 different index sets ...

Could you please share the code you wrote for this model (even if it's rough / without sample data)? I'd like to get a clearer idea of how you're naming your different data inputs and variable series to understand where the pain points come from.

This complete data/decision variable separation is quite a departure from the initial style, especially the pd.eval style which requires variables and data to live in the one dataframe. The change would mean an almost complete rewrite of the code, tests, and examples, as we'd like to present a consistent approach. If that's the way we need to go, that's fine, but I'd like to thoroughly convince myself before committing to it. I'm rewriting my examples in that form first of all, but if you could share some complete examples you've formulated (maybe by email is easier?) then I can try rewriting them in the accessor style to see where it breaks down for me.

Thanks!

Dr-Irv commented 1 year ago

I will email you two examples that I did, using the accessor style. One is simple, the other more complex. Not sure if @rrandall1471 can send the complex example he tried, as it based on client work.

In both of my examples, I store the decision variables in separate series. If I want to do a constraint that has both data and decision variables in it, I then concat the DataFrame and the Series that has decision variables. Then I can use the eval syntax.

Model/data separation is really important. That's one of the (few) things modeling languages got right.

Dr-Irv commented 1 year ago

@simonbowly I'm in the partner webinar, and they showed an example using the existing API. Have you decided to keep that, or take in our suggestions for making pd_add_constrs() and pd_add_vars() at the top level, and not mixing data and variables?

simonbowly commented 1 year ago

I am partway through the additions at the moment, but no documentation done yet so I couldn't give a complete example for the partner talk.

Current thinking is that we'll allow both styles, since for some examples the accessors approach looks useful, but we will definitely have the global functions implemented as you suggested.

simonbowly commented 1 year ago

@Dr-Irv one more open question on that actually - with the top level pd_add_vars and pd_add_constrs implemented, is there any use at all for the series accessors pd.Series.grb.add_vars and pd.Series.grb.add_constrs?

I think from your previous comments you would be ok with removing them? We would keep only the dataframe and index accessor methods for specific situations (e.g. method chaining). Series accessors would be restricted to vectorized operations on the Var/Constr objects that they hold.

simonbowly commented 1 year ago

New functions added, documentation to come

>>> import pandas as pd
>>> import gurobipy as gp
>>> from gurobipy_pandas import pd_add_constrs, pd_add_vars
>>> model = gp.Model()
>>> x = pd_add_vars(model, pd.RangeIndex(5))
>>> y = pd_add_vars(model, pd.RangeIndex(5))
>>> pd_add_constrs(model, x, '<', y)
0    <gurobi.Constr *Awaiting Model Update*>
1    <gurobi.Constr *Awaiting Model Update*>
2    <gurobi.Constr *Awaiting Model Update*>
3    <gurobi.Constr *Awaiting Model Update*>
4    <gurobi.Constr *Awaiting Model Update*>
dtype: object

Dr-Irv commented 1 year ago

@Dr-Irv one more open question on that actually - with the top level pd_add_vars and pd_add_constrs implemented, is there any use at all for the series accessors pd.Series.grb.add_vars and pd.Series.grb.add_constrs?

I think from your previous comments you would be ok with removing them? We would keep only the dataframe and index accessor methods for specific situations (e.g. method chaining). Series accessors would be restricted to vectorized operations on the Var/Constr objects that they hold.

IMHO, you shouldn't have the accessors for add_vars and add_constrs on DataFrame or Index either. One reason is that you don't get type checking. Another is because it's not really an "accessor", i.e., you're not getting a property - you're creating a new object.

To include them on Series is all about how you want the design to appear. For completeness, if you are going to have the accessors for DataFrame and Index, then you should have them for Series as well.

Also, from an examples perspective, you should choose one style and stick to it, because people will follow that style as evidence of "best practice". I think both @rrandall1471 and I think that add_vars and add_constrs should not be on the accessors, nor should the examples demonstrate using them that way.

I realize that the latter statements are something we disagree about.

simonbowly commented 1 year ago

Thanks Irv. Yes we disagree on the last point (the accessors have just proven too useful to discard for me) but I appreciate you making us aware of the potential downsides. And I take your point re: consistency.

FWIW, creating new objects via accessors seems a very natural usage to me: string accessors apply formatting to create new strings, pdvega creates plots by combining dataframe columns, geopandas applies transformations, etc

Thanks again for the feedback!

Dr-Irv commented 1 year ago

FWIW, creating new objects via accessors seems a very natural usage to me: string accessors apply formatting to create new strings, pdvega creates plots by combining dataframe columns, geopandas applies transformations, etc

Point taken.

If your plan is to provide both ways - at the accessor level and at the top level, then we'll most likely just use the top level method in order to maintain model/data separation. The other advantage is typing support.

I just did a simple model the other day using gurobipy-pandas and the lack of typing support for those 2 main methods slowed down my development. Had to pull up the online manual to get all the parameters right.

simonbowly commented 1 year ago

I just did a simple model the other day using gurobipy-pandas and the lack of typing support for those 2 main methods slowed down my development.

Yep, will follow up on typing in #41

Gurobi / gurobipy-pandas

Add a top level function pd_add_constr(model, lhs_series, sense, rhs_series) #42