Gurobi / gurobipy-pandas

Convenience wrapper for building optimization models from pandas data
https://gurobipy-pandas.readthedocs.io/en/stable/
Apache License 2.0
83 stars 15 forks source link

Add typing support #41

Closed Dr-Irv closed 1 year ago

Dr-Irv commented 1 year ago

Installing gurobipy-pandas, gurobipy-stubs and pandas-stubs, the revealed types are funky:

from typing_extensions import reveal_type
import pandas as pd
import gurobipy as grb
import gurobipy_pandas

df = pd.DataFrame({"x": [1, 2, 3]})

reveal_type(df.grb)

s = pd.Series([1, 2, 3])
reveal_type(s.grb)

ind = pd.Index([1, 2, 3])
reveal_type(ind.grb)

With mypy, you get:

tsttypes.py:4: error: Skipping analyzing "gurobipy_pandas": module is installed, but missing library stubs or py.typed marker
tsttypes.py:4: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
tsttypes.py:8: note: Revealed type is "pandas.core.series.Series[Any]"
tsttypes.py:11: note: Revealed type is "Any"
tsttypes.py:14: error: "Index" has no attribute "grb"
tsttypes.py:14: note: Revealed type is "Any"

With pyright, you get:

tsttypes.py
  tsttypes.py:8:13 - information: Type of "df.grb" is "Series[Unknown]"
  tsttypes.py:11:13 - information: Type of "s.grb" is "Unknown"
  tsttypes.py:14:17 - error: Cannot access member "grb" for type "Index"
    Member "grb" is unknown (reportGeneralTypeIssues)
  tsttypes.py:14:13 - information: Type of "ind.grb" is "Unknown"
simonbowly commented 1 year ago

Originally posted by @Dr-Irv https://github.com/Gurobi/gurobipy-pandas/issues/12#issuecomment-1242065039:

Another thing to consider is to provide typing stubs for the accessors. We're now publishing pandas-stubs, and the way that Series.str is hooked in is via a StringAccessor. Problem for you is that you can't add an accessor to the stub. But I think there is a way for you to do this.

If the type of something like Series.grb.pd_add_vars() became GRBVarSeries, where GRBVarSeries is a subclass of Series, you could then add a typing stub for GRBVarSeries.grb, and then type check the various methods on the accessors.

MIght be nice to have GRBVarSeries (and maybe GRBConstrSeries, GRBLinExprSeries, etc.) so you know the type of the series via its naming convention.

Could also consider making a Generic type of GRBSeries[gurobipy.Var] . Note that the pandas-stubs are using Generic on Series to make the typing work better.

simonbowly commented 1 year ago

This looks really tricky, and might have to go on the back burner. I added the type hints to the existing code to help with documentation, but this seems like an impasse:

tsttypes.py:14: error: "Index" has no attribute "grb"

@Dr-Irv's suggestion above may work for the series produced by gurobipy_pandas (series of vars & constraints) but I can't see a way around the entry point problem. It doesn't seem to be possible to augment the pandas type stubs so that mypy and others are aware that Index.grb and friends exist.

Closest I've found is partial type stubs in PEP561 but this only works at the module level, we can't selectively add an attribute.

Dr-Irv commented 1 year ago

I started a discussion on the typing discussion list. See https://mail.python.org/archives/list/typing-sig@python.org/thread/UOIVXOJA2PIEYF3XB37HBI2MAJ4XYNUI/

simonbowly commented 1 year ago

Current state: basic typing for the top-level add_vars and add_constrs is there. We don't yet have a specialised type to enable type-checking of attribute statements such as x.grb.X where x is a series of vars.

Dr-Irv commented 1 year ago

I don’t think you will be able to provide types for any attributes sitting under grb. That’s because pandas provides the types for Series in pandas-stubs and you can’t add a type for an attribute in an existing library. I brought this up In the typing discussion group here. https://mail.python.org/archives/list/typing-sig@python.org/thread/UOIVXOJA2PIEYF3XB37HBI2MAJ4XYNUI/

simonbowly commented 1 year ago

Agreed - I was thinking of your earlier comment 'return a GRBVarSeries' or something similar. But it seems like an abuse of the type system since at runtime we won't actually return a subclass of Series. So will wait on an update on the thread above before revisiting this.

Dr-Irv commented 1 year ago

You could change the implementation to return a GRBVarSeries that is a subclass of Series. Besides the typing advantage, you'd also be set up if you should ever use the ExtensionArray paradigm where a GRBVarSeries would "embed" a GRBVarArray

simonbowly commented 1 year ago

We discussed this already, and won't go down this route. Subclassing pandas series is really a last resort. Accessors & extension arrays exist primarily so extensions can avoid subclassing Series. We will likely implement the extension array at some stage for performance reasons, but this doesn't require creating a subclass, so I don't see the advantage there.

While I understand the advantages of typing, IMO it's meant to be an optional component of the language, and creating implementation headaches just to support typing seems like a bad route to go down. Better to wait for partial typing support.

simonbowly commented 1 year ago

Closing this as the typing overloads are done for the free functions add_vars and add_constrs. Nothing more we can do for the accessors for now.