Interfaces for network models

simonbowly commented 1 year ago

Meta issue to make sure we have consistent APIs between network-y mods #39 #36

Networkx is very clean for this, as we can have node and edge attributes and all the data for a mod is bundled into one python object. We probably want to avoid networkx/igraph/others as dependencies, but only if we can find just as clean an API for network models using the existing numpy/scipy/pandas dependencies.

For weighted or unweighted graphs we use scipy sparse arrays (e.g. mwis and matching):

# Graph adjacency matrix (upper triangular) as a sparse matrix.
g = nx.cubical_graph()
adjacency_matrix = sp.triu(nx.to_scipy_sparse_array(g))

# Vertex weights
weights = np.array([2**i for i in range(8)])

# Compute maximum weighted independent set.
mwis = maximum_weighted_independent_set(adjacency_matrix, weights)

however this doesn't quite suit networks which need multiple attributes per edge. We could pass multiple arrays, but this is unweildy as the user has to build two scipy objects with duplicated data for the edge pattern. The alignment between sparse indexes and positions in the demand array is not too nice either, as it's restricted to 0 ... n-1 node labels.

cost = ... # sparse array
capacity = ... # sparse array
demand = ... # dense array
min_cost_flow(cost, capacity, demand)

I also wrote a network flow function for use in bipartite matching, which uses dense arrays to pass all the edge source/target indices, costs & capacities. It's meant as a low-level utility for other mods to use though, it would be fast but not a nice API:

https://github.com/Gurobi/gurobi-optimods/blob/ae7e1d8866f85ad8aa034d0af7f8a26c5117ac9c/src/gurobi_optimods/network_util.py#L11-L18

So I think pandas is the better alternative here, with input data as below. It's easy to do a direct formulation of a flow problem using gurobipy-pandas as well, see networkflow.md

edge_data = pd.DataFrame([
    {"source": 0, "target": 1, "capacity": 16, "cost": 0},
    {"source": 0, "target": 2, "capacity": 13, "cost": 0},
    {"source": 1, "target": 2, "capacity": 10, "cost": 0},
    {"source": 2, "target": 1, "capacity": 4, "cost": 0},
    {"source": 1, "target": 3, "capacity": 12, "cost": 0},
    {"source": 3, "target": 2, "capacity": 9, "cost": 0},
    {"source": 2, "target": 4, "capacity": 14, "cost": 0},
    {"source": 4, "target": 3, "capacity": 7, "cost": 0},
    {"source": 3, "target": 5, "capacity": 20, "cost": 0},
    {"source": 4, "target": 5, "capacity": 4, "cost": 0},
])

node_data = pd.DataFrame([
    {"node": 0, "demand": -23},
    {"node": 5, "demand": 23}
])

@torressa @stevedwards what do you think (for the final version, ok to continue hacking with networkx for the time being)? It's also fine to use networkx in the docs examples, either to generate networks or produce figures. @maliheha did this for MWIS; her examples generate a networkx graph and convert to scipy using networksx/numpy functions. It's similarly straightforward for pandas, see to_pandas_edgelist which gives the same structure as edge_data above from a graph with edge attributes.

torressa commented 1 year ago

I am happy with using pandas as our main external interface for problems with many edge attributes. It's cleaner than adding a new dep or using a function with many inputs.

To make it super easy for people to use I think we should aim to transform stuff for people with different preferences to use (we can add these as optional deps: i.e. only define the function when the import is available).

For graph-based mods, we could have, for example (thanks to Python's lack of overloading),

graph_mod(data: pd.DataFrame, ...)
graph_mod_sp(G: sp.sparray, ...) (This can be the default one when pandas one is not needed)
graph_mod_nx(G: nx.DiGraph, ...) (optional dep)

This will really help peeps in the graph community try these out. Of course, as @ruthmair pointed out, some of these libraries will already have algorithms for these problems in place, but sometimes they will not!

simonbowly commented 1 year ago

Sounds good. You can overload the same function name if you want; just check the argument type. I've done that in bipartite matching. It looks ok but might start to look odd when e.g. networkx requires only one object as an argument but pandas/scipy require two.

simonbowly commented 1 year ago

We have a more consistent approach now: use scipy and pandas as much as possible, but also provide some documented way to pass a networkx graph. Closing.

Gurobi / gurobi-optimods

Interfaces for network models #47