Bravos-Power / pyoframe

Rapidly formulate huge optimization models
https://bravos-power.github.io/pyoframe/
MIT License

Build direct interface #57

Open metab0t opened 1 month ago

metab0t commented 1 month ago

Hello, @staadecker!

I am the author of PyOptInterface, an efficient modeling interface for mathematical optimization in Python. Its performance is quite competitive with existing solutions (it is even faster than the vendor-provided Python bindings of some optimizers).

As a power systems researcher, I deeply understand the need to construct large-scale optimization models efficiently. PyOptInterface provides a user-friendly, expression-based API for formulating problems with different optimizers, and its lightweight variable and constraint handles can be stored freely in the built-in multidimensional container, NumPy ndarrays, or Python dataframes (as you like).

I think PyOptInterface might be a good abstraction layer for your package to handle different optimizers. You can try out our package and evaluate its performance (both memory consumption and speed) if you are interested; we welcome feedback and suggestions in any form.

metab0t commented 1 month ago

We believe there are fundamental limitations to file-based IO, as pointed out in a comment by a developer of JuMP.jl. For example, file-based IO makes the following advanced features nearly impossible: incremental modification and re-solve, extensible solver-specific attributes, and on-demand querying of solutions. Large file IO is also slower than in-memory operations.

staadecker commented 1 month ago

Hi @metab0t!

Thank you for bringing PyOptInterface to my attention! Your approach, if I understand correctly, of calling Gurobi's C API directly is very neat! Great job implementing that as I imagine it was non-trivial to get the Python-C bindings working properly.

Pyoframe is built on Polars, a Rust-based dataframe library that follows the Apache Arrow columnar format (not NumPy's format). So one issue I foresee with building on PyOptInterface is the conversion from Polars to your C++ API (which might be slow?). I think long-term this could be a good goal, as I agree that file-based IO is an inefficient way to build models. However, for now, I think file-based IO is good enough: our Polars-based writer is extremely fast (~10s for very large models), Gurobi also reads in the model extremely quickly (~10s for very large models), and we don't use files to read back the results.

Incremental modification, re-solve, etc. are rather niche cases imo, although they are something I'd like to support in the long term, at which point PyOptInterface might make a lot of sense.

If you have an easy way to integrate Polars dataframes with your library, do let me know, as it would be great to support 4 solvers. However, I would guess such an integration is non-trivial and such a project would need to wait. Let me know! (Also happy to set up a call to discuss.)

Thanks for reaching out!!

metab0t commented 1 month ago

Thanks for your explanation!

I think it would not be difficult to integrate Polars DataFrames with PyOptInterface. Variables and constraints in PyOptInterface are just lightweight Python objects, and they can be stored in a polars.Object column.

There is no need to store the upper bounds, lower bounds, or reduced costs (UB, LB, RC) of variables, because they are stored internally by Gurobi and can be queried on demand.

The file-based IO can be skipped because variables and constraints are added to the Gurobi model as soon as they are created. Just call model.optimize(), and the solution can be queried directly using the Variable and Constraint handles we previously stored in the Polars DataFrame (skipping the IO mapping process as well).

I will give a brief example later.

metab0t commented 1 month ago
import polars as pl
import pyoptinterface as poi
from pyoptinterface import gurobi

model = gurobi.Model()

# Create a DataFrame
df = pl.DataFrame({
    "X": [1, 2, 3],
    "Y": [4, 5, 6],
    "lb": 0.0,
    "ub": 2.0,
})

def addvar(lb, ub):
    return model.add_variable(lb=lb, ub=ub)

df = df.with_columns(
    pl.struct(["lb", "ub"])
    .map_elements(lambda x: addvar(x["lb"], x["ub"]), return_dtype=pl.Object)
    .alias("Variable")
)

vars = df["Variable"]

model.add_linear_constraint(poi.quicksum(vars), poi.Geq, 1.0)

obj = poi.quicksum(v * v for v in vars)
model.set_objective(obj)

model.optimize()

df = df.with_columns(
    pl.col("Variable").map_elements(lambda x: model.get_value(x), return_dtype=pl.Float64).alias(
        "Value")
)

print(df)

@staadecker This is a simple example combining PyOptInterface and Polars to solve a QP problem.

staadecker commented 1 month ago

Thank you @metab0t !

I don't think we'd want to change the expression generation code over to poi.quicksum, as the whole benefit of this library is the rapid creation of very large expressions using Polars. Additionally, I'd be afraid that .map_elements would be quite slow (perhaps even slower than the file IO).

In any case, I'm currently swamped with work so I need to put this on hold.

metab0t commented 1 month ago

I have played with Polars and found that its support for Python objects is incomplete: https://github.com/pola-rs/polars/issues/10189

The design of Pyoframe is quite neat. Constraint.lhs.data and Variable.data are compact polars.DataFrames that store their terms and indices, which would make a possible switch easy in the future.

In general, using a DataFrame to represent multidimensional indices and their sparse combinations is a great choice. I remember the GAMS benchmark and the JuMP.jl response, where using DataFrames.jl improved performance significantly. https://github.com/Gurobi/gurobipy-pandas is also an interesting project that uses pandas.DataFrame as a container for optimization objects.

staadecker commented 1 month ago

@metab0t thank you, I'm glad you like it :)

Before building the library I actually tried to do something simple like gurobipy-pandas with Polars, but because Python objects are not fully supported, I couldn't store Gurobi Python expressions in a dataframe the way gurobipy-pandas does.

metab0t commented 1 month ago

Support for Python objects does not seem to be a priority for Polars; otherwise, an API similar to gurobipy-pandas would be easy to implement (storing persistent variable/constraint objects in the DataFrame directly).

Besides, the expression system of PyOptInterface is quite fast at constructing expressions with many terms. The core is implemented with an efficient hashmap in C++.
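(As a rough illustration of the idea only, not the actual C++ implementation: merging duplicate terms through a hashmap keeps expression construction at amortized O(1) per term. A pure-Python sketch:)

```python
from collections import defaultdict

def build_linear_expr(terms):
    """Merge duplicate (variable_id, coefficient) terms with a hashmap,
    mimicking how a hashmap-backed expression builder stays linear in
    the number of terms."""
    coeffs = defaultdict(float)
    for var_id, coef in terms:
        coeffs[var_id] += coef  # O(1) amortized per term
    # Drop terms that cancelled out
    return {v: c for v, c in coeffs.items() if c != 0.0}

# Variable 0 appears twice (2.0 + 3.0 -> 5.0); variable 1 cancels out.
expr = build_linear_expr([(0, 2.0), (1, 1.0), (0, 3.0), (1, -1.0)])
print(expr)  # {0: 5.0}
```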

I have prepared an example based on the facility_problem in the Pyoframe repo at https://gist.github.com/metab0t/c3c685a8b2ec1f14171772bd7bc7ea3e

On my computer, the result is:

Pyoframe elapsed time: 28.71 seconds
POI elapsed time: 14.48 seconds

staadecker commented 1 month ago

Very neat comparison. Do you have a breakdown of where the time is being spent in Pyoframe (expression building vs IO)?

metab0t commented 1 month ago

I have updated my gist to report Pyoframe's timing in detail.

The time spent on expression building, writing the LP file, and reading the LP file is approximately 1:1:2. So expression building takes about 25% of the time, and file IO takes the other 75%.

staadecker commented 1 month ago

Very neat, this confirms that file IO is not ideal and that when I have time I should build a direct interface, perhaps using PyOptInterface. For context, expressions are stored in a "narrow" format where each row is a term, with one column for the term's coefficient and another with an ID indicating the variable. Would that be easily converted to your API? I'm thinking it is at that level that I'd want to pass things off to C.
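(To make the narrow format concrete, here is a stdlib-only sketch; the column names and the per-constraint grouping are my assumptions, not Pyoframe's actual schema. Each row is one term, and converting to the two-vector form a direct solver interface consumes is a single pass with a group-by on the constraint ID.)

```python
# Hypothetical narrow/COO representation: one row per term.
# Assumed columns: (constraint_id, variable_id, coefficient).
rows = [
    (0, 10, 1.5),
    (0, 11, -2.0),
    (1, 10, 3.0),
]

def to_vectors(rows):
    """Group terms by constraint and emit (indices, coefficients)
    vector pairs, the form a direct solver interface typically takes."""
    by_constraint = {}
    for con_id, var_id, coef in rows:
        indices, coefs = by_constraint.setdefault(con_id, ([], []))
        indices.append(var_id)
        coefs.append(coef)
    return by_constraint

vectors = to_vectors(rows)
print(vectors[0])  # ([10, 11], [1.5, -2.0])
```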

staadecker commented 1 month ago

The file-based IO also requires a lot of code (i.e. all of io.py and io_mappers.py), so it would be great to get rid of it (we can always use Gurobi to generate the .lp file for inspection).

metab0t commented 1 month ago

That representation of expressions is fine.

In fact, a variable in PyOptInterface is a thin wrapper around its ID, and a linear expression is two vectors representing the coefficients and the indices of the variables.

Storing variable objects (from PyOptInterface or gurobipy) directly in Polars is not recommended because Polars supports Object columns poorly.

You can build ONE big array storing all the variables in the model, with each variable ID pointing into that array. When you want to add a constraint to the model, just traverse the rows and construct the expression object.
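(A minimal sketch of that layout, using a stand-in Model class instead of the real PyOptInterface API: all variable handles live in one list, the rows carry integer IDs into it, and a constraint is built in one pass over the rows.)

```python
class Model:
    """Stand-in for a solver model; variables are just integer IDs."""
    def __init__(self):
        self.variables = []      # ONE big array; variable id == position
        self.constraints = []

    def add_variable(self):
        self.variables.append(len(self.variables))
        return self.variables[-1]

    def add_linear_constraint(self, var_ids, coefs, rhs):
        # A real interface would hand these two vectors to the solver.
        self.constraints.append((list(var_ids), list(coefs), rhs))

model = Model()
ids = [model.add_variable() for _ in range(3)]

# Narrow rows for one constraint: (variable_id, coefficient) per term.
rows = [(ids[0], 1.0), (ids[1], 2.0), (ids[2], -1.0)]
var_ids = [v for v, _ in rows]
coefs = [c for _, c in rows]
model.add_linear_constraint(var_ids, coefs, rhs=4.0)
print(model.constraints[0])  # ([0, 1, 2], [1.0, 2.0, -1.0], 4.0)
```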

By the way, PyOptInterface supports writing the model to LP/MPS files as well. We use the native C API provided by Gurobi, so the output should be identical to gurobipy's.