ERGO-Code / HiGHS

Linear optimization software
MIT License
992 stars 184 forks source link

ENH: SciPy integration fixes #1987

Closed HaoZeke closed 4 weeks ago

HaoZeke commented 1 month ago

This is the final PR needed upstream for https://github.com/scipy/scipy/pull/21565

In the spirit of discussions (a long time ago now), meson-python is no longer used.

HaoZeke commented 1 month ago

Sorry @HaoZeke , but there's so much of this that we can't accept.

Thanks for the quick response @jajhall, hopefully we can work this out.

It's hard to know where to start.

let's consider the changes to

src/highspy/init.py

In redefining most of the highspy objects by inserting "Highs", you're making them inconsistent with the names of the corresponding C++ objects, and making breaking changes to the API

Why are you changing things like the enum kSimplexEdgeWeightStrategySteepestEdge to kSteepestEdge? Why simplex_constants to simpc. You're stripping meaning that's appears completely unrelated to the use of HiGHS by SciPy.

It reduces a lot of redundant typing, but these changes I can revert (will modify the SciPy call as well). IMO this design is cleaner, since the repeating the enum name in the variable is completely redundant; but, this is a design decision I'm not very invested in anyway.

For example, consider this change:

highspy.IisStrategy.kIisStrategyFromLpColPriority
highspy.HighsIisStrategy.kFromLpColPriority)

Here the IisStrategy bit is duplicated (for no real discernable reason AFAIK), while IisStrategy without Highs as a prefix is susceptible to errors arising from from highspy import *. Additionally, especially for these:

    "ObjSense",
    "MatrixFormat",
    "HessianFormat",
    "SolutionStatus",

They are generic enough that importing them forcibly on import of highspy (current behavior) might easily shadow user variables, and given that most of the other variables and classes use Highs*** as the naming scheme it seemed more natural. I should note that in general __init__.py files should be almost completely empty, and the heavy population of the namespace on import is a bad practice (but not one changed here).

Similarly, kSimplexEdgeWeightStrategySteepestEdge is tied to the enum HighsSimplexEdgeWeightStrategy so it can only ever be used as HighsSimplexEdgeWeightStrategy.kSimplexEdgeWeightStrategySteepestEdge which is really verbose, and there is no ambiguity to use HighsSimplexEdgeWeightStrategy.kSteepestEdge instead. It is very opaque to me why the design is currently emulating a ctype style verbosity and duplicating names. In particular, the C++ member names do not really have a bearing on the bindings, note that already many cases use free functions within the highs_bindings.cpp file to make the classes more ergonomic and Pythonic.

simplex_constants to simpc is a bit trivial, I was adding the remaining constants and was tired of typing it out everywhere, that probably is best renamed to the older name.

For the other cases, I believe this is a more Pythonic design, but, again, most users I know will only ever interact with the SciPy API, so the absolute naming of these isn't my main concern and I'm happy to revert to the duplicated names.

Why are you changing anything within the codebase of HiGHS - such as within CupdlpWrapper.h, cupldlp/*? I can see that some may relate to compiler warnings, and some of these may be OK for us to make.

I don't know what the change to src/parallel/HighsTaskExecutor.h achieves, but we are very wary of making any changes to the task scheduler.

These are all compiler warnings. In this PR anything outside highs_bindings and the meson.build is purely for the warnings.

Why can SciPy not exploit the fact that highspy is now in PyPI, and avoid any SciPy-specific build requirements?

This is likely not going to work, mostly because SciPy has no dependencies on anything other than numpy:

https://github.com/scipy/scipy/blob/05b8ef3544e7b7a59b70c1f7a4dc82243e56da77/pyproject.toml#L47-L50

More importantly, the SciPy interfaces have a different call and design from those of highspy, and this is very likely not going to change in the near future.

Note that we are considering removing the HiGHS meson build altogether, as it appears to be more trouble than it was ever worth.

I'm not sure I agree that there's any trouble arising from its inclusion, but if HiGHS really wants it gone, then we can continue to patch the scipy HiGHS fork (as has been done earlier) and I'm also willing to maintain a wrapdb recipe for HiGHS (that way it can be used as a meson subproject even if it isn't part of the main repository). This will need some discussion at the SciPy end, I'd prefer it to making changes to the fork of HiGHs.

TL;DR: Is this a fair summary of the only problems with this?

I will note that the integration (https://github.com/scipy/scipy/pull/21565) basically uses one file not used in highspy, highs_options.cpp but I'm also open to moving into SciPy completely, since it currently is only used there.

As noted before, for the third option I'd modify this to remove the meson.build and just put it in wrapdb. Would still need to fix the naming of variables and warnings.

jajhall commented 1 month ago

Consider this change:

highspy.IisStrategy.kIisStrategyFromLpColPriority
highspy.HighsIisStrategy.kFromLpColPriority)

Here the IisStrategy bit is duplicated (for no real discernable reason AFAIK), while IisStrategy without Highs as a prefix is susceptible to errors arising from from highspy import *

The duplication comes from kIisStrategyFromLpColPriority being an enum rather than an enum class in C++

As for

"ObjSense", "MatrixFormat", "HessianFormat", "SolutionStatus",

I don't understand how they can "shadow user variables" when they are used (for example) as highspy.MatrixFormat.kColwise. Surely the highspy. prefix makes them unique. I acknowledge that "SolutionStatus" is an enum, and might be given a prefix to make it more clearly identified with HiGHS.

I should note that in general __init__.py files should be almost completely empty, and the heavy population of the namespace on import is a bad practice (but not one changed here).

I'm not a Python user, so I can't comment on this. However, __init__.py was written by experienced Python users, so I guess that it was done for god reason.

jajhall commented 1 month ago

Why can SciPy not exploit the fact that highspy is now in PyPI, and avoid any SciPy-specific build requirements?

This is likely not going to work, mostly because SciPy has no dependencies on anything other than numpy:

https://github.com/scipy/scipy/blob/05b8ef3544e7b7a59b70c1f7a4dc82243e56da77/pyproject.toml#L47-L50

When I spoke to Matt recently, and communicated various concerns about the use of HiGHS by SciPy, he suggested that it would be possible to use the PyPI installation, and we pretty much agreed that it was the way to go.

More importantly, the SciPy interfaces have a different call and design from those of highspy, and this is very likely not going to change in the near future.

I don't understand this argument. Surely using the PyPI installation is just a different way to get access to highspy

rgommers commented 1 month ago

These are all compiler warnings. In this PR anything outside highs_bindings and the meson.build is purely for the warnings.

Why not split these off as separate PRs, maybe even one per C++ component or file, with the PR description spelling out what the warning was and that the fix changes no behavior and fixes the warning? The fixes should be generally useful.

They are generic enough that importing them forcibly on import of highspy (current behavior) might easily shadow user variables, and given that most of the other variables and classes use Highs*** as the naming scheme it seemed more natural. I should note that in general __init__.py files should be almost completely empty, and the heavy population of the namespace on import is a bad practice (but not one changed here).

@HaoZeke no part of this bit of reasoning is correct. Star imports are "use at your own risk" (you don't avoid logical names for it), empty __init__.py files are a good idea when starting a new project but tons of popular projects don't do this, and trying to make cosmetic changes makes everything else harder. Example: renaming highspy.MatrixFormat to highspy.HighsMatrixFormat seems just strictly worse - it's like asking numpy to change np.array to np.numpy_array. I strongly recommend dropping any non-essential changes like these.

Without the stylistic changes to highspy and with the build warning fixes split off, this should become a very reasonable and easy to merge PR I'd think.

Why can SciPy not exploit the fact that highspy is now in PyPI, and avoid any SciPy-specific build requirements?

This really is not an option, and honestly you wouldn't want this: it'd yield lots of support requests to HiGHS, including time-critical ones. As an example, we did a release with Python 3.13 wheels before 3.13.0 came out, so you'd have a very narrow window of ~2 weeks or so to do the same. As another example: you'd have to have support for free-threading Python (cp313t wheels) already.

The chance of SciPy accepting any runtime dependency other than NumPy in the next few years is quite low, so it's not at all a judgement from us on the quality of HiGHS (we're quite happy overall I'd say, HiGHS seems faster/better than its direct competitors).

When I spoke to Matt recently, and communicated various concerns about the use of HiGHS by SciPy, he suggested that it would be possible to use the PyPI installation, and we pretty much agreed that it was the way to go.

@mdhaber surely you didn't mean a hard dependency? Perhaps this was a misunderstanding, or you had something else in mind?

jajhall commented 1 month ago

Note that we are considering removing the HiGHS meson build altogether, as it appears to be more trouble than it was ever worth.

I'm not sure I agree that there's any trouble arising from its inclusion, but if HiGHS really wants it gone, then we can continue to patch the scipy HiGHS fork (as has been done earlier) and I'm also willing to maintain a wrapdb recipe for HiGHS (that way it can be used as a meson subproject even if it isn't part of the main repository). This will need some discussion at the SciPy end, I'd prefer it to making changes to the fork of HiGHs.

I'm already very unhappy that SciPy is using such an out-of-date version of HiGHS as

  1. It's causing us problems with a couple of major interfaces to HiGHS.
  2. As the performance of HiGHS improves and bugs are fixed, SciPy users won't benefit

Ultimately if SciPy continues to use an out-of-date version of HiGHS it will undermine our reputation.

Fortunately CVXPY will soon (https://github.com/cvxpy/cvxpy/issues/2568) call highspy directly rather than via SciPy.

Ideally your fork can be updated to the latest version of HiGHS, so you can have whatever Meson stuff you need there, rather than in the main HiGHS repo. However, if that's not possible, I'd rather keep the Meson build in HiGHS than have SciPy use an out-of-date version of HiGHS. We'll just have to observe that the Meson build isn't supported by HiGHS.

jajhall commented 1 month ago

Why can SciPy not exploit the fact that highspy is now in PyPI, and avoid any SciPy-specific build requirements?

This really is not an option, and honestly you wouldn't want this: it'd yield lots of support requests to HiGHS, including time-critical ones. As an example, we did a release with Python 3.13 wheels before 3.13.0 came out, so you'd have a very narrow window of ~2 weeks or so to do the same. As another example: you'd have to have support for free-threading Python (cp313t wheels) already.

Fair enough. As I said, I'm not a Python user.

@galabovaa is the one to comment on the pros and cons of this.

jajhall commented 1 month ago

@mdhaber surely you didn't mean a hard dependency? Perhaps this was a misunderstanding, or you had something else in mind?

No, @mdhaber didn't mean a hard dependency, but there seemed to be some interim dependency. Again, I'm not a Python user, so I've no insight into his argument.

jajhall commented 1 month ago

These are all compiler warnings. In this PR anything outside highs_bindings and the meson.build is purely for the warnings.

Why not split these off as separate PRs, maybe even one per C++ component or file, with the PR description spelling out what the warning was and that the fix changes no behavior and fixes the warning? The fixes should be generally useful.

Having these as separate PRs is my preference. Otherwise we have the job of sifting through this massive PR for the features that we're happy to accept.

rgommers commented 1 month ago

I'm already very unhappy that SciPy is using such an out-of-date version of HiGHS [...] Ideally your fork can be updated to the latest version of HiGHS

@jajhall I agree, we'd much rather have an updated version as well. The main goal of @HaoZeke's contributions is to make updates easier and faster, because we currently have complex Cython bindings to your C++ code and we'd like to drop those and instead directly use the pybind11-based bindings within HiGHS itself. That will make it much easier to update; the only things we'd then need are

  1. for code to be buildable on the platforms we support, and
  2. not emit compiler warnings during the build.

In case we encounter a problem, we can then update our fork if it's time-critical, and submit a PR to this repo for the permanent fix. That should get both HiGHS and SciPy the best of both worlds.

galabovaa commented 1 month ago

@rgommers thank you for your comments. It makes sense to use pybind11 directly and drop the cython bindings.

@HaoZeke and @jajhall, I agree that the compiler warnings should be done in a separate PR.

I understand that having a fork of highs for time-critical edits and pre-release python versions of HiGHS makes sense. This would help us to maintain highspy for future releases of python too.

I have a couple of questions:

  1. If pybind11 bindings are used directly, would scipy still use meson to build the highspy wheels?

  2. Are there any platforms you support, which are not currently supported by us? current platforms/architectures

It may be best if we could get HiGHS and scipy/HiGHS as close to each other as possible.

If meson would still be used to build highs, we would consider keeping the meson build files in our repo. Particularly, if this makes it easier to keep our repo and the fork as close as possible.

We now have a prototype modelling language in highspy, so we would only accept minimal changes to the code generating the bindings. They would have to be synchronised with the developer @mathgeekcoder.

I think a good starting point would be to strip this PR of all compiler warnings and to try to get scipy to call the highs bindings directly, with the least possible modifications to the bindings.

Any style changes should be removed and anything not strictly necessary for getting scipy to work with the bindings too.

Once this works, we can start to look at compiler warnings.

rgommers commented 1 month ago

If pybind11 bindings are used directly, would scipy still use meson to build the highspy wheels?

Yes, we use Meson. Note that we don't build separate wheels (that wouldn't really work), but that we essentially build with meson setup && meson compile and then fold the result into Python extension modules inside scipy.optimize._highs. So we extract the sources, build options, etc. from meson.build files within HiGHS, and rebuild them with the help of <100 lines of code in our own meson.build files. (for details, see scipy/optimize/_highs/meson.build in https://github.com/scipy/scipy/pull/21565).

Are there any platforms you support, which are not currently supported by us? current platforms/architectures

I think if you only consider wheels, then it's cp313/cp313t (the t variant is for free-threaded CPython) that are missing at the moment. They build just fine though, so adding them in your next release should be as easy as adding those tags to your cibuildwheel config.

Linux distros do build SciPy for other, more esoteric platforms (e.g., 32-bit Arm, MIPS, RISC-V) and we support that in SciPy itself by addressing bug reports. I don't see any HiGHS packages in Debian or Arch Linux at the moment (it's possible I missed it if it's under another name) - but if cvxopt will start relying on HiGHS then those will probably materialize. I don't expect too many difficulties there either.

It may be best if we could get HiGHS and scipy/HiGHS as close to each other as possible.

+1 completely agreed.

If meson would still be used to build highs, we would consider keeping the meson build files in our repo. Particularly, if this makes it easier to keep our repo and the fork as close as possible.

Thanks, that would be helpful!

galabovaa commented 1 month ago

Thank you!

I will add cp313(t) to our cibw config for the next release.

We will keep the meson build files.

I don't expect difficulties on the more esoteric platforms either, and we should start to support these in time, yes. The more we support, the better! It just may be tricky to reproduce issues if we don't have access to an esoteric machine. Hopefully, we should not have many to address.

For this PR, @HaoZeke, would you please modify so we follow the stages:

  1. get scipy to use pybind11 without modifications of highs_bindings or the core solver code here

  2. get the scipy fork as close as possible to this repo for easier future maintainence

  3. compiler warnigns

Thank you!

HaoZeke commented 4 weeks ago

Now that https://github.com/ERGO-Code/HiGHS/pull/2005 is up this can be closed (but there's a good discussion here to keep in mind for other contributions).