MacPython / openblas-libs

Provide a proper wheel instead of a tarball #86

Closed mattip closed 1 year ago

mattip commented 2 years ago

This repo generates and uploads a tarball to be used by NumPy and SciPy. It would be nice to generate a proper wheel instead. I am not sure where the wheel would install the shared object; as far as I know there is no agreed-upon place that can be used for shared objects. For instance, when installing a wheel with pip install --user on Linux, the Python modules are put into ~/.local/lib/pythonX.Y/site-packages and scripts into ~/.local/bin. There is no equivalent of /usr/local/lib for shared objects that the runtime ld loader will probe.

One option would be to put the shared object into site-packages/openblas. NumPy/SciPy wheels could then have a dependency on this new package, and would load the shared object via an RPATH directive which would be part of the build scripts.

rgommers commented 2 years ago

Here is my understanding: the key part is the install prefix; everything is relative to that. In your example above, the prefix is ~/.local. Then:

Python controls that through "install schemes". See https://docs.python.org/3/library/sysconfig.html#installation-paths
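
For illustration, you can inspect these locations directly (output varies by platform and environment):

import sysconfig

# The named install locations for the default scheme;
# 'data' is the bare prefix itself.
for key, path in sysconfig.get_paths().items():
    print(f"{key:<12} -> {path}")
# e.g. on Linux with prefix /usr/local:
#   purelib -> /usr/local/lib/python3.11/site-packages
#   data    -> /usr/local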

What I am not sure about is whether <prefix>/lib is on the loader's library path - I guess not. The way this works in conda envs is that it is on the path: conda-build makes packages relocatable, so everything is relative to <prefix>. Other systems work like that too; the prefix concept is not unique to Python. The problem with wheels is the ad-hoc toolchain here. If this is indeed right, a probably not-so-difficult solution is to use the same _distributor_init.py mechanism that numpy/scipy are already using, and just change or add to the path being managed there.
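
For reference, on Windows that managed-path logic is roughly the following today (a simplified sketch of the generated _distributor_init.py; the .libs directory name is the current convention and details may differ):

import glob
import os
from ctypes import WinDLL

# Pre-load the vendored OpenBLAS DLL so extension modules find it.
libs_dir = os.path.join(os.path.dirname(__file__), '.libs')
if os.path.isdir(libs_dir):
    for path in glob.glob(os.path.join(libs_dir, '*openblas*.dll')):
        WinDLL(os.path.abspath(path))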

mattip commented 2 years ago

If we stick with the _distributor_init.py route for adding to LD_LIBRARY_PATH or for pre-loading the shared object into the process, we don't really need to use the <prefix>/lib directory. We could move the make_init() code that creates the _distributor_init.py file into this repo, and the code would do something like

import warnings

try:
    from openblaslib import load_shared_object
    load_shared_object()
except Exception:
    # For cases where the openblas lib is removed or overridden by another option
    warnings.warn("no openblaslib found, using system-provided blas library")

Then load_shared_object would be a version-specific routine to provide the runtime hooks.
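
As a strawman, it might look like this (names, file layout, and the ctypes approach are all assumptions, not a settled design):

import ctypes
import os
import sys

def load_shared_object():
    # Load the bundled OpenBLAS into this process so that numpy/scipy
    # extension modules resolve their BLAS/LAPACK symbols against it.
    here = os.path.dirname(__file__)
    if sys.platform == 'win32':
        ctypes.WinDLL(os.path.join(here, 'openblas.dll'))
    else:
        # RTLD_GLOBAL makes the symbols visible to later-loaded modules.
        name = 'libopenblas.dylib' if sys.platform == 'darwin' else 'libopenblas.so'
        ctypes.CDLL(os.path.join(here, name), mode=ctypes.RTLD_GLOBAL)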

Additionally, there would be two build-time functions, including:

  * a discovery function called by the build system somewhat like pkg-config (or maybe a pkg-config plugin?)

rgommers commented 2 years ago

> If we stick with the _distributor_init.py route for adding to LD_LIBRARY_PATH or for pre-loading the shared object into the process, we don't really need to use the <prefix>/lib directory. We could move the make_init() code that creates the _distributor_init.py file into this repo, and the code would do something like

We don't need to, but it simply seems like the right place?

Looking a bit closer, the _distributor_init.py file only handles Windows; I guess we're using RPATH on Linux et al. The goal here should be to change as little as possible, I think:

  1. move the current libopenblas.so from .libs to a separate wheel, which I still think should unpack to <prefix>/lib
  2. update _distributor_init.py and RPATH mechanism to point to the changed directory (maybe the latter is automatic already, through auditwheel?)
  3. add a runtime dependency on the new package to pyproject.toml and the docs (a sketch follows below)

That should be all AFAICT.
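
For step 3, the pyproject.toml side would be a one-line dependency, something like (the package name here is hypothetical):

[project]
dependencies = [
    "openblas-libs",  # hypothetical name for the new wheel
]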

rgommers commented 2 years ago

> a discovery function called by the build system somewhat like pkg-config (or maybe a pkg-config plugin?)

This I don't understand; it should not be needed (and sounds fragile). The build system already does discovery with pkg-config for Meson, and with numpy.distutils.system_info (and site.cfg if needed) for numpy.distutils. All we're changing here is the location where the exact same libopenblas.so can be found, so it should not require any new mechanism.

mattip commented 2 years ago

We can leave the discussion of hooks for later, and keep the current build system in place. It will mean that for NumPy's CI, we will need to copy the relevant libraries to /usr/local/lib.

Why do you think <prefix>/lib is the correct place for shared libraries? For instance, pypa/auditwheel#89 (the issue and PR that put openblas into numpy.libs) did not consider it. Packages like wxpython, pyside2, torch, and tensorflow do not use <prefix>/lib; rather, they put shared objects in their package directories. The numba package depends on llvmlite, which puts a shared object into site-packages/llvmlite/binding.

Another complication: the typical auditwheel/delocate workflow to "fix" wheels is:

  1. find all dependent shared objects that are not part of a "standard system" (for the NumPy wheel on ubuntu this is openblas, gfortran, quadmath)
  2. copy them into the wheel
  3. use RPATH directives to make sure the runtime loader can find them

This all changes, since step 2 is no longer relevant. How do we tell auditwheel/delocate to skip step 2? The other packages, besides numba, do not depend on a secondary package with a shared object. I wonder how numba does this.

rgommers commented 2 years ago

> Why do you think <prefix>/lib is the correct place for shared libraries?

I looked at https://github.com/python/cpython/blob/main/Lib/sysconfig.py#L26-L38, which has the relevant locations. <prefix>/include is a thing there; <prefix>/lib is not a separate entry (but read on). I guess the reason it's not is that, well, no one is really meant to put non-Python stuff in a wheel. The problem with site-packages is that it's specific to a Python version, and we'd like a single libopenblas.so that can be used from different Python versions. The one candidate for that seems to be the 'data' location, which is {base}, i.e. the directory containing include/, bin/, lib/, and lib/pythonX.Y/site-packages. So if something in the wheel is under data/lib, it drops into the right place. This is what the cmake package does, for example (with data/bin/ for its executable, and other such locations like share/ and doc/).
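
Concretely, the wheel spec maps each subdirectory of <name>-<version>.data/ onto one of those scheme keys, so a layout like this (names hypothetical) would drop the library into <prefix>/lib:

openblas_libs-0.3.21.data/
    data/
        lib/
            libopenblas.so    # installed to <prefix>/lib/libopenblas.so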

> Packages like wxpython, pyside2, torch, and tensorflow do not use <prefix>/lib; rather, they put shared objects in their package directories. The numba package depends on llvmlite, which puts a shared object into site-packages/llvmlite/binding.

These are all Python packages with Python code inside, in addition to shared libraries.

> How do we tell auditwheel/delocate to skip step 2?

I don't know if they have this feature. Worth asking about this (and the topic in general) on the packaging Discourse, perhaps?

mattip commented 2 years ago

I think the current direction from the auditwheel discussion is to make the openblas wheel pre-load the supplied shared object when importing openblas, so the mechanism would look something like the sketch below.
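
A minimal sketch of that import-time pre-load, as the package's __init__.py (package name and file layout are hypothetical; Linux shown, Windows/macOS would load the .dll/.dylib analogously):

# openblas/__init__.py
import ctypes
import os

# Loading with RTLD_GLOBAL at import time means extension modules
# imported afterwards resolve OpenBLAS symbols from this copy.
_lib = ctypes.CDLL(
    os.path.join(os.path.dirname(__file__), 'libopenblas.so'),
    mode=ctypes.RTLD_GLOBAL,
)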

Other distributors (debian, conda, fedora) do what they have always been doing.

rgommers commented 2 years ago

There should be no issue on Windows, I think. If libopenblas.dll is already loaded (by import openblas calling a function from it via the CPython C API), things should work.

isuruf commented 2 years ago

This is an ad-hoc version of pynativelib? https://mail.python.org/pipermail/wheel-builders/2016-April/000090.html

rgommers commented 2 years ago

@isuruf kinda, I'd say it's a subset, and only aimed at de-duplicating a vendored library from numpy/scipy wheels. Besides the space saving, it avoids known issues with having two copies of openblas loaded at the same time.

The pynativelib proposal is much more ambitious, dealing with library renaming, ABI compatibility/versioning, and build system integration. No one seems to have the energy to push that forward. More importantly, I'm not convinced that implementing pynativelib is a good idea from a social perspective. Given how poorly suited the author-led PyPI model is to dealing with native code/libraries, I think that putting non-Python libraries in wheels is a last resort - it should only be done reluctantly when there's no better alternative. libopenblas is in that category; we kinda have to at some point (xref https://github.com/scipy/scipy/issues/15129). OpenMP may be similar (right now scikit-learn is vendoring it, but that doesn't scale). But in general, when you need a lot of compiled dependencies, it's better to direct users to another packaging system. E.g., for the geospatial stack, users should use conda-forge or a Linux distro - sticking GDAL, proj4j, and the many other things it needs into wheels would be a very unhealthy idea.

ogrisel commented 1 year ago

Coming late to the party: if we were to also set up a similar OpenMP runtime wheel for the SciPy stack, then we could make OpenBLAS depend on that (instead of using its own internal threadpool). As a result, libraries such as scikit-learn, whose code iteratively calls into OpenMP parallel sections and OpenBLAS parallel sections, would no longer suffer from the bad performance degradation caused by spin-waiting threads (working hypothesis), as documented here: https://github.com/OpenMathLib/OpenBLAS/issues/3187

rgommers commented 1 year ago

@ogrisel that seems very reasonable - if we can use an OpenBLAS wheel, we can also use an OpenMP wheel. However, we identified one blocker to actually using this wheel from released NumPy/SciPy packages, due to PyPI's metadata model - see https://github.com/numpy/numpy/issues/24857.