RosettaCommons / binder

Binder, tool for automatic generation of Python bindings
MIT License
321 stars 66 forks

Setup binder as a Python package and publish pre-built wheels to PyPI #303

Open haiiliin opened 5 months ago

haiiliin commented 5 months ago

Description of the implementation

haiiliin commented 5 months ago

The Linux build failed because it runs in a Docker container: only part of the project (the python directory) was copied into the container, so the symbolic link to the Binder source broke.

haiiliin commented 5 months ago

Looks good now

haiiliin commented 5 months ago

Notes on publishing to PyPI: please give me your PyPI account name so I can invite you as a collaborator on the cppbinder project. You can then create an API token on PyPI and add it as a secret named PYPI_API_TOKEN in the repository settings.
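For reference, a minimal sketch of what the publishing step does with that secret, assuming PYPI_API_TOKEN is exported into the CI environment (twine accepts PyPI API tokens with the literal username __token__). This is a command sketch, not the actual workflow in the repo:

```shell
# Build the sdist and wheel, then upload them to PyPI with twine.
# Assumes the PYPI_API_TOKEN repository secret is exposed as an env var.
python -m pip install --upgrade build twine
python -m build                      # writes dist/*.tar.gz and dist/*.whl
python -m twine upload dist/* \
    --username __token__ \
    --password "$PYPI_API_TOKEN"     # API tokens always use the __token__ username
```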

haiiliin commented 5 months ago

@lyskov Can you take a look at this?

lyskov commented 5 months ago

@haiiliin this is an interesting idea, let me think about it!

Also, I see that you added LLVM as a submodule. Two thoughts on this:

Could you please let me know if you tested the pre-built wheels on Linux distros with libstdc++ versions different from the one you built the package against? In the past this was an issue, so I suspect it still is.

Thanks,

lyskov commented 5 months ago

@haiiliin to clarify:

Could you please let me know if you tested the pre-built wheels on Linux distros with libstdc++ versions different from the one you built the package against? In the past this was an issue, so I suspect it still is.

-- particularly whether the code generated by Binder (say, when running the self-test involving the C++ std lib) is indeed compilable on the target systems.

Thanks,

haiiliin commented 4 months ago

The reason for not using the official LLVM project is that we have to create a symbolic link to the Binder source inside the LLVM tree and then add Binder as a subproject of LLVM. If we used the official LLVM project, the local install command pip install ./python would fail for anyone who clones the Binder repo. A possible solution is to run a script that creates the symbolic links before building the package in scikit-build-core, but creating symbolic links requires administrator access on Windows.
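The pre-build step described above could in principle be scripted; a rough sketch follows. The paths inside the LLVM tree are illustrative, not necessarily the ones the package actually uses, and the mkdir/touch lines stand in for a real checkout so the sketch is runnable on its own:

```shell
# Hypothetical pre-build step: link the Binder source into the LLVM tree
# so it builds as an in-tree clang tool. Paths are illustrative.
LLVM_DIR=llvm-project/clang-tools-extra
mkdir -p "$LLVM_DIR" source            # stand-ins for real checkouts
touch "$LLVM_DIR/CMakeLists.txt"

# On Windows this step normally needs administrator rights (or Developer
# Mode), which is the limitation mentioned above.
ln -sfn "$PWD/source" "$LLVM_DIR/binder"

# Register the subproject, appending only if not already present.
grep -q 'add_subdirectory(binder)' "$LLVM_DIR/CMakeLists.txt" \
    || echo 'add_subdirectory(binder)' >> "$LLVM_DIR/CMakeLists.txt"
```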

  • It might be best to move all of this to a separate repository that has both the Binder and LLVM repos as submodules. That way there will be a clear separation of responsibility. Thoughts?

The problem is that whenever a release is published in this project, we would have to update the submodules in the other repository and then publish the same release again, which is quite annoying.

Could you please let me know if you tested the pre-built wheels on Linux distros with libstdc++ versions different from the one you built the package against? In the past this was an issue, so I suspect it still is.

I think cibuildwheel (which uses the manylinux Docker images to build wheels for different architectures and glibc versions) will take care of this and make the wheels work on most Linux distributions.

As stated in the manylinux project:

Building manylinux-compatible wheels is not trivial; as a general rule, binaries built on one Linux distro will only work on other Linux distros that are the same age or newer. Therefore, if we want to make binaries that run on most Linux distros, we have to use an old enough distro.

Supported distributions and glibc versions for the wheels of the images can be found in the manylinux project.

By default the manylinux2014 image (this can be customized) is used; it is based on CentOS 7 with the GCC 10 toolchain, and the built wheels should be compatible with other distributions with glibc 2.17+.
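One quick way to check whether a given target system clears the manylinux2014 bar is to look at its C library version:

```shell
# Print the C library version of the current system; manylinux2014 wheels
# require glibc >= 2.17. (musl-based distros such as Alpine are not covered
# by manylinux wheels at all and would need musllinux builds.)
getconf GNU_LIBC_VERSION 2>/dev/null || ldd --version | head -n 1
```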

If it is not compatible, or pip cannot find a compatible wheel for the target system, one can always build the package locally instead of using the pre-built wheels:

pip install --no-binary=cppbinder cppbinder

I have only used the wheel built on ubuntu-22.04 via GitHub Actions and tested it on WSL ubuntu-24.04.

haiiliin commented 4 months ago

By default, cibuildwheel only builds x86_64 and i686 wheels on Linux; if you think other wheels are needed, we can also build aarch64, ppc64le, and s390x wheels.

See here: https://cibuildwheel.pypa.io/en/stable/options/#archs

and here for the built wheels on PyPI: https://pypi.org/project/cppbinder/#files
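Enabling the extra architectures is a small change: cibuildwheel reads the CIBW_ARCHS_LINUX environment variable (it can also be set in pyproject.toml). This is a command sketch rather than something run here, since building non-native archs additionally requires Docker and QEMU emulation registered on the runner:

```shell
# Build wheels for additional Linux architectures; non-native archs are
# built under QEMU emulation, which must be set up on the runner first.
export CIBW_ARCHS_LINUX="x86_64 i686 aarch64 ppc64le s390x"
pipx run cibuildwheel --platform linux
```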

lyskov commented 4 months ago

Great - I see that you have already pushed some of the binaries to PyPI, which makes things easier to test. I have tried running Binder from these binaries on various Linux distros and so far things are working!

One issue I noticed is that right now the wheel packages contain only the Binder executable and no include files. This creates a rather annoying usability problem, since users will have to manually set up all the include paths for the system (i.e. C++ std lib) includes. Have you considered packing the includes alongside the Binder binaries? Steps to reproduce, if you are curious:

  1. run ubuntu:20.04 docker image
  2. install python3, python3-pip packages
  3. pip install cppbinder
  4. clone binder and pybind11 into /root
  5. cd /root/binder/test && ./self-test.py --binder=$(which binder) --pybind11 /root/pybind11/include/ --accept
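The steps above as a single script, assuming Docker is available on the host; the repository URLs for the two clones are my guesses at the intended upstreams:

```shell
# Reproduce the missing-includes issue inside a clean ubuntu:20.04 container.
docker run --rm ubuntu:20.04 bash -c '
    apt-get update && apt-get install -y python3 python3-pip git
    pip3 install cppbinder
    git clone https://github.com/RosettaCommons/binder /root/binder
    git clone https://github.com/pybind/pybind11 /root/pybind11
    cd /root/binder/test && ./self-test.py --binder=$(which binder) \
        --pybind11 /root/pybind11/include/ --accept
'
```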

Re submodules: I have given this issue some serious consideration, and I see a few technical issues with the current setup:

The problem is that when a release is published in this project, we have to update submodules in another repository and then publish a same release again which makes it quite annoying.

-- this is indeed an inconvenience, but I think there is an easy way to fix it. How about if this new project had no submodules, but instead did fresh clones of both LLVM and Binder before builds? One particular issue with the current setup is that adding LLVM as a submodule is problematic (due to the bulk it adds) for all projects that do not actually need this functionality. This especially affects projects that already have Binder as a submodule alongside other submodules (my main project is set up this way, and I know of at least a few more with a similar setup, so this is a real use case), since they would have to deal with recursive submodule initialization and how to work around it.

-- another issue is the chain of responsibility: right now the Binder repo is clearly about Binder, not about the various ways we can package it. This feels like the right approach to me (i.e. keep things separated when possible). Having PyPI-related things inside the Binder repo, while convenient at some points, also merges maintenance of Binder and of Binder-wheels-on-PyPI, which I do not think is a good move in the long run. For instance, if this project were separate, the person who maintains the wheels would not need to be the same person who maintains Binder (for example, it could be you).

I guess in my mind PyPI packaging is a distribution system, similar to how one might create a package for Debian, MacPorts, or Spack. So let's keep these things separated unless there is a good technical reason why they should not be.

Thanks,

haiiliin commented 1 month ago

The LLVM project is now cloned at build time.