apache / arrow-nanoarrow

Helpers for Arrow C Data & Arrow C Stream interfaces
https://arrow.apache.org/nanoarrow
Apache License 2.0
174 stars 38 forks source link

python: Use meson-python instead of setuptools #644

Closed WillAyd closed 3 weeks ago

WillAyd commented 1 month ago

This PR originally started as a nanobind POC to see if we could convert from Cython over to that, but I scaled that back just to focus on the possiblity of porting the build system to meson (you can see some nanobind components in the history, if you cared to look)

The build system port may have some merits on its own. It is more tightly integrated with the build system of the C code, and has the advantage of being built in parallel. ~Technically it links to the main project by symlinking it as a subproject in the subprojects directory; if we cared to support source Python builds we could replace that symlink with a wrap file on distribution, or could even detect within the configuration if the symlink resolves. In the case it does not, we can fallback to a github release~

If the user wanted to develop locally, they could simply symlink the project root into the subprojects folder of Python (the git repo already does this). For source distributions, the wrap system can provide a fallback project to download and reference

If we choose to try and move off of Cython incrementally and use something like nanobind or even C extensions, I think Meson can handle those more capably than the setup.py script

paleolimbot commented 1 month ago

I know you're still working here, but just making sure you know about https://github.com/apache/arrow-nanoarrow/blob/main/ci/scripts/test-python-wheels.sh

Will this work with CUDA? I imagine it's possible to add LDFLAGS to get around this if needed.

I do think we have to support building from sdist since I believe that's how the conda-forge setup currently works.

(In general I think meson is way better than setuptools provided it has the same level of Python version and platform support...thank you for working on this!)

WillAyd commented 1 month ago

Will this work with CUDA? I imagine it's possible to add LDFLAGS to get around this if needed.

Definitely not an area of expertise for me, but I don't think Meson provides that many abstractions over CUDA at the moment

I do think we have to support building from sdist since I believe that's how the conda-forge setup currently works.

Yep should still be a part of this - when cloning the git repo if the arrow-nanoarrow subproject is there, meson should reference that. For source distributions, they should include the arrow-nanoarrow.wrap file which will download a copy of the project from a remote source (currently this GH branch, but ideally main or better yet the WrapDB)

WillAyd commented 1 month ago

Putting back in draft mode for now as there might be an upstream bug in meson we need to resolve https://github.com/mesonbuild/meson/issues/13746

...or my understanding is wrong. Either way, I think the way things stand now are a very close approximation of what this solution could look like

paleolimbot commented 1 month ago

Another outlet of this (or something in addition to this) could be something in examples: a Python package that uses nanoarrow using meson as the build system.

jorisvandenbossche commented 1 month ago

I don't really have a stake in this, but just to mention it: for pyarrow we are considering to switch from setuptools to scikit-build-core, and for maintenance overhead there might be some value in using the same build backend for the various python packages in the project (but in the end, we should also use the best tool in each case, and here there is less existing cmake compared to pyarrow, so using meson-python might be the simpler option)

paleolimbot commented 1 month ago

Given the dependency-freeness, probably either would work, and probably it would not be too hard to transition to one or the other if we need to for some reason! (We definitely need something better than setuptools, though, since the day is coming soon when we need zstd and lz4 to support IPC compression).

paleolimbot commented 1 month ago

I spent some time working with this to see if there was any easy way out of the "meson python doesn't make it easy to build a Python package in a subdirectory" problem, and I wonder if a slightly lower barrier to entry would be to keep the existing bootstrap.py behaviour (i.e., generate the single-file exports and .pxd and put them in python/vendor) and use Meson to build those instead of attempting to have nanoarrow as a subproject (getting nanoarrow to be a proper subproject could be a follow-up). That way we would get the benefit of meson-python as a build system (e.g., make it easier to transition away from Cython) without bending the build system to do something it wasn't designed to do. Thoughts?

WillAyd commented 1 month ago

Yea I think that or what @rgommers suggested (run a script during Meson's dist hook to vendor what we need) are the two options.

I'll take another look at this next week

WillAyd commented 4 weeks ago

OK I removed any of the subproject usage for now and went back to the bootstrap approach. This seems to work well when building in the repository, but the sdist installs are broken.

Need to look through that a bit more but am open to ideas. The main limitation is that Meson does not allow you to edit/create files in-source, so the vendoring of files naturally does not happen until build time. But that is probably too late for the sdist

paleolimbot commented 3 weeks ago

For building the sdist can you run the bootstrap manually before running python -m build --sdist?

Also make sure that you edit python-wheels.yaml appropriately so that we can make sure the wheels build/test correctly!

WillAyd commented 3 weeks ago

I think the problem is that meson does not allow you to generate files in the source tree at all, so when doing a commit-level integration you need the bootstrap script to write the files to the build directory, but when generating an sdist the build directory doesn't yet exist

paleolimbot commented 3 weeks ago

Yes, but we can always modify the build tree before building the sdist?

      - name: Build
        run: |
          cd python
          // stuff required to build sdist
          python -m build --sdist 
WillAyd commented 3 weeks ago

Hmm do you mean the build tree or the source tree? The build tree won't exist until after the sdist is created, and you can't be sure what the build directory will actually be named since it can be controlled by the user

paleolimbot commented 3 weeks ago

Sorry, the source tree!

(We're the only ones that ever build an sdist so we can copy whatever we want into the python/ directory before it gets packaged and ensure meson.build looks for it?)

WillAyd commented 3 weeks ago

we can copy whatever we want into the python/ directory before it gets packaged and ensure meson.build looks for it?

I don't think there is a way to do this. Meson is very strict about not letting you generate files in the source tree, so this either violates the requirements for commit-level integration, or I think we'd have to deploy some hacks to make it work in both settings.

So I may be leaning back towards the subproject approach that vendors the subproject when the sdist is created

paleolimbot commented 3 weeks ago

I am not sure I understand...the copying I'm talking about for the sdist happens before anything Python-related is invoked?

WillAyd commented 3 weeks ago

I think your proposal would work if you only ever installed from an sdist, but wouldn't work when building from this repo. Let me see if I can clarify.

Assume we had a structure like this:

src/
  nanoarrow.c
python/
  src/
    nanoarrow_c.pxd

The PR as it currently stands will copy the src/nanoarrow.c file into the build folder and reference it from there:

src/
  nanoarrow.c
python/
  src/
    nanoarrow_c.pxd
  builddir/  # generated at build time
    nanoarrow.c

What you are proposing is to generate an sdist before installing the package with a layout that may look something like:

src/
  nanoarrow.c
python/
  src/
    nanoarrow_c.pxd
  vendor/
    nanoarrow.c

However, I don't know of a simple way for Meson/ninja to resolve to a source file potentially in the source tree (when installing from an sdist) while falling back to looking for it in the build folder (when building in this repo)

paleolimbot commented 3 weeks ago

I am still not sure I understand but trust you! Feel free to disable the sdist generation for now (although we will need it before the next release).

It also may be that we need to do something here that meson-python is not built to do (which is fine...our current build system is very low maintenance!).

WillAyd commented 3 weeks ago

@rgommers maybe you have an idea on the Windows failure? Tried looking through upstream discussions but its a rather broad topic:

https://github.com/apache/arrow-nanoarrow/actions/runs/11559159131/job/32173258300?pr=644#step:6:184

Need python for x86_64, but found x86

AFAIU the container which cibuildwheel installs a 32 bit Python but the compiler being used is 64 bit?

rgommers commented 3 weeks ago

AFAIU the container which cibuildwheel installs a 32 bit Python but the compiler being used is 64 bit?

Yes indeed. The tl;dr is that if you want to build wheels for 32-bit Python on 64-bit Windows, Meson doesn't automatically configure MSVC in 32-bit mode for you (rationale: there could be non-Python library or executable targets, and why would those be 32-bit because somewhere else in meson.build a 32-bit Python is detected).

To fix it, explicitly configure 32-bit MSVC in your cibuildwheel-using job: https://github.com/PyWavelets/pywt/blob/361cc9a4c80bc2603fae99e151f2a391c0b376f9/.github/workflows/wheel_tests_and_release.yml#L204-L214

WillAyd commented 3 weeks ago

The only issue I see is that flatcc sources aren't vendored/not contained in the sdist.

The latest commit should add that functionality now, relying on Meson's own functionality to bundle subprojects as part of dist creation. It's a bit wonky that flatcc is bundled that way and the rest of the files are created by our custom script reading from a CMake file...but I'm not sure that can be resolved in this PR. Definitely would like to tackle that in follow up(s)

WillAyd commented 3 weeks ago

Thanks @paleolimbot for the review and @rgommers for all of the guidance!