Closed WillAyd closed 3 weeks ago
I know you're still working here, but just making sure you know about https://github.com/apache/arrow-nanoarrow/blob/main/ci/scripts/test-python-wheels.sh
Will this work with CUDA? I imagine it's possible to add LDFLAGS
to get around this if needed.
I do think we have to support building from sdist since I believe that's how the conda-forge setup currently works.
(In general I think meson is way better than setuptools provided it has the same level of Python version and platform support...thank you for working on this!)
Will this work with CUDA? I imagine it's possible to add
LDFLAGS
to get around this if needed.
Definitely not an area of expertise for me, but I don't think Meson provides that many abstractions over CUDA at the moment
I do think we have to support building from sdist since I believe that's how the conda-forge setup currently works.
Yep should still be a part of this - when cloning the git repo if the arrow-nanoarrow subproject is there, meson should reference that. For source distributions, they should include the arrow-nanoarrow.wrap
file which will download a copy of the project from a remote source (currently this GH branch, but ideally main or better yet the WrapDB)
Putting back in draft mode for now as there might be an upstream bug in meson we need to resolve https://github.com/mesonbuild/meson/issues/13746
...or my understanding is wrong. Either way, I think the way things stand now are a very close approximation of what this solution could look like
Another outlet of this (or something in addition to this) could be something in examples
: a Python package that uses nanoarrow using meson as the build system.
I don't really have a stake in this, but just to mention it: for pyarrow we are considering to switch from setuptools to scikit-build-core, and for maintenance overhead there might be some value in using the same build backend for the various python packages in the project (but in the end, we should also use the best tool in each case, and here there is less existing cmake compared to pyarrow, so using meson-python might be the simpler option)
Given the dependency-freeness, probably either would work, and probably it would not be too hard to transition to one or the other if we need to for some reason! (We definitely need something better than setuptools, though, since the day is coming soon when we need zstd and lz4 to support IPC compression).
I spent some time working with this to see if there was any easy way out of the "meson python doesn't make it easy to build a Python package in a subdirectory" problem, and I wonder if a slightly lower barrier to entry would be to keep the existing bootstrap.py
behaviour (i.e., generate the single-file exports and .pxd and put them in python/vendor
) and use Meson to build those instead of attempting to have nanoarrow
as a subproject (getting nanoarrow to be a proper subproject could be a follow-up). That way we would get the benefit of meson-python as a build system (e.g., make it easier to transition away from Cython) without bending the build system to do something it wasn't designed to do. Thoughts?
Yea I think that or what @rgommers suggested (run a script during Meson's dist hook to vendor what we need) are the two options.
I'll take another look at this next week
OK I removed any of the subproject usage for now and went back to the bootstrap approach. This seems to work well when building in the repository, but the sdist installs are broken.
Need to look through that a bit more but am open to ideas. The main limitation is that Meson does not allow you to edit/create files in-source, so the vendoring of files naturally does not happen until build time. But that is probably too late for the sdist
For building the sdist can you run the bootstrap manually before running python -m build --sdist
?
Also make sure that you edit python-wheels.yaml
appropriately so that we can make sure the wheels build/test correctly!
I think the problem is that meson does not allow you to generate files in the source tree at all, so when doing a commit-level integration you need the bootstrap script to write the files to the build directory, but when generating an sdist the build directory doesn't yet exist
Yes, but we can always modify the build tree before building the sdist?
- name: Build
run: |
cd python
// stuff required to build sdist
python -m build --sdist
Hmm do you mean the build tree or the source tree? The build tree won't exist until after the sdist is created, and you can't be sure what the build directory will actually be named since it can be controlled by the user
Sorry, the source tree!
(We're the only ones that ever build an sdist so we can copy whatever we want into the python/
directory before it gets packaged and ensure meson.build looks for it?)
we can copy whatever we want into the
python/
directory before it gets packaged and ensure meson.build looks for it?
I don't think there is a way to do this. Meson is very strict about not letting you generate files in the source tree, so this either violates the requirements for commit-level integration, or I think we'd have to deploy some hacks to make it work in both settings.
So I may be leaning back towards the subproject approach that vendors the subproject when the sdist is created
I am not sure I understand...the copying I'm talking about for the sdist happens before anything Python-related is invoked?
I think your proposal would work if you only ever installed from an sdist, but wouldn't work when building from this repo. Let me see if I can clarify.
Assume we had a structure like this:
src/
nanoarrow.c
python/
src/
nanoarrow_c.pxd
The PR as it currently stands will copy the src/nanoarrow.c
file into the build folder and reference it from there:
src/
nanoarrow.c
python/
src/
nanoarrow_c.pxd
builddir/ # generated at build time
nanoarrow.c
What you are proposing is to generate an sdist before installing the package with a layout that may look something like:
src/
nanoarrow.c
python/
src/
nanoarrow_c.pxd
vendor/
nanoarrow.c
However, I don't know of a simple way for Meson/ninja to resolve to a source file potentially in the source tree (when installing from an sdist) while falling back to looking for it in the build folder (when building in this repo)
I am still not sure I understand but trust you! Feel free to disable the sdist generation for now (although we will need it before the next release).
It also may be that we need to do something here that meson-python is not built to do (which is fine...our current build system is very low maintenance!).
@rgommers maybe you have an idea on the Windows failure? Tried looking through upstream discussions but its a rather broad topic:
https://github.com/apache/arrow-nanoarrow/actions/runs/11559159131/job/32173258300?pr=644#step:6:184
Need python for x86_64, but found x86
AFAIU the container which cibuildwheel installs a 32 bit Python but the compiler being used is 64 bit?
AFAIU the container which cibuildwheel installs a 32 bit Python but the compiler being used is 64 bit?
Yes indeed. The tl;dr is that if you want to build wheels for 32-bit Python on 64-bit Windows, Meson doesn't automatically configure MSVC in 32-bit mode for you (rationale: there could be non-Python library or executable targets, and why would those be 32-bit because somewhere else in meson.build
a 32-bit Python is detected).
To fix it, explicitly configure 32-bit MSVC in your cibuildwheel
-using job:
https://github.com/PyWavelets/pywt/blob/361cc9a4c80bc2603fae99e151f2a391c0b376f9/.github/workflows/wheel_tests_and_release.yml#L204-L214
The only issue I see is that flatcc sources aren't vendored/not contained in the sdist.
The latest commit should add that functionality now, relying on Meson's own functionality to bundle subprojects as part of dist creation. It's a bit wonky that flatcc is bundled that way and the rest of the files are created by our custom script reading from a CMake file...but I'm not sure that can be resolved in this PR. Definitely would like to tackle that in follow up(s)
Thanks @paleolimbot for the review and @rgommers for all of the guidance!
This PR originally started as a nanobind POC to see if we could convert from Cython over to that, but I scaled that back just to focus on the possiblity of porting the build system to meson (you can see some nanobind components in the history, if you cared to look)
The build system port may have some merits on its own. It is more tightly integrated with the build system of the C code, and has the advantage of being built in parallel. ~Technically it links to the main project by symlinking it as a subproject in the
subprojects
directory; if we cared to support source Python builds we could replace that symlink with a wrap file on distribution, or could even detect within the configuration if the symlink resolves. In the case it does not, we can fallback to a github release~If the user wanted to develop locally, they could simply symlink the project root into the subprojects folder of Python (the git repo already does this). For source distributions, the wrap system can provide a fallback project to download and reference
If we choose to try and move off of Cython incrementally and use something like nanobind or even C extensions, I think Meson can handle those more capably than the setup.py script