apache / arrow-nanoarrow

Helpers for Arrow C Data & Arrow C Stream interfaces
https://arrow.apache.org/nanoarrow
Apache License 2.0
149 stars 34 forks source link

feat: Standalone Python installer using Meson #505

Closed WillAyd closed 2 weeks ago

WillAyd commented 1 month ago

This can be a follow up once #483 and #484 are completed

paleolimbot commented 1 month ago

For what it's worth, I'd like to move the "bundle" logic from CMake to Python at some point (at which point bootstrap.py work work without CMake or Meson)...is the issue that CMake is a requirement for a development install? (I have noticed that the CMake requirement does make it slightly harder to install from git/zip).

WillAyd commented 1 month ago

In this PR the bootstrap file actually gets decoupled from CMake and has nothing to do with Meson - the only thing Meson does is provide it a command line argument so it knows where to place the file it generates.

Generally though, yes right now the Python installation is in a bit of a weird place. It still uses setuptools, which is the historical builder / packager / installer for Python but has fallen out of favor for more specialized tools, especially in the scientific Python stack. meson-python is what NumPy, pandas and SciPy use so it has some good momentum in the space. If you wanted to stick with CMake + Python you could replace setuptools with scikit-build-core

Generally though I find the meson-python piece to be great. It works for development and can even work as a standalone installer if you wanted to pip install where a wheel is not available. I think trying to get all of that to work with CMake + scikit-build-core would be a much larger effort

paleolimbot commented 4 weeks ago

setuptools, which is the historical builder / packager / installer for Python but has fallen out of favor for more specialized tools

This is good to know (my Python is not very idiomatic and was mostly learned 8ish years ago). I will point out that because setuptools has been supported since the dawn of time on every platform where there is a Python, it makes it trivial to support installation everywhere (which is sort of nanoarrow's mantra). Setuptools makes it difficult to incorporate non-system dependencies (which is where meson and cmake shine for building Python extensions), but we don't have any dependencies we can't vendor (and never will). I suppose the gist of this PR is that it removes the vendoring of nanoarrow and turns it into a dependency; however, I'm not sure we want that either (also sort of nice to be able to distribute a self-contained source distribution that can be installed offline).

The one thing I don't like about the bootstrap + vendor method is that in a debugger the source references are all to the bundled nanoarrow.c, so it's a tiny bit of extra work to find + fix the error in the actual C sources (luckily that doesn't happen very often 🙂 ). It seems like the example from the bottom of the wrap file accomplishes that.

WillAyd commented 4 weeks ago

Ah ok cool. Makes sense with the goals you have in mind to stick with setuptools then - we can review this later if that becomes problematic

paleolimbot commented 4 weeks ago

Perhaps this could also live as an example in examples/? This is definitely a far superior approach for another project to build a C/C++ extension that depends on nanoarrow (setuptools is awful about mixing C and C++, in particular).

Also, I trust you more than I trust me with respect to Python packaging and am happy to merge anything that passes CI (which has, or should have, complete coverage of supported platforms).

WillAyd commented 4 weeks ago

Yea this can't pass CI until #483 and #484 happen, so could also defer reviewing until that point in time

On a second read just want to point out that:

I suppose the gist of this PR is that it removes the vendoring of nanoarrow and turns it into a dependency;

Doesn't really change with this. Python doesn't have robust dependency management like that. This just makes it easier to build from source as meson would handle downloading the library for you, but when a wheel gets created it would still copy the necessary nanoarrow library from the source structure (i.e. subprojects for Meson) and into the wheel

I actually think this still might check all the boxes but lets defer until we can get CI green to review again