Quansight-Labs / quansight-labs-site

💻 Development site and blog for Quansight Labs
https://labs.quansight.org

Moving SciPy to the Meson build system | Quansight Labs #273

Closed utterances-bot closed 2 years ago

utterances-bot commented 2 years ago

Moving SciPy to the Meson build system | Quansight Labs

https://labs.quansight.org/blog/2021/07/moving-scipy-to-meson/

henryiii commented 2 years ago

I would like to point out there's a proposal (looking for interested projects) to move scikit-build to a PEP 517 base, which I think would fix the issues you had when comparing build systems. https://iscinumpy.gitlab.io/post/scikit-build-proposal/ I wrote the original version of this proposal before the search, so it's actually not based on the list of things you found lacking in scikit-build, but I think it matches quite well. The benefit of CMake is that there already is a massive ecosystem of C++ libraries (around 60% of all projects, but that's from a biased source), I think even more heavily in the sciences, that support CMake, and there's fantastic support for and from all IDEs, compilers, etc. Many scientific projects already support CMake (like PyTorch) or use libraries that only have CMake support. SciPy can build a brand new configuration for everything, but it would be nice for other libraries to reuse existing work.

I've also never understood why meson has a custom DSL; the main issue with CMake is that you have to learn another language, but that's true with meson too. "Modern" CMake is a pleasure to use too, see https://cliutils.gitlab.io/modern-cmake/ - I think CMake's biggest issue is there's so many bad examples and legacy code around.

I'm happy to see more choice for Python build systems (always good, the less distutils/setuptools is used for binaries, the better), and hopefully the scikit-build proposal will be funded and we can have some friendly competition. ;) And if you know any CMake heavy project interested in being involved, please let them know I'm looking for letters of collaboration!

rgommers commented 2 years ago

Hi @henryiii, thanks for sharing! I like your proposal, and I like scikit-build. There's a lot to unpack in your proposal - I'll reply to a few points you made here, I'd also be happy to jump on a call sometime soon and see if we can help each other in any way.

move scikit-build to a PEP 517 base

That sounds good; building those hooks is not hard so I guess the big chunk of work here is "get rid of distutils/setuptools completely inside scikit-build". I would like to point out that the pyproject.toml (PEP 517) hooks are mainly for building sdists and wheels (and maybe simple editable installs), but they are not the right way to provide a developer interface. This seems to become a persistent misconception among Python packaging folks (not saying that's you). A good build/dev tool needs a good CLI.
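To make the point about hooks concrete, here is a minimal sketch of what a PEP 517 backend exposes. The hook names and signatures (`build_sdist`, `build_wheel`) come from PEP 517; the project name, version, and the toy sdist contents are assumptions for illustration only, and this is not how scikit-build or any Meson-based backend is implemented. Note that nothing in this interface covers running tests, incremental rebuilds, or other developer tasks.

```python
"""Toy in-tree PEP 517 backend (illustrative sketch only)."""
import io
import tarfile
from pathlib import Path

# Hypothetical metadata; a real backend would read this from pyproject.toml.
NAME = "example_pkg"
VERSION = "0.1.0"


def build_sdist(sdist_directory, config_settings=None):
    """PEP 517 hook: write a .tar.gz sdist and return its basename."""
    base = f"{NAME}-{VERSION}"
    filename = f"{base}.tar.gz"
    pkg_info = (
        "Metadata-Version: 2.1\n"
        f"Name: {NAME}\n"
        f"Version: {VERSION}\n"
    ).encode("utf-8")
    target = Path(sdist_directory) / filename
    with tarfile.open(target, "w:gz") as tar:
        # Only PKG-INFO is written here; a real sdist also ships the sources.
        info = tarfile.TarInfo(f"{base}/PKG-INFO")
        info.size = len(pkg_info)
        tar.addfile(info, io.BytesIO(pkg_info))
    return filename


def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    """PEP 517 hook: a real backend would configure, compile, and package here."""
    raise NotImplementedError("this sketch only implements build_sdist")
```

A frontend such as pip or build calls these functions and nothing else, which is why a separate CLI is still needed for day-to-day development.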

so it's actually not based on the list of things you found lacking in scikit-build, but I think it matches quite well.

After evaluating the landscape again, scikit-build was the only other tool I considered more seriously. I think even in its current state, scikit-build is a significant improvement over setuptools. My choice had little to do with scikit-build itself (aside from its setuptools dependency, which is fixable); I really do quite like how it looks. Fundamentally, my choice comes down to preferring Meson over CMake. The assessment of one SciPy dev very familiar with CMake was "it's ugly but it certainly works" - and I think that is fair. Given this is a once-in-a-decade effort to move build systems, I don't really want to settle for ugly-but-works. The two key points regarding CMake though are:

Many scientific projects already support CMake (like PyTorch) or use libraries that only have CMake support.

Two thoughts: (a) Meson integrates with and can reuse CMake build config in other libraries if needed, and (b) CMake use in PyTorch is not something I've seen anyone enthusiastic about (and I've worked on PyTorch for a couple of years).

SciPy can build a brand new configuration for everything, but it would be nice for other libraries to reuse existing work.

The things that need most work, like a pyproject.toml interface and Fortran on Windows, are new for both Meson and CMake. Whatever else Meson was missing I was happy to invest time in, and the Meson devs have been helpful.

I've also never understood why meson has a custom DSL

The main reason, I think, is that one of its most fundamental design choices is that it's not a Turing-complete language. Which, after our distutils experiences, seems like a great idea. If you watch Jussi Pakkanen's talks on YouTube, he makes a solid argument about this.

"Modern" CMake is a pleasure to use too, see https://cliutils.gitlab.io/modern-cmake/ - I think CMake's biggest issue is there's so many bad examples and legacy code around.

Yep, agreed. Throwing away the old docs and starting from scratch, pulling in existing content like your book (which looks great), would be one of the most impactful things the CMake devs could do.

There is still a concern that the modern language, which does look nice and in a way is similar to Meson, still includes all the old constructs: basically, string programming with globals. In a project where many people maintain the build config over many years, that is more likely to lead to hacks and weird bugs.

I'm happy to see more choice for Python build systems (always good, the less distutils/setuptools is used for binaries, the better),

Agreed:) I think having two capable build systems to choose from is great (plus Flit as the best choice for pure Python projects).

and hopefully the scikit-build proposal will be funded and we can have some friendly competition. ;)

Sounds good. I'd be happy to discuss, and perhaps share ideas for obtaining funding. So far this Meson effort has also been unfunded, aside from the recent decision of the SciPy project to spend $12,000 to finish the integration.

And if you know any CMake heavy project interested in being involved, please let them know I'm looking for letters of collaboration!

I believe RAPIDS/cuDF is moving to scikit-build. That's the only large project I know of right now.

henryiii commented 2 years ago

I'd also be happy to jump on a call sometime soon and see if we can help each other in any way.

That would probably be a good idea. I've been out for a week, but am starting to get caught up.

but they are not the right way to provide a developer interface

Yes, I'm aware that we'll need something. How are you handling this with Meson? Is it possible to just use the meson interface for development?

Throwing away the old docs and starting from scratch

The reference manual is one of the best; if you know what you are looking for, it's fantastic (especially post 3.20). It's just the tutorial and examples that are lacking. A lot. Part of the problem is setting a minimum version - libraries want to set really low version numbers. If you want a 1:1 comparison with Meson, you should compare releases of the same age, rather than comparing a 2021 Meson with a 2015 (3.4) or 2017 (3.10) version of CMake. So writing examples becomes challenging. A "common needs" section like Meson has would probably go very far.

huge C++ code base rather than a sane Python one

This is an advantage for CMake, I think. Asking someone who is writing a compiled language to understand the packaging system for another language is not feasible; rake is much better than make, but the average user doesn't want to deal with a Ruby stack and bundles just to get a task runner. This was related to the downfall of SCons, too. I think Meson has done the best it can, with zero dependencies and a zipapp, but it's still not a single binary like CMake. It's not relevant at all for making Python packages, but for general C++, you want a compiled tool, not a Python stack. The fact there isn't a Python interface for Meson, and so therefore users are not tempted to require modules, probably helps. It seems to be doing particularly well replacing autotools, which is great; any reduction of usage of autotools is a step forward.

Bazel does not use Python, it's just Python-like. It implements a stripped-down, modified version of Python; it doesn't require a Python stack. (I think, it's a bit hard to tell; from source it requires a C++ compiler, JDK, and Python, but I think the runtime doesn't require anything).

Personally, if I were starting a new build system from scratch, I'd be tempted to use Lua - it's at least a real language and it can be embedded in a compiled application easily (such as lualatex). Though Ruby would be really elegant for an eDSL (see Homebrew's recipes, rake, etc).

PS: Much of CMake is written in CMake; the C++ core is pretty reasonable and has been updated to at least C++11.

FYI, from a very initial look, it seems Rust is actually in good shape here. They have a PEP 517 builder and a setuptools extension that looks much more elegant than scikit-build.

rgommers commented 2 years ago

How are you handling this with Meson? Is it possible to just use the meson interface for development?

Yes, it's quite nice, at least for everything build-related. For running tests & benchmarks it's not going to be a full replacement for pytest or asv of course. The recommended setup will be to develop in a specific [virtual/conda]env (let's name it scipy-dev for SciPy), and then just install it into that. And use the meson setup, configure, install, introspect etc. commands for building.

Note that SciPy also has an "all in one" wrapper to have a unified interface to the many tools that especially new contributors need, that's useful to keep: https://github.com/scipy/scipy/blob/master/runtests.py
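As a rough illustration of the workflow described above, here is a small sketch of a wrapper that assembles the Meson commands for a configure, build, and install cycle into a dev environment. `meson setup`, `meson compile -C` and `meson install -C` are real Meson subcommands; the function names, the `build` directory, and the prefix are assumptions, and this is not SciPy's actual `runtests.py`.

```python
import subprocess


def meson_commands(build_dir="build", prefix=None):
    """Return the command lists for one configure/build/install cycle."""
    setup = ["meson", "setup", build_dir]
    if prefix is not None:
        # Install into the active dev environment instead of a system prefix.
        setup.append(f"--prefix={prefix}")
    return [
        setup,
        ["meson", "compile", "-C", build_dir],
        ["meson", "install", "-C", build_dir],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `run_all(meson_commands(prefix="/path/to/scipy-dev"))` prints the three commands without running them, which is a cheap way to check what a wrapper would do before pointing it at a real environment.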

Asking someone who is writing a compiled language to understand the packaging system for another language is not feasible

That's a small thing I'd think, most users will be able to install with a package manager (apt, brew, etc.) if they're totally unfamiliar with Python and don't want to use pip or similar. Or otherwise use the zipapp. I agree a single "just works" binary is nicer especially on Windows though.

This was related to the downfall of SCons

Interesting. I thought the problem was just that SCons was super slow and hence not competitive. It's typically ~6x slower than either CMake or Meson IIRC. Design was also suboptimal. We had working NumPy and SciPy builds based on SCons around 2010, and @cournape decided to abandon that system because he didn't like it (Waf seemed like a better basis at the time).

The fact there isn't a Python interface for Meson, and so therefore users are not tempted to require modules, probably helps.

Python is an implementation detail. Meson could be rewritten in Rust, or in whatever if ever needed (but it's not needed for performance) - there is nothing user-facing other than that the syntax is Python-like.

Lua would be nice too indeed.

FYI, from a very initial look, it seems Rust is actually in good shape here

Yes, I've heard very good things about PyO3 and Maturin.

I'd also be happy to jump on a call sometime soon and see if we can help each other in any way.

That would probably be a good idea. I've been out for a week, but am starting to get caught up.

Cool, let me send you a message:)

eli-schwartz commented 2 years ago

Python is an implementation detail. Meson could be rewritten in Rust, or in whatever if ever needed (but it's not needed for performance) - there is nothing user-facing other than that the syntax is Python-like.

Indeed. This is actually happening in practice, too. Not necessarily for performance, but because Python can be a bit of a heavyweight build dependency depending on your needs (and this is especially relevant for bootstrapping an OS and building the low-level parts of the base sysroot).

https://mesonbuild.com/FAQ.html#but-i-really-want-a-version-of-meson-that-doesnt-use-python

The muon reimplementation in C99 has done great work in getting a fairly comprehensive build+test+install tool for C/C++/custom targets, and is a viable alternative to meson.py when building a decent assortment of complicated projects.

So, it can definitely be done. :)

It does NOT currently implement most of the Meson modules, so there is a fairly large amount of work left to do.