Martin-Seysen / mmgroup

Python implementation of the monster group

Build the mmgroup package with Meson #10

Open Martin-Seysen opened 1 year ago

Martin-Seysen commented 1 year ago

The current build system for mmgroup relies on some patches of the distutils/setuptools package. distutils has been marked as deprecated since Python 3.10, and will be dropped in Python 3.12. Although setuptools is still available in Python 3.12, and the current build system has no explicit dependencies on distutils, it may become difficult to maintain these patches in the near future.

So the author plans to switch to the Meson build system. SciPy has gone that way, and NumPy is going that way right now. Since mmgroup depends heavily on NumPy, it appears to be safe to go that way with mmgroup too.

We have to consider some peculiarities, mainly concerning the automatic generation of C code with Python. Here the main difficulty is that we require some of the generated C code implementing low-level functions for computing tables to be used by high-level C functions. Actually, this process is repeated twice.

In the Meson build system there is a strict separation between the source directories (which are never changed by Meson) and the build directories (which are created by Meson from scratch). See

https://mesonbuild.com/Overview.html#terminology

This approach is quite different from the current approach using something similar to

python setup.py build_ext --inplace

Meson also distinguishes between host targets and build targets, see

https://mesonbuild.com/Cross-compilation.html#mixing-host-and-build-targets

For the mmgroup project this means that we have to build a minimal host target that is capable of generating all the required C files. This means some duplication of work when building just one target. So during the porting phase we will minimize the complexity of the host target without changing the public interface of the project. At present our github action creates 12 or more different build targets, so that the extra cost for building an additional minimal host target becomes negligible.

There are also some advantages of this approach:

Migration path

It appears to be impossible (at least for the author) to maintain the current live project and an experimental project using Meson in the same github repo.

Therefore we will delete the current github miniproject in

https://github.com/Martin-Seysen/mmgroup_miniproject

and start building a minimal host system and some small target system with Meson, including all github actions using cibuildwheel and triggering the generation of the documentation in readthedocs.

Any help as well as suggestions for the migration are welcome.

Martin-Seysen commented 1 year ago

As a first step for switching to Meson we will write subprocesses (taking unix-style arguments and options) for all activities dealing with input or output paths. So Meson will be able to take complete control over all such paths by using these subprocesses. These changes will be programmed and tested in the current build environment, still using the setuptools package.
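A minimal sketch of what such a path-explicit subprocess could look like. The option names (`--sources`, `--source-dir`, `--out-dir`) and the copy-based "generation" are assumptions made for illustration only; they are not the actual interface of the mmgroup tools. The point is that the script reads only from the source directory and writes only to the output directory it is given, so the caller (setuptools now, Meson later) controls every path.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build the command-line parser; all option names are hypothetical."""
    parser = argparse.ArgumentParser(description="Generate C files from templates.")
    parser.add_argument("--sources", nargs="+", required=True,
                        help="names of the template files to process")
    parser.add_argument("--source-dir", type=Path, default=Path("."),
                        help="directory containing the templates (read only)")
    parser.add_argument("--out-dir", type=Path, required=True,
                        help="directory that receives every generated file")
    return parser


def generate(args):
    """Write one .c file per template into out-dir.

    Copying the template text is a stand-in for real code generation.
    """
    args.out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name in args.sources:
        template = args.source_dir / name
        target = args.out_dir / (Path(name).stem + ".c")
        target.write_text("/* generated from %s */\n" % name + template.read_text())
        written.append(target)
    return written


def main(argv=None):
    """Parse argv (defaults to sys.argv[1:]) and run the generator."""
    return generate(build_parser().parse_args(argv))
```

A script built this way would call `main()` at the bottom, and could be invoked as, for example, `python gen.py --sources tables.tpl --source-dir src --out-dir build/gen`, where the file and directory names are again placeholders.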

MattMcL4475 commented 1 year ago

Martin, thank you for writing this exciting project; I'm excited to learn more about the implementation. Happy to help with this build pipeline: if you can break this into a few different GitHub issues, I can try implementing one of them (I don't want to duplicate your efforts). Thank you!

MattMcL4475 commented 1 year ago

As a start, here's ChatGPT's attempt at breaking up the above:

Martin-Seysen commented 1 year ago

Hello Matt

Thank you for your offer to help me with porting the project to Meson. I appreciate any help from a developer who is more familiar with Meson than I am. Before we can create a Meson project we'll have to change the code generation tools to satisfy the following requirements.

In the last few weeks I have changed my code generation tools to satisfy these requirements. But there are still a few simplifications possible which I plan to do next week. After that, essentially one code generation tool 'generate_code.py' will remain. Eventually, each invocation of this tool will create a set of .c files, a single .h file, and, possibly, a .pxd, a .pxi, and a .pyx file for integration into Cython.
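Since a Meson `custom_target` has to declare its output files up front, it may help if the set of files produced by one invocation can be computed from its arguments. The following is only a sketch under that assumption: the function name, its parameters, and the simplification that the three Cython files share one base name and are always generated together are all hypothetical, not taken from 'generate_code.py'.

```python
from pathlib import PurePosixPath


def expected_outputs(c_names, header, cython_name=None, out_dir="."):
    """Return the files one generator run is expected to produce.

    c_names:      base names of the generated .c files
    header:       base name of the single generated .h file
    cython_name:  base name of the optional .pxd/.pxi/.pyx files
                  (assumed here to be generated together)
    out_dir:      directory that receives all generated files
    """
    out = PurePosixPath(out_dir)
    files = [out / (name + ".c") for name in c_names]
    files.append(out / (header + ".h"))
    if cython_name is not None:
        files += [out / (cython_name + ext) for ext in (".pxd", ".pxi", ".pyx")]
    return [str(f) for f in files]
```

Such a list could be printed by a helper option of the generator, or kept in sync by hand with the outputs named in the build description.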

Then the simplified tool 'generate_code.py' should work in the old 'setup.py' build environment as well as in a new build environment based on Meson. Also I plan to document the new version of the code generation tool, as well as the dependencies between source files and libraries, when finished.

If you plan to (mainly) do the port to Meson, dealing with the simplified tool 'generate_code.py' may be a good point to start with.

Apart from this, most of the build steps for the mmgroup project can be done in a fairly standard way by people familiar with C, shared libraries, Cython, setuptools.py or Meson, cibuildwheel, and github actions. The one big non-standard exception is as follows:

There are now three shared libraries (or DLLs in Windows) used by the Cython extensions, namely mmgroup_mat24.so, mmgroup_mm_op.so, and mmgroup_mm_reduce.so. Generating C files used for making the second shared library requires Cython extensions depending on the first shared library. This dependency exists, because we have to calculate rather large tables for building the second library. Doing this with pure python would be a bit slow. The C files used for the third library depend on the functions in the second library for the same reason.
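The chain described above can be written down as a small dependency graph and ordered with the standard library. This is only an illustration of the ordering constraint, not part of the build system: the three library names come from the description, while the `gen_c_*` (code generation) and `ext_*` (Cython extension) step names are hypothetical labels.

```python
from graphlib import TopologicalSorter

# Each key depends on all steps in its value set.
# Library names are from the discussion; "gen_c_*" and "ext_*" are hypothetical.
STEPS = {
    "gen_c_mat24": set(),                      # generate C sources, stage 1
    "mmgroup_mat24": {"gen_c_mat24"},          # first shared library
    "ext_mat24": {"mmgroup_mat24"},            # Cython extension using it
    "gen_c_mm_op": {"ext_mat24"},              # table generation needs the extension
    "mmgroup_mm_op": {"gen_c_mm_op"},          # second shared library
    "ext_mm_op": {"mmgroup_mm_op"},            # Cython extension using it
    "gen_c_mm_reduce": {"ext_mm_op"},          # stage 3 generation needs stage 2
    "mmgroup_mm_reduce": {"gen_c_mm_reduce"},  # third shared library
}


def build_order(steps):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(steps).static_order())
```

Calling `build_order(STEPS)` yields the steps from `gen_c_mat24` through `mmgroup_mm_reduce`. In a Meson port the same edges would presumably be expressed through the `depends:` arguments of the individual targets rather than an explicit sort.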

For dealing with this special requirement I have extended setuptools; but I think this is a rather weird hack that should be done in a cleaner way with Meson.

Regarding issues

I'm not sure whether it will help to have more issues than people working on the project. Here are my comments on ChatGPT's issues:

This is what the current issue is about

The current comment is a little plan for switching to Meson. Just feel free to ask questions or add comments.

Essentially, I have assigned this issue to myself. The result will be an improved tool 'generate_code.py'.

The final tool 'generate_code.py' should be able to handle this.

The old 'setup.py' build environment does not support such a separation. The same sources and libraries are built again and again for each different python wheel. In principle this can be optimized. I'm not sure whether Meson strictly requires separation between host and build targets. Since the project is now already quite large, I prefer a 'small changes only' policy between functional versions of the project. So this step could perhaps be delayed to a later phase in the project. But maybe a more experienced Meson user has a different point of view here.

See my comment on last issue. Also, for a minimal host system (which will be dropped at the end) static linking may be easier than dynamic linking.

The final tools, especially 'generate_code.py', should be able to handle this.

This is our ultimate goal.

Martin-Seysen commented 1 year ago

Simplification of the build process has now been finished. I don't plan to make any further changes on the 'generate_code.py' tool in the near future. I'll focus on the documentation.

eli-schwartz commented 1 year ago

> The old 'setup.py' build environment does not support such a separation. The same sources and libraries are built again and again for each different python wheel. In principle this can be optimized. I'm not sure whether Meson strictly requires separation between host and build targets.

Meson has the concept of host (cross) and build (native) targets, but the current design assumes that the use of native targets is to produce code generator tools, rather than to install those native targets. It is not possible to produce wheels for multiple architectures in a single meson invocation, since all the install bits are based on host (cross) targets.

Hopefully this isn't a problem. At least, using ccache may allow you to more or less instantaneously rebuild native code generator programs as the compiler input/outputs should be identical across different cross builds.

Martin-Seysen commented 1 year ago

Thank you for drawing my attention to ccache for reducing the compilation overhead. At the moment I'm stuck at a more basic step. I want to build a native Cython extension in a known directory with Meson. I need that Cython extension in a further code generation step. Any hints on how to proceed are welcome.

eli-schwartz commented 1 year ago

Here's an example meson.build snippet that builds two cython extensions, one of which is a native one:

py_mod = import('python')

if get_option('hostpy') == ''
    py = py_mod.find_installation()
else
    py = py_mod.find_installation(get_option('hostpy'))
endif

if get_option('buildpy') == ''
    buildpy = py
else
    buildpy = py_mod.find_installation(get_option('buildpy'))
endif

hello2 = buildpy.extension_module('hello2', 'hello2.pyx', native: true)

custom_target(output: 'foo.txt', command: [buildpy, '-c', 'import hello2'], depends: hello2, capture: true)
py.extension_module('hello', 'hello.pyx', install: true)

Pass the -Dbuildpy=... and -Dhostpy=... options to control which python executables you use for each. By default when unspecified it uses the native one.

There is some awkwardness here because meson's ideal cross compilation setup involves using https://mesonbuild.com/Machine-files.html to specify binaries, but the find_installation() method doesn't, itself, respect the native: kwarg[^1] so a cross file would end up overriding both the build and the host pythons. Instead, my example implements defining the pythons to use via project options.

This could be fixed for future versions of meson, though.

[^1]: No one ever coded the capability into meson, probably because no one ever thought of the use case of cross compiling a python extension to use as a build tool. It's not something someone would usually write a C/C++ program to do, but cython is a pretty good fit here since you can write your logic in python but run it at something closer to C speeds. Cython support in meson is, however, much newer than general python extension support.

Martin-Seysen commented 1 year ago

The latest commit also builds mmgroup successfully for Python 3.12 in the old build environment. So there is no immediate pressure to switch to Meson in the next few weeks or months. For new developers in the project it is probably easier to deal with Meson than to deal with a proprietary build system.

Meson requires strict separation between build and host environment. Code generation takes place in the build environment; and high-level code generation requires access to Python extensions using code generated at a lower level. Using static linking in the host environment is an unpleasant waste of memory. In the build environment saving memory is much less of an issue; and dynamic linking requires dealing with LD_LIBRARY_PATH, DLL hell, or similar nuisance.

So we need both static and dynamic linking; and switching between these two things is easy in Meson. The old build environment now also supports both static and dynamic linking. This simplifies the next step to be done, which is proper separation between build and host environment.