Martin-Seysen / mmgroup

Python implementation of the monster group
38 stars 4 forks source link

Use CPU vector extensions for accelerating mmgroup #11

Open Martin-Seysen opened 6 months ago

Martin-Seysen commented 6 months ago

We want to use the vector extensions of a CPU (e.g. AVX2 for x86-64 or NEON for ARM) for accelerating the mmgroup package.

The reason for an early announcement of this change is that this will affect the order of the basis vectors in the 196884-dimensional representations of the Monster. Although this is unlikely, we cannot exclude the possibility that some users rely on the current order of these basis vectors without noticing it.

Previously, we have never formally asserted any such order; but we also have not explicitly warned the user about relying on such an order. Now we have added the following warning to the documentation:

The 196884-dimensional representations of the Monster are considered as auxiliary tools for computing in the Monster. We do not guarantee the interoperability of vectors in these representations between different versions of the mmgroup package. The internal representation of such vectors is likely to change in the near future. A user with a serious need for such interoperability can consult the author for advice.

Technical information about this change

As a first step, the acceleration will be shipped as a source distribution. Users may compile that source distribution on their local machines with the gcc compiler option '-march=native' to obtain versions optimized for their machines. The final result of that compilation will be a python wheel that may be installed and imported as an add on to the mmgroup package.

Other compilers than gcc will not be supported in the foreseeable future, since they lack a function like __builtin_shuffle for performing arbitrary permutations on arrays of, say, 16 bytes length. Here the maximum length of an array to be permuted efficiently depends on the type of the CPU; but a length of 16 is possible on many CPUs. The operation of the Monster on its 196884-dimensional representation requires a large number of permutations of length up to 512. Here a suitable reordering of the entries of a vector in that representation may save a lot of overhead.