explosion / cython-blis

💥 Fast matrix-multiplication as a self-contained Python library – no system dependencies!

Update `setup.py` to build object files in parallel if requested #105

Open althonos opened 5 months ago

althonos commented 5 months ago

Hi again!

The best way to install cython-blis is to compile it from source, so that the build takes advantage of the machine architecture. On our HPC cluster, I end up re-installing cython-blis on each node executor at the start of each job to make sure I'm using optimized code, but this takes a bit of time.

Given that BLIS has a lot of source files, the build process can be parallelized easily. I changed the logic of the `ExtensionBuilder.compile_objects` code so that it invokes the compiler on the object files in parallel with a `ThreadPool`. The number of jobs comes from the `parallel` flag of the command line (which is a default `build_ext` option) or from the `MAX_JOBS` environment variable (similar to what torch and flash-attn are doing).
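
As a rough illustration of the idea (not the actual code in this PR), the job-count lookup and the threaded compile loop could look like the sketch below. The helper names `resolve_job_count` and `compile_objects_parallel`, the `compile_one` callable, and the choice to let the flag take precedence over `MAX_JOBS` are all assumptions made for the example.

```python
import os
from multiprocessing.pool import ThreadPool


def resolve_job_count(parallel=None, environ=None):
    """Return the number of compile jobs to run.

    Assumed precedence for this sketch: an explicit `parallel` value,
    then the MAX_JOBS environment variable, then a default of 1.
    """
    environ = os.environ if environ is None else environ
    if parallel is not None:
        return max(1, int(parallel))
    return max(1, int(environ.get("MAX_JOBS", "1")))


def compile_objects_parallel(sources, compile_one, jobs=1):
    """Build one object per source file, using up to `jobs` threads.

    `compile_one` is a hypothetical callable that takes a source path,
    runs the compiler on it (for example through subprocess), and
    returns the path of the resulting object file.
    """
    if jobs <= 1:
        # Serial path: same behaviour as before when parallelism is off.
        return [compile_one(src) for src in sources]
    with ThreadPool(jobs) as pool:
        # map() keeps the results in the same order as `sources`.
        return pool.map(compile_one, sources)
```

Threads are enough here because each job mostly waits on an external compiler process, so the Python GIL is not the bottleneck.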

By default, I left the job count at 1, so that parallel compilation happens only if explicitly enabled. Using 4 threads, the compilation is about twice as fast:

MAX_JOBS="4" pip install blis --no-binary=blis