CPJKU / madmom

Python audio and music signal processing library
https://madmom.readthedocs.io

Significant performance differences with different numpy installs #524

Open jpgard opened 1 year ago

jpgard commented 1 year ago

Expected behaviour

I would expect madmom to achieve roughly similar performance with OpenBLAS numpy and MKL-enabled numpy (for example, the numpy installation docs say "MKL is typically a little faster and more robust than OpenBLAS").

Actual behaviour

There is a 30x slowdown for ParallelProcessors when using OpenBLAS numpy (the default on PyPI!) compared to MKL numpy (the version on conda) on my 2021 MacBook M1 Max. Processing a ~3 min, 44.1 kHz WAV file with an RNNBeatProcessor takes ~300 sec with the pip version of numpy installed, while it takes ~10 sec with the conda version of numpy.

This means that even the same code, executed in an environment constructed from the same requirements.txt file, will produce very different performance results if inside a conda environment vs. in a Python virtualenv (or any other non-conda environment manager) -- even on the same hardware.

Note: I understand that this isn't an issue with madmom (a wonderful piece of software that I am grateful for your efforts developing and sharing!!!). However, if this performance gap also exists in environments other than my own, it would probably be extremely useful to many users to add a big loud warning about it to the installation info. This is something users can set up themselves with proper knowledge, but if it isn't flagged, it could result in significant and unnecessary performance degradations.

Steps needed to reproduce the behaviour

import soundfile as sf
from madmom.features.beats import RNNBeatProcessor

# path_to_some_file: path to a ~3 min, 44.1 kHz WAV file
beat_proc = RNNBeatProcessor()
samples, sr = sf.read(path_to_some_file)
beat_acts = beat_proc(samples)
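
To put numbers on the gap, here is a rough timing sketch. The helper names `best_time` and `matmul_seconds` are my own, not part of madmom or numpy, and the matrix-product timing is only a proxy, assuming the RNN workload is dominated by dense linear algebra:

```python
import time


def best_time(fn, repeats=3):
    """Run fn() `repeats` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def matmul_seconds(size=1024, repeats=3):
    """Time a float32 matrix product, which numpy hands off to its BLAS backend."""
    import numpy as np  # imported here so best_time() stays usable without numpy

    rng = np.random.default_rng(0)
    a = rng.standard_normal((size, size), dtype=np.float32)
    b = rng.standard_normal((size, size), dtype=np.float32)
    return best_time(lambda: a @ b, repeats)
```

Running `print(matmul_seconds())` in each environment gives a quick comparison of the numpy installs alone, and `best_time(lambda: beat_proc(samples), repeats=1)` times the actual madmom call from the steps above.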

Information about installed software

Python 3.10, madmom 0.17.dev0, numpy 1.24.4

jpgard commented 1 year ago

After some more digging -- it actually looks like the issue is due to x86 vs. ARM builds of numpy. Specifically, I observed this in Docker environments where, depending on the base image/OS, pip installed different builds of numpy (with different instruction sets enabled).
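
For anyone who wants to check which combination they ended up with, a minimal sketch follows. `describe_numpy_build` is a hypothetical helper name of mine; it only relies on the standard library plus `numpy.show_config()`, whose output format varies between numpy versions:

```python
import contextlib
import io
import platform
import sys


def describe_numpy_build():
    """Return a dict describing the interpreter, CPU architecture and numpy build."""
    info = {
        "python": sys.version.split()[0],
        "machine": platform.machine(),  # e.g. 'x86_64', 'arm64' or 'aarch64'
        "numpy": None,
        "build_config": None,
    }
    try:
        import numpy as np
    except ImportError:
        return info  # numpy is not installed in this environment

    info["numpy"] = np.__version__
    # show_config() prints its report, so capture stdout to keep it as a string.
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        np.show_config()
    info["build_config"] = buf.getvalue()
    return info


if __name__ == "__main__":
    for key, value in describe_numpy_build().items():
        print(f"{key}: {value}")
```

Comparing the `machine` value and the BLAS/LAPACK entries in `build_config` between a fast and a slow container should show whether the architecture or the linear algebra backend differs.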

Again, not a bug per se, but probably something worth highlighting (even if the message is "accelerated linear algebra libraries will accelerate your madmom code").