Significant performance differences with different numpy installs

Expected behaviour


I would expect madmom to achieve roughly similar performance with OpenBLAS numpy and MKL-enabled numpy (for example, the numpy installation docs say "MKL is typically a little faster and more robust than OpenBLAS").

Actual behaviour

There is a 30x slowdown for ParallelProcessors when using OpenBLAS numpy (the default on PyPI!) compared to MKL numpy (the version on conda), on my 2021 MacBook M1 Max. Processing a ~3 min 44.1 kHz WAV file with an RNNBeatProcessor takes ~300 sec with the pip version of numpy installed, while it takes 10 sec with the conda version of numpy.
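To check whether a gap like this shows up in plain matrix multiplication, independent of madmom, a small micro-benchmark can be run in each environment. This is only a rough sketch: the function name `time_matmul`, the matrix size and the repeat count are my own illustrative choices, and a dense matmul is not necessarily representative of the workload inside RNNBeatProcessor.

```python
import time

import numpy as np


def time_matmul(n=256, repeats=20, seed=0):
    """Return the best wall-clock time (in seconds) of an n x n float64 matmul."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    a @ b  # warm-up, so one-time library initialisation is not measured
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - start)
    return best


# Run this in both environments and compare the numbers.
print(f"best matmul time: {time_matmul() * 1e3:.3f} ms")
```

If the two environments report similar times here but very different times for madmom, the difference probably lies in how the workload uses BLAS (for example many small operations) rather than in raw matmul throughput.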

This means that even the same code, executed in an environment constructed from the same requirements.txt file, will produce very different performance results if inside a conda environment vs. in a Python virtualenv (or any other non-conda environment manager) -- even on the same hardware.
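To see which BLAS backend a given numpy install is linked against, the output of `numpy.show_config()` can be inspected. The snippet below is a minimal sketch: the helper `detect_blas_backends` and its keyword list are hypothetical, and the substring matching is only a heuristic because the format of the configuration output differs between numpy versions.

```python
import contextlib
import io

import numpy as np


def detect_blas_backends():
    """Heuristically list BLAS backend names mentioned in numpy's build config."""
    buf = io.StringIO()
    # show_config() prints to stdout, so capture the text it writes.
    with contextlib.redirect_stdout(buf):
        np.show_config()
    text = buf.getvalue().lower()
    known = ("mkl", "openblas", "accelerate", "blis")  # assumed keyword list
    return [name for name in known if name in text]


print("numpy", np.__version__, "backends:", detect_blas_backends())
```

Running this in both the conda environment and the virtualenv should show whether the two numpy builds really use different backends.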

Note: I understand that this isn't an issue with madmom (a wonderful piece of software that I am grateful for your efforts developing and sharing!!!). However, if this performance gap also exists in environments other than my own, it would probably be extremely useful to many users to add a big loud warning about it to the installation info. Users can fix this themselves once they know about it, but if it isn't flagged, it could result in significant and unnecessary performance degradation.

Steps needed to reproduce the behaviour

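import soundfile as sf
from madmom.features.beats import RNNBeatProcessor

# path_to_some_file: path to an audio file, e.g. the ~3 min 44.1 kHz WAV mentioned above
beat_proc = RNNBeatProcessor()
samples, sr = sf.read(path_to_some_file)
beat_acts = beat_proc(samples)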
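To get comparable numbers from both environments, the processing call can be wrapped in a small timing helper. This is a sketch, not part of madmom: `time_call` is a hypothetical helper name, and the runnable part below times a stand-in function so the snippet works without an audio file.

```python
import time


def time_call(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# With the reproduction code, the call would look like (not run here):
#   beat_acts, seconds = time_call(beat_proc, samples)
# Stand-in workload so the snippet runs on its own:
total, seconds = time_call(sum, range(1_000_000))
print(f"took {seconds:.3f} s")
```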

Information about installed software

Python 3.10
madmom 0.17.dev0
numpy 1.24.4
