craffel / mir_eval

Evaluation functions for music/audio information retrieval/signal processing algorithms.
MIT License
603 stars 112 forks source link

Replace scipy.fftpack with scipy.fft #373

Open bmcfee opened 6 months ago

bmcfee commented 6 months ago

This came up in #370, but the source separation metrics are presently implemented using the old / legacy scipy.fftpack API. As the modernization PR bumps the minimum scipy version requirement to 1.4, we can now update this to use the newer scipy.fft API. See https://docs.scipy.org/doc/scipy/release/1.4.0-notes.html#scipy-fft-added

The main benefits of doing so are that we can get some speed and memory improvements by using rffts instead of ffts (big benefit for long signals!). We can also squeak out a bit more performance by using the next_fast_len function instead of assuming 2**k.

Probably we can also vectorize some of the transform implementations.

bmcfee commented 6 months ago

Just a quick benchmark on my laptop using our fixtures, TLDR is we can cut FFT costs about in half, just by using rfft and a smarter length calculation:

In [32]: ref_sources = __load_and_stack_wavs('data/separation/ref06/')

In [33]: ref_sources.shape
Out[33]: (2, 16000)

In [35]: n = ref_sources.shape[-1]

In [36]: n_fft = int(2**np.ceil(np.log2(n + flen - 1)))

In [37]: n_fft2 = scipy.fft.next_fast_len(n + flen - 1)

In [38]: n_fft
Out[38]: 32768

In [39]: n_fft2
Out[39]: 16632

In [40]: %timeit scipy.fftpack.fft(ref_sources, n=n_fft, axis=1)
382 µs ± 3.72 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [41]: %timeit scipy.fft.fft(ref_sources, n=n_fft, axis=1)
383 µs ± 3.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [42]: %timeit scipy.fft.rfft(ref_sources, n=n_fft, axis=1)
285 µs ± 20.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [43]: %timeit scipy.fft.rfft(ref_sources, n=n_fft2, axis=1)
196 µs ± 124 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

This would allow us to simplify some downstream code (ie remove some np.real casts) and should not cost us anything in precision.