faroit / python_audio_loading_benchmark

Benchmark popular audio i/o packages

Update for librosa 0.7 #7

Open bmcfee opened 5 years ago

bmcfee commented 5 years ago

Leaving a marker here that the benchmarks should be rerun on librosa 0.7.0 (and should probably include version numbers more generally).

Quick summary of changes:

As an aside, we always had API support for excerpts and seeking. It wasn't terribly efficient because audioread didn't support that universally, but it should be almost no overhead relative to soundfile now. The only additional overhead would be downmixing or resample-on-load, but those shouldn't be included in the benchmarks anyway.

faroit commented 5 years ago

Thanks, I will rerun the benchmark ASAP. Can't wait to have fast audio loading in librosa 👍

As an aside, we always had API support for excerpts and seeking. It wasn't terribly efficient because audioread didn't support that universally, but it should be almost no overhead relative to soundfile now.

I just updated the table; I had overlooked the seek support.

bmcfee commented 5 years ago

Great, thanks! As an aside, we also have some helpers for metadata (duration, samplerate). The same conditions apply there -- sndfile by default, falling back to audioread if necessary. So I would expect it to produce a somewhat jagged set of plots.

faroit commented 4 years ago

Sorry for the delay. I am about to also add tf 2 and tf.io support and will rerun the benchmark once for all of these. In the meantime, I added version numbers to the readme table so users can see that these were computed with an old version of librosa.

faroit commented 4 years ago

@bmcfee #8 is almost finished. Took quite some time to get things right for tensorflow-io. Anyway, concerning librosa, it looks as you predicted:

Any idea why the soundfile backend is even faster than using soundfile directly, i.e., is there anything I could optimize?

https://github.com/faroit/python_audio_loading_benchmark/blob/d71fbe6aa661e138d530c7465431badcb286ef2b/loaders.py#L59-L61

bmcfee commented 4 years ago

Any idea why the soundfile backend is even faster than using soundfile directly, i.e., is there anything I could optimize?

It shouldn't be faster, but it looks like the differences are within the error bars. (Viz suggestion: use dots/swarms instead of bar plots.) Probably this is down to cache effects and general system load fluctuations.

If I understand correctly, it looks like you're implementing your own benchmark code by calling time.time() and loading a bunch of files in sequence: https://github.com/faroit/python_audio_loading_benchmark/blob/d71fbe6aa661e138d530c7465431badcb286ef2b/benchmark_np.py#L85-L91

I guess the point here is to average over a bunch of file loads to get a sense of the average behavior, but I think you could do a little bit better by using the timeit utility separately for each file. If you load each file individually several times, that can neutralize caching and warm-start effects that could linger from a previous run. The statistics get a bit more involved, and it will take longer, but it should cut down on the variance.
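To make that suggestion concrete, here is a minimal sketch of per-file timing with `timeit.repeat`. It is not the benchmark's actual code: `write_test_wav`, `load_wav`, and `time_per_file` are hypothetical names, and the stdlib `wave` module stands in for whichever loader is under test. Note that the untimed warm-up call here means every timed run is a warm one, which is only one possible way to handle cache effects.

```python
import os
import statistics
import tempfile
import timeit
import wave


def write_test_wav(path, n_frames=44100, sr=44100):
    """Write 16-bit mono silence so the sketch needs no real audio data."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sr)
        f.writeframes(b"\x00\x00" * n_frames)


def load_wav(path):
    """Hypothetical loader under test: read all frames as raw bytes."""
    with wave.open(path, "rb") as f:
        return f.readframes(f.getnframes())


def time_per_file(loader, paths, repeat=5, number=3):
    """Time each file separately; return {path: [seconds per call, ...]}."""
    results = {}
    for path in paths:
        loader(path)  # untimed warm-up, so all timed runs see a warm cache
        totals = timeit.repeat(lambda: loader(path), repeat=repeat, number=number)
        # Each total covers `number` calls; convert to seconds per call.
        results[path] = [t / number for t in totals]
    return results


def main():
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"clip{i}.wav") for i in range(3)]
        for p in paths:
            write_test_wav(p)
        for p, samples in time_per_file(load_wav, paths).items():
            # Report min and median per file rather than one pooled mean.
            print(os.path.basename(p), min(samples), statistics.median(samples))


if __name__ == "__main__":
    main()
```

Keeping the per-file samples separate (rather than one wall-clock total over all files) is what makes it possible to plot individual points and to see which files or formats drive the variance.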