asteroid-team / torch-audiomentations

Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.
MIT License

Implement resample transform #37

Open iver56 opened 3 years ago

iver56 commented 3 years ago

https://github.com/adefossez/julius
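
For context, julius does sinc-based fractional resampling; here is a minimal usage sketch (assuming the julius.resample_frac function, whose zeros parameter sets the sinc filter length):

    import torch
    import julius

    waveform = torch.randn(1, 44100)  # placeholder: one second of audio at 44.1 kHz
    # Resample from 44100 Hz to 48000 Hz; more zeros = higher quality, slower
    resampled = julius.resample_frac(waveform, old_sr=44100, new_sr=48000, zeros=24)
    print(resampled.shape)  # expected: roughly (1, 48000)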

mpariente commented 3 years ago

There's also torchaudio's resample, should we choose between the two?

iver56 commented 3 years ago

I'm not so fond of torchaudio's resample function, because it seems to be much slower than julius. Here's the result of a crude benchmark that resamples some audio from 44100 Hz to 48000 Hz on CPU:

librosa/resampy kaiser_fast: 4.23 s
librosa/resampy kaiser_best: 15.12 s
torchaudio kaldi-compliant LPF width=2: 22.97 s
torchaudio kaldi-compliant LPF width=6: 23.56 s
torchaudio kaldi-compliant LPF width=10: 23.99 s
julius cpu 64 zeros: 0.195 s
julius cpu 16 zeros: 0.176 s
mpariente commented 3 years ago

Ok, it's pretty clear that julius is better, let's stick with it!

mogwai commented 3 years ago

> benchmark that resamples some audio from 44100 Hz to 48000 Hz on CPU:

What other sample rate conversions did you try? Did you compile the Resample transform with torch.jit.script?
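
For reference, a minimal sketch of scripting the transform (assuming torchaudio.transforms.Resample is scriptable in the installed version; this is not part of the benchmark above):

    import torch
    import torchaudio

    # Build the transform once, then script it to cut Python overhead per call
    resample = torchaudio.transforms.Resample(orig_freq=44100, new_freq=48000)
    scripted_resample = torch.jit.script(resample)

    waveform = torch.randn(1, 44100)
    resampled = scripted_resample(waveform)  # shape: (1, 48000)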

iver56 commented 3 years ago

In my crude benchmark, I simply ran it like this:

    import time
    from contextlib import contextmanager
    import numpy as np
    import torch
    from torchaudio.compliance.kaldi import resample_waveform

    # Stand-in input so the snippet runs standalone: one minute of noise at 44.1 kHz
    sample_rate = 44100
    HIGH_SAMPLE_RATE = 48000
    samples = np.random.uniform(-1.0, 1.0, size=sample_rate * 60).astype(np.float32)

    @contextmanager
    def timer(description):  # prints elapsed wall-clock time for the block
        start = time.time()
        yield
        print("{}: {:.3f} s".format(description, time.time() - start))

    for lowpass_filter_width in (2, 6, 10):
        with timer("torchaudio kaldi-compliant LPF width={}".format(lowpass_filter_width)):
            pytorch_kaldi_compliant = (
                resample_waveform(
                    torch.from_numpy(samples).unsqueeze(0),
                    orig_freq=sample_rate,
                    new_freq=HIGH_SAMPLE_RATE,
                    lowpass_filter_width=lowpass_filter_width,
                )
                .squeeze()
                .numpy()
            )

I didn't try other sample rate conversions.
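
For comparison, the julius numbers in the list above would come from a call along these lines (a sketch that reuses the timer and samples from the snippet above and assumes julius.resample_frac):

    import julius

    for zeros in (64, 16):
        with timer("julius cpu {} zeros".format(zeros)):
            julius_resampled = (
                julius.resample_frac(
                    torch.from_numpy(samples).unsqueeze(0),
                    old_sr=sample_rate,
                    new_sr=HIGH_SAMPLE_RATE,
                    zeros=zeros,
                )
                .squeeze()
                .numpy()
            )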

mogwai commented 3 years ago

I've got a notebook that benchmarks different methods of resampling. Some conversions take longer, I think because the gcd of the input and output sample rates is small (see the sketch below). It would be good to add julius to that list and compare results when resampling is done in batches.

https://gist.github.com/mogwai/a5df03e89ab33bc0a5648965280d5445
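
Rough illustration of the gcd effect (not from the notebook): a fractional resampler reduces the ratio by the gcd, and the number of distinct filter phases it evaluates grows with new_sr // gcd, so rate pairs that share a small gcd cost more:

    from math import gcd

    # Fractional resampling by new_sr/old_sr reduced by g = gcd(old_sr, new_sr)
    # uses new_sr // g polyphase filter branches, so a small gcd means more work.
    for old_sr, new_sr in [(44100, 48000), (16000, 48000), (22050, 44100)]:
        g = gcd(old_sr, new_sr)
        print("{} -> {} Hz: gcd={}, filter phases={}".format(old_sr, new_sr, g, new_sr // g))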

In your benchmark, for example, you convert in and out of numpy, which can take time.

iver56 commented 3 years ago

Yes, that would be interesting.

Re numpy: Yes, but I did the numpy conversion in the julius benchmark as well. PyTorch tensors share memory with numpy arrays when running on CPU, so the "conversion" should be quite fast.
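
A quick way to check the shared-memory claim (a minimal sketch, not from the thread):

    import numpy as np
    import torch

    a = np.zeros(4, dtype=np.float32)
    t = torch.from_numpy(a)  # no copy on CPU: the tensor views the numpy buffer
    t[0] = 1.0
    print(a[0])  # prints 1.0, since writes through the tensor hit the same memory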

mogwai commented 3 years ago

I've added julius to the benchmark notebook. It seems to produce higher quality output and to do so faster most of the time. I did notice that it didn't output the expected number of samples, so I had to add a minor hack to work around that.

https://gist.github.com/mogwai/a5df03e89ab33bc0a5648965280d5445

iver56 commented 3 years ago

Yes, I've been using fix_length from librosa to solve the length issue (from librosa.util import fix_length).
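
For reference, a sketch of that workaround (the target length and input array here are hypothetical stand-ins; fix_length zero-pads or trims along the last axis):

    import numpy as np
    from librosa.util import fix_length

    expected_length = 48000  # hypothetical: the length the resampled signal should have
    resampled = np.zeros(47999, dtype=np.float32)  # stand-in for a resampler's output
    fixed = fix_length(resampled, size=expected_length)  # zero-pads (or trims) to size
    assert fixed.shape[-1] == expected_length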