f90 / Wave-U-Net

Implementation of the Wave-U-Net for audio source separation
MIT License

Faster resampling #25

Closed: matangover closed this pull request 5 years ago

matangover commented 5 years ago

Preprocessing during training with the latest code in master is extremely slow (several minutes per song, sometimes more than 10 minutes). Profiling showed that almost all of the time is spent in scipy.signal.resample, which works via the FFT and is therefore very slow when the number of samples is large or prime. I suggest letting librosa do the resampling using resampy's fast algorithm.
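To see why the FFT route can be so slow, it helps to look at how a signal's sample count factors: FFT implementations are fastest when the transform length splits into small primes, and a length with a large prime factor gives them little to split. A minimal check (the helper `largest_prime_factor` is hypothetical and not part of Wave-U-Net):

```python
def largest_prime_factor(n: int) -> int:
    """Return the largest prime factor of n (n >= 2) by trial division."""
    if n < 2:
        raise ValueError("n must be >= 2")
    largest = 1
    factor = 2
    while factor * factor <= n:
        # Divide out every copy of this factor before moving on.
        while n % factor == 0:
            largest = factor
            n //= factor
        factor += 1
    # Anything left above 1 is a prime larger than every factor removed so far.
    if n > 1:
        largest = n
    return largest


# Three minutes at 44.1 kHz: 7,938,000 samples, which factors into 2, 3, 5 and 7 only.
print(largest_prime_factor(44100 * 180))  # 7
```

A round length like this is FFT-friendly; an arbitrary song length may well have a large prime factor, and that is where the FFT-based resampling becomes costly.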

The new code reduces the resampling time for one CCMixter song I tested from 15 minutes to 36 seconds on my machine.

Evaluate.py still calls the slow resample; this should probably be fixed too, either by reverting Utils.resample to use scipy.signal.resample_poly or by using librosa.core.resample instead.
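For the `resample_poly` option, note that `scipy.signal.resample_poly(x, up, down)` takes integer up/down factors rather than sample rates. A small sketch of deriving them by reducing the rate ratio with the greatest common divisor (`polyphase_factors` is a hypothetical helper):

```python
from math import gcd
from typing import Tuple


def polyphase_factors(orig_sr: int, target_sr: int) -> Tuple[int, int]:
    """Reduce target_sr / orig_sr to lowest terms, giving integer (up, down) factors."""
    g = gcd(orig_sr, target_sr)
    return target_sr // g, orig_sr // g


# 44.1 kHz input to an 8192 Hz model: the ratio 8192/44100 reduces to 2048/11025.
up, down = polyphase_factors(44100, 8192)
print(up, down)  # 2048 11025
# These would then be passed on as scipy.signal.resample_poly(audio, up, down).
```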

f90 commented 5 years ago

Thanks for your contribution. Before I merge this, it would be good to have the upsampling in the evaluation function at

https://github.com/f90/Wave-U-Net/blob/fe50c52a31b3231a1777f14eb6131a819f082fc8/Evaluate.py#L64

use the same resampling procedure as the preprocessing, for consistency. The problem there is that, e.g. when using 44.1 kHz input and an 8192 Hz model, if the audio is downsampled and the predicted audio is then upsampled using librosa, the number of samples differs from the number of samples in the original input mixture. That is why I used the scipy-based resampling: it lets me specify the desired output length of the resampled signal.
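The length mismatch can be reproduced with plain integer arithmetic. The sketch below assumes each resampling step rounds its output length up, i.e. `ceil(num_samples * target_sr / orig_sr)`; that rounding rule is an assumption about the resampler, not something taken from librosa's documentation. `resampled_length` is a hypothetical helper:

```python
def resampled_length(num_samples: int, orig_sr: int, target_sr: int) -> int:
    """ceil(num_samples * target_sr / orig_sr), computed with exact integers.
    Assumes (hypothetically) that the resampler rounds its output length up."""
    return -(-num_samples * target_sr // orig_sr)


n_in = 100_001                                   # hypothetical 44.1 kHz mixture length
n_model = resampled_length(n_in, 44100, 8192)    # length after downsampling to 8192 Hz
n_back = resampled_length(n_model, 8192, 44100)  # length after upsampling the prediction
print(n_in, n_model, n_back)  # 100001 18577 100006
```

Because the downsampled length is rounded up, the upsampled length can overshoot the input by a few samples; under this rounding rule it can never come out shorter.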

Do you have a proposal for how to do this with the librosa resampling? Maybe take the number of input samples N, downsample, get model outputs of length M, and then perform a resample with M and N as original and new "sampling rate" to ensure the output length is exact?
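The idea of treating the lengths M and N themselves as the original and target "sampling rates" can be sketched generically, with the resampler passed in as a function. `resample_to_length` and `linear_resample` are hypothetical names, and `linear_resample` is only a crude linear-interpolation stand-in so the example runs without extra dependencies; it is not the band-limited resampling librosa performs.

```python
from typing import Callable

import numpy as np

# Any function (signal, orig_sr, target_sr) -> resampled signal.
Resampler = Callable[[np.ndarray, int, int], np.ndarray]


def resample_to_length(y: np.ndarray, n_out: int, resample: Resampler) -> np.ndarray:
    """Resample y from M = len(y) samples to exactly n_out samples by passing
    M and n_out as the 'original' and 'target' sample rates."""
    out = resample(y, len(y), n_out)
    # In case the resampler's own rounding is off by a sample, force the length.
    if len(out) >= n_out:
        return out[:n_out]
    return np.pad(out, (0, n_out - len(out)))


def linear_resample(y: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Crude stand-in resampler: linear interpolation, output length rounded up."""
    n_out = -(-len(y) * target_sr // orig_sr)
    positions = np.arange(n_out) * (orig_sr / target_sr)  # in input-sample units
    return np.interp(positions, np.arange(len(y)), y)


prediction = np.sin(np.linspace(0.0, 20.0, 18_577))  # hypothetical model output, M samples
restored = resample_to_length(prediction, 100_001, linear_resample)
print(len(restored))  # 100001
```

With librosa, the stand-in could be replaced by something like `lambda y, o, t: librosa.resample(y, orig_sr=o, target_sr=t)`; whether the length guard is then ever needed depends on librosa's internal rounding.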

matangover commented 5 years ago

I understand, let me think of a solution. With polyphase filtering (used before the latest revamp) you didn't have this problem?

f90 commented 5 years ago

In the current implementation I use the scipy-based resampling since I can specify an exact output length. I played around with librosa's implementation that you propose using, and it seems that for fractional resampling ratios, downsampling the signal and then upsampling it back always results in a signal that is equally long or longer than the input. It appears I can thus simply cut the last samples from the output signals to make their length match the original input mixture.
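The trimming step amounts to slicing the upsampled prediction back to the mixture's length. A minimal sketch, assuming time runs along the last axis and that the round trip never yields fewer samples than the input; `match_length` is a hypothetical helper:

```python
import numpy as np


def match_length(prediction: np.ndarray, num_samples: int) -> np.ndarray:
    """Cut the trailing samples of an upsampled prediction so its length along
    the last (time) axis equals the input mixture's num_samples."""
    if prediction.shape[-1] < num_samples:
        raise ValueError(
            f"prediction has {prediction.shape[-1]} samples, expected at least {num_samples}"
        )
    return prediction[..., :num_samples]


stereo_prediction = np.zeros((2, 100_006))  # hypothetical upsampled output
print(match_length(stereo_prediction, 100_001).shape)  # (2, 100001)
```

Raising instead of silently padding makes it obvious if the "equally long or longer" assumption is ever violated.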

I will merge this now and then commit a change so that the evaluation code also uses librosa's resampling procedure; that way, the resampling during preprocessing/training matches the one used at the end when doing prediction.

Thanks for your help!

matangover commented 5 years ago

That's great, thank you!!