Closed matangover closed 5 years ago
Thanks for your contribution. Before I merge this, it would be good to have the upsampling in the evaluation function at
https://github.com/f90/Wave-U-Net/blob/fe50c52a31b3231a1777f14eb6131a819f082fc8/Evaluate.py#L64
use the same resampling procedure as in the preprocessing, for consistency. The problem there is that, e.g. with 44.1 kHz input and an 8192 Hz model, when the audio is downsampled and the predicted audio is then upsampled using librosa, the number of samples differs from that of the original input mixture. So I used this scipy-based resampling, since it also lets you specify a desired output length for the resampled signal.
Do you have a proposal for how to do this with the librosa resampling? Maybe take the number of input samples N, downsample, get model outputs of length M, and then perform a resample with M and N as original and new "sampling rate" to ensure the output length is exact?
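The length-as-rate idea could be sketched roughly as follows. This is only a sketch under assumptions: it uses scipy.signal.resample_poly as a stand-in for the librosa call, and the helper name upsample_to_length is hypothetical rather than something in the repository.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def upsample_to_length(pred, target_len):
    """Resample `pred` (length M) to exactly `target_len` (N) samples.

    Hypothetical helper: M and N play the role of the original and the
    new "sampling rate", so the polyphase ratio is N/M after reduction.
    """
    m = len(pred)
    g = gcd(target_len, m)
    # Output length is ceil(M * (N/g) / (M/g)) = N, so it is exact.
    return resample_poly(pred, target_len // g, m // g)


# Demo: 4417 input samples at 44.1 kHz, model running at 8192 Hz.
mixture = np.random.default_rng(0).standard_normal(4417)
g = gcd(8192, 44100)
# The downsampled mixture stands in for the model prediction here.
model_out = resample_poly(mixture, 8192 // g, 44100 // g)
restored = upsample_to_length(model_out, len(mixture))
```

Note that the effective ratio N/M differs very slightly from the nominal rate ratio, so this trades a tiny time stretch for an exact output length.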
I understand, let me think of a solution. With polyphase filtering (used before the latest revamp) you didn't have this problem?
In the current implementation I use the scipy-based resampling since I can specify an exact output length. I played around with librosa's implementation that you propose using, and it seems that for fractional resampling ratios, downsampling the signal and then upsampling it back always results in a signal that is as long as or longer than the input. It appears I can thus simply cut the trailing samples from the output signals so that their length matches the original input mixture.
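Cutting the trailing samples could look something like this. It is a minimal sketch: match_length is a hypothetical helper name, and the zero-padding branch is an added safety net for the case where the output turns out shorter, which the observation above suggests should not happen.

```python
import numpy as np


def match_length(signal, target_len):
    """Trim trailing samples (or zero-pad) along axis 0 to `target_len`."""
    if signal.shape[0] >= target_len:
        return signal[:target_len]
    # Pad only the time axis; leave any channel axes untouched.
    pad = [(0, target_len - signal.shape[0])] + [(0, 0)] * (signal.ndim - 1)
    return np.pad(signal, pad, mode="constant")


# Example: a stereo prediction that came back one sample too long.
prediction = np.ones((88324, 2))
trimmed = match_length(prediction, 88323)
```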
I will merge this now and then commit a change to the evaluation code so it also uses librosa's resampling procedure - so that the resampling during preprocessing/training is the same as the one used in the end when doing prediction.
Thanks for your help!
That's great, thank you!!
Preprocessing during training with the latest code in master is extremely slow (several minutes per song, sometimes >10 minutes). Profiling showed that almost all of the time is spent in scipy.signal.resample, which uses the FFT and is therefore very slow when the number of samples is large or prime. I suggest letting librosa do the resampling using resampy's fast algorithm. On one CCMixter song I tested, the new code cuts resampling time from 15 minutes to 36 seconds on my machine.
Evaluate.py still calls the slow resample; this should probably be fixed too, either by reverting Utils.resample to use scipy.signal.resample_poly or by using librosa.core.resample instead.
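For the polyphase option, a helper along these lines might work. It is a sketch only: the name resample_polyphase and the (samples, channels) layout are assumptions, not the actual signature of Utils.resample.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def resample_polyphase(audio, orig_sr, target_sr):
    """Resample along axis 0 using polyphase filtering.

    Hypothetical replacement for an FFT-based resampler; `audio` is
    assumed to have shape (samples,) or (samples, channels).
    """
    if orig_sr == target_sr:
        return audio
    g = gcd(int(orig_sr), int(target_sr))
    # Reduce the ratio so the up/down factors stay as small as possible.
    return resample_poly(audio, int(target_sr) // g, int(orig_sr) // g, axis=0)


# One second of stereo noise at 44.1 kHz, resampled to 8192 Hz.
stereo = np.random.default_rng(0).standard_normal((44100, 2))
downsampled = resample_polyphase(stereo, 44100, 8192)
```

Unlike the FFT-based function, the cost here does not depend on how the total number of samples factorizes, only on the reduced up/down ratio.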