acids-ircam / ddsp_pytorch

Implementation of Differentiable Digital Signal Processing (DDSP) in Pytorch
Apache License 2.0

config.yaml 48kHz block_size #13

Closed: Spijkervet closed this issue 3 years ago

Spijkervet commented 3 years ago

When preprocessing data for training at 48 kHz with a block size of 512, the step_size supplied to crepe is 1000 * 512 / 48000 = 10.67 ms, which gets rounded to 11. This misaligns the pitch and loudness vectors and prevents training (unless I misunderstood the implementation).
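To make the drift concrete, here is a minimal sketch of the arithmetic. The helper names are hypothetical and not taken from the repository, and the frame counts ignore any padding or centering that crepe or the loudness extraction may apply; it only illustrates how a hop rounded to whole milliseconds falls out of step with a hop counted in samples.

```python
def exact_step_ms(block_size, sampling_rate):
    """Duration of one block in milliseconds, before any rounding."""
    return 1000 * block_size / sampling_rate


def frame_counts(n_samples, block_size, sampling_rate):
    """Return (loudness_frames, pitch_frames) for a signal of n_samples.

    Assumption: loudness is computed once per block of `block_size` samples,
    while pitch is computed once per rounded millisecond step.
    Padding and centering are deliberately ignored.
    """
    rounded_ms = round(exact_step_ms(block_size, sampling_rate))
    loudness_frames = n_samples // block_size
    duration_ms = 1000 * n_samples / sampling_rate
    pitch_frames = int(duration_ms // rounded_ms)
    return loudness_frames, pitch_frames


# Four seconds of audio at 48 kHz with a block size of 512
loudness, pitch = frame_counts(4 * 48000, 512, 48000)
print(exact_step_ms(512, 48000))  # about 10.67 ms
print(loudness, pitch)            # the two counts differ
```

Under these assumptions the two feature vectors end up with different lengths for the same audio, and the gap grows with the duration of the signal.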

And to verify (mostly for myself haha): the original implementation uses a frame rate of 250 Hz, so we would need to set the block size in the config to 192, i.e., 48000 / 192 = 250 (a step size of 4 ms for crepe). I'm checking this against the baseline now, and will then see how it performs when increasing the frame rate for real-time use, excitinggg!
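The numbers above can be double-checked with exact rational arithmetic. This is only a sketch; the function names are illustrative and not part of the repository.

```python
from fractions import Fraction


def frame_rate(block_size, sampling_rate):
    """Frames per second when one frame is emitted per block."""
    return Fraction(sampling_rate, block_size)


def step_ms(block_size, sampling_rate):
    """Exact block duration in milliseconds as a fraction."""
    return Fraction(1000 * block_size, sampling_rate)


for block_size in (192, 512):
    rate = frame_rate(block_size, 48000)
    step = step_ms(block_size, 48000)
    # A denominator of 1 means the step is a whole number of milliseconds
    print(block_size, rate, step, step.denominator == 1)
```

A block size of 192 gives exactly 250 frames per second and a 4 ms step, while 512 gives a step of 32/3 ms, which cannot be expressed as a whole number of milliseconds.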

moiseshorta commented 3 years ago

Thanks for this. I've run into that problem when trying to train on my custom dataset.

caillonantoine commented 3 years ago

Good catch! Unfortunately, the crepe library doesn't provide a way to control the hop length in samples. The problem is that if you select a block size that is not a power of 2, the realtime implementation will break! I'm working on a solution :)
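The two constraints mentioned in this thread (a whole-millisecond step for crepe and a power-of-2 block size for the realtime implementation) can be checked together with a short sketch. The helper names are hypothetical, and the 16 kHz rate is included only as a contrast, not as a claim about the project's configuration.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ..."""
    return n > 0 and n & (n - 1) == 0


def has_integer_ms_step(block_size, sampling_rate):
    """True when one block lasts a whole number of milliseconds."""
    return (1000 * block_size) % sampling_rate == 0


def compatible_block_sizes(sampling_rate, max_block=4096):
    """Block sizes up to max_block that satisfy both constraints."""
    return [
        b for b in range(1, max_block + 1)
        if is_power_of_two(b) and has_integer_ms_step(b, sampling_rate)
    ]


print(compatible_block_sizes(48000))  # empty list
print(compatible_block_sizes(16000))  # powers of 2 from 16 upward
```

At 48 kHz the step in milliseconds equals block_size / 48, so the block size would have to be a multiple of 3, which no power of 2 is. That is why rounding alone cannot reconcile the two constraints at this sampling rate.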