Closed Spijkervet closed 3 years ago
Thanks for this. I've run into that problem when trying to train on my custom dataset.
Good catch! Unfortunately, the crepe library doesn't provide a way to control the hop length in sample... The problem is that if you select a block size that is not a power of 2, the realtime implementation will break! I'm working on a solution :)
When preprocessing data for training, the step_size supplied to crepe is 1000 * 512 / 48000 = 10.67 ms, which is rounded to 11 ms. The pitch vector then no longer lines up with the loudness vector, which prevents training (unless I've misunderstood the implementation).
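To put a number on the misalignment, here is a minimal sketch of the arithmetic. The helper names `step_size_ms` and `drift_ms` are made up for illustration and are not part of crepe or this repo; the only assumption is that the step size passed to crepe is rounded to a whole number of milliseconds, as described above.

```python
def step_size_ms(block_size: int, sampling_rate: int) -> float:
    """Hop length (block size) expressed in milliseconds."""
    return 1000 * block_size / sampling_rate


def drift_ms(block_size: int, sampling_rate: int, n_frames: int) -> float:
    """Accumulated timing offset after n_frames when the step is rounded.

    Assumption: the pitch frames advance by the rounded step, while the
    loudness frames advance by the exact block duration.
    """
    exact = step_size_ms(block_size, sampling_rate)
    rounded = round(exact)
    return n_frames * (rounded - exact)


exact = step_size_ms(512, 48000)
print(f"exact step: {exact:.2f} ms, rounded step: {round(exact)} ms")
print(f"drift after 1000 frames: {drift_ms(512, 48000, 1000):.1f} ms")
```

With a block size of 512 at 48 kHz the rounding adds about a third of a millisecond per frame, so after 1000 frames the pitch track lags the loudness track by roughly 333 ms, i.e. around 30 frames.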
And to verify (mostly for myself haha): the original implementation uses a frame rate of 250 Hz, so we would need to set the block size in the config to 192, i.e., 1/(192 / 48000) = 250, which means a step size of 4 ms for crepe. I'm checking this against the baseline now, and will then see how it performs when increasing the fps for real-time, excitinggg!
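A quick way to sanity-check a candidate block size, sketched under the same assumption that crepe needs a whole-millisecond step. The function names are hypothetical helpers, not anything from crepe or this repo.

```python
def frame_rate_hz(block_size: int, sampling_rate: int) -> float:
    """Number of frames per second for a given block size."""
    return sampling_rate / block_size


def has_integer_step(block_size: int, sampling_rate: int) -> bool:
    """True if the block duration is a whole number of milliseconds."""
    return (1000 * block_size) % sampling_rate == 0


def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ... (what the realtime implementation needs)."""
    return n > 0 and (n & (n - 1)) == 0


for block_size in (192, 512):
    print(
        block_size,
        frame_rate_hz(block_size, 48000),
        has_integer_step(block_size, 48000),
        is_power_of_two(block_size),
    )

# Block sizes up to 4096 that satisfy both constraints at 48 kHz.
both = [
    b for b in range(1, 4097)
    if has_integer_step(b, 48000) and is_power_of_two(b)
]
print(both)
```

At 48 kHz a whole-millisecond step requires the block size to be a multiple of 48, which a power of 2 can never be, so the list comes back empty. That is the tension between the training fix (192) and the realtime constraint mentioned above.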