Open lix4 opened 1 year ago
I have the same question. The paper claims that 'we transform audio recordings into Mel spectrograms and divide them into non-overlapped regular grid patches', but the codebase seems to use fbank instead of a spectrogram. Is there a reason for this?
Hi there,
I am wondering what fbank actually gives us in the dataloader. I went through the torchaudio docs and did not find much information about what it is. Does anyone have a link to an explanation?
Thank you,
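
For anyone landing here with the same question: "fbank" is the usual Kaldi shorthand for log Mel filterbank energies, which is a log-Mel spectrogram in practice. Assuming the dataloader calls `torchaudio.compliance.kaldi.fbank` (not confirmed in this thread), the result is a 2-D tensor of shape (num_frames, num_mel_bins). Below is a rough pure-Python sketch of the idea, not the torchaudio implementation. The function names, the Hann window, and the 20 Hz lower edge are my own choices for illustration, and Kaldi-specific steps such as pre-emphasis, DC removal, and FFT padding are left out.

```python
import math

def hz_to_mel(hz):
    # Mel scale in the natural-log form used by Kaldi-style front ends.
    return 1127.0 * math.log(1.0 + hz / 700.0)

def mel_filterbank(num_mel_bins, n_fft, sample_rate, low_hz=20.0):
    # Triangular filters spaced evenly on the mel axis (low_hz is an assumption).
    mel_low, mel_high = hz_to_mel(low_hz), hz_to_mel(sample_rate / 2)
    step = (mel_high - mel_low) / (num_mel_bins + 1)
    bank = []
    for m in range(num_mel_bins):
        left = mel_low + m * step
        center, right = left + step, left + 2 * step
        row = []
        for k in range(n_fft // 2 + 1):
            mel = hz_to_mel(k * sample_rate / n_fft)
            if left < mel <= center:
                row.append((mel - left) / step)
            elif center < mel < right:
                row.append((right - mel) / step)
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def power_spectrum(frame):
    # Naive DFT, kept dependency-free; real code would use an FFT.
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append(re * re + im * im)
    return spec

def log_mel_fbank(samples, sample_rate=16000, frame_ms=25.0, shift_ms=10.0,
                  num_mel_bins=8):
    # Returns a list of frames, each a list of num_mel_bins log energies.
    frame_len = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / (frame_len - 1))
              for t in range(frame_len)]
    bank = mel_filterbank(num_mel_bins, frame_len, sample_rate)
    features = []
    for start in range(0, len(samples) - frame_len + 1, shift):
        frame = [samples[start + t] * window[t] for t in range(frame_len)]
        power = power_spectrum(frame)
        features.append([
            math.log(max(sum(w * p for w, p in zip(row, power)), 1e-10))
            for row in bank
        ])
    return features

# 0.1 s of a 440 Hz tone at 8 kHz, reduced to 4 mel bins per frame.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(800)]
feats = log_mel_fbank(tone, sample_rate=8000, num_mel_bins=4)
print(len(feats), len(feats[0]))  # frames x mel bins: 8 4
```

So each row is one time frame and each column is one mel band. That is the same time-frequency grid the paper calls a Mel spectrogram, with a log applied to the band energies.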