facebookresearch / AudioMAE

This repo hosts the code and models of "Masked Autoencoders that Listen".

kaldi fbank #25

Open lix4 opened 8 months ago

lix4 commented 8 months ago

Hi there,

I am wondering what fbank actually gives us in the dataloader. I went to the torchaudio docs and did not find much information about what it is. Does anyone have a link to an explanation?

Thank you,

XTxiatong commented 1 month ago

I have the same question. The paper says 'we transform audio recordings into Mel spectrograms and divide them into non-overlapped regular grid patches', but the codebase seems to use fbank instead of a spectrogram. Is there a reason for this?
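
For reference, "fbank" in Kaldi terminology means log Mel filterbank energies. `torchaudio.compliance.kaldi.fbank` is a port of that Kaldi feature and returns one row of log Mel energies per frame, so its output is effectively a log-Mel spectrogram. That would make the code consistent with the paper's wording. The sketch below illustrates the idea in plain Python. It is not the torchaudio or Kaldi implementation: it skips Kaldi details such as dithering, pre-emphasis, DC removal, and FFT padding, and the function names and parameter values (8 Mel bins, 8 kHz sample rate) are made up for illustration.

```python
import math

def hz_to_mel(f):
    # Mel scale formula of the kind used by Kaldi-style filterbanks
    return 1127.0 * math.log(1.0 + f / 700.0)

def mel_filterbank(num_bins, n_fft, sample_rate, low_hz=20.0):
    """Triangular filters with centers evenly spaced on the Mel scale."""
    lo, hi = hz_to_mel(low_hz), hz_to_mel(sample_rate / 2.0)
    edges = [lo + (hi - lo) * i / (num_bins + 1) for i in range(num_bins + 2)]
    bank = []
    for b in range(num_bins):
        left, center, right = edges[b], edges[b + 1], edges[b + 2]
        row = []
        for k in range(n_fft // 2 + 1):
            m = hz_to_mel(k * sample_rate / n_fft)
            if left < m <= center:
                row.append((m - left) / (center - left))
            elif center < m < right:
                row.append((right - m) / (right - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def power_spectrum(frame):
    """Naive DFT power spectrum (slow, but dependency-free)."""
    n = len(frame)
    out = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        out.append(re * re + im * im)
    return out

def log_mel_fbank(signal, sample_rate, num_mel_bins=8, frame_ms=25.0, shift_ms=10.0):
    """Return a list of frames, each a list of log Mel filterbank energies."""
    frame_len = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    # Hann window applied to each frame before the DFT
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / (frame_len - 1))
              for t in range(frame_len)]
    bank = mel_filterbank(num_mel_bins, frame_len, sample_rate)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, shift):
        frame = [signal[start + t] * window[t] for t in range(frame_len)]
        power = power_spectrum(frame)
        feats.append([math.log(max(sum(w * p for w, p in zip(row, power)), 1e-10))
                      for row in bank])
    return feats

# Usage: 0.1 s of a 440 Hz tone sampled at 8 kHz (illustrative values)
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(800)]
feats = log_mel_fbank(signal, sample_rate)
print(len(feats), len(feats[0]))  # number of frames, number of Mel bins
```

The result has shape (frames, Mel bins), which is the same time-by-frequency layout a Mel spectrogram has, just with the log already applied. The patching described in the paper can then operate on that 2D array.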