csukuangfj / kaldifeat

Kaldi-compatible online & offline feature extraction with PyTorch, supporting CUDA, batch processing, chunk processing, and autograd - Provide C++ & Python API
https://csukuangfj.github.io/kaldifeat

Whisper fbank features vs Kaldi fbank features – Clarification Needed #111

Open sangeet2020 opened 3 days ago

sangeet2020 commented 3 days ago

Hi, I would like to understand the differences between Whisper's fbank features and Kaldi's fbank features. I understand that, conceptually, both are derived from Mel filterbank energies, so what makes them different?

Thank you.

csukuangfj commented 3 days ago

There are many, many differences.

For instance, the function for converting Hz to mel is different.

Please have a look at the code to find them yourself.
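
As a rough illustration of the Hz-to-mel point, the sketch below compares two common formulas. It assumes Kaldi's mel scale is 1127 * ln(1 + f / 700) and that Whisper's filters come from librosa.filters.mel() with its default Slaney-style scale (linear below 1000 Hz, logarithmic above). The function names are made up for this example and are not part of either library.

```python
import math


def hz_to_mel_kaldi(freq_hz: float) -> float:
    # Assumed Kaldi-style (HTK-like) mel scale.
    return 1127.0 * math.log(1.0 + freq_hz / 700.0)


def hz_to_mel_slaney(freq_hz: float) -> float:
    # Assumed librosa default (htk=False): linear below 1 kHz, log above.
    f_sp = 200.0 / 3.0                  # Hz per mel in the linear region
    min_log_hz = 1000.0                 # start of the logarithmic region
    min_log_mel = min_log_hz / f_sp     # 15 mel at 1 kHz
    logstep = math.log(6.4) / 27.0      # step size in the log region
    if freq_hz < min_log_hz:
        return freq_hz / f_sp
    return min_log_mel + math.log(freq_hz / min_log_hz) / logstep


if __name__ == "__main__":
    for f in (100.0, 1000.0, 4000.0, 8000.0):
        print(f, hz_to_mel_kaldi(f), hz_to_mel_slaney(f))
```

The two scales do not even share units, so filter edges that are equally spaced in one scale land on different frequencies than edges equally spaced in the other.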

sangeet2020 commented 3 days ago

I went through it, and indeed there were several minute differences:

Would there be more?

csukuangfj commented 3 days ago

Is there pre-emphasis?

How is the filter bank matrix computed?

sangeet2020 commented 3 days ago

Oh yeah, pre-emphasis; I missed writing that.
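
For reference, a minimal sketch of what per-frame pre-emphasis looks like, assuming the Kaldi-style default coefficient of 0.97 and Kaldi's convention of treating the sample before the frame as equal to the first sample. Whisper's log-mel front end has no corresponding step. The function name is hypothetical.

```python
import numpy as np


def kaldi_style_preemphasis(frame, coeff=0.97):
    # y[n] = x[n] - coeff * x[n-1], applied to a single frame.
    frame = np.asarray(frame, dtype=float)
    out = np.empty_like(frame)
    out[1:] = frame[1:] - coeff * frame[:-1]
    # Assumption: the sample "before" the frame is taken to equal frame[0].
    out[0] = frame[0] - coeff * frame[0]
    return out


if __name__ == "__main__":
    print(kaldi_style_preemphasis([1.0, 2.0, 3.0, 4.0]))
```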

How is the filter bank matrix computed?

Kaldi uses a dot product between the spectrum magnitude and mel_banks, while Whisper uses a matrix multiplication between the STFT magnitudes and mel_filters.

csukuangfj commented 3 days ago

I am afraid you don't know the details of the mel filter bank matrix.

csukuangfj commented 3 days ago

or you don't know how the matrix is computed.

sangeet2020 commented 3 days ago

Whisper uses librosa.filters.mel() to generate the mel filter bank matrix, while kaldifeat's fbank computes its mel filter banks by creating triangular filters and applying them to the FFT bins. I am not sure if there is any more to it... perhaps I need to study the code more closely.
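
Both approaches build triangular filters; the details of how the triangles are placed and scaled are where they diverge. The NumPy sketch below is a simplified reimplementation under stated assumptions, not code from either project: the Kaldi-style version draws triangles that are linear in the mel domain with a peak of 1, a 20 Hz lower edge, and a 512-point FFT (a 400-sample window padded to a power of two); the Slaney-style version follows librosa's defaults as I understand them, with triangles linear in Hz, a 400-point FFT, and area normalization. Using 80 bins for both is a choice made here to ease comparison. All function names are hypothetical.

```python
import numpy as np

F_SP = 200.0 / 3.0
MIN_LOG_HZ = 1000.0
MIN_LOG_MEL = MIN_LOG_HZ / F_SP
LOGSTEP = np.log(6.4) / 27.0


def hz_to_mel_kaldi(f):
    return 1127.0 * np.log(1.0 + np.asarray(f, dtype=float) / 700.0)


def hz_to_mel_slaney(f):
    f = np.asarray(f, dtype=float)
    log_part = MIN_LOG_MEL + np.log(np.maximum(f, MIN_LOG_HZ) / MIN_LOG_HZ) / LOGSTEP
    return np.where(f >= MIN_LOG_HZ, log_part, f / F_SP)


def mel_to_hz_slaney(m):
    m = np.asarray(m, dtype=float)
    log_part = MIN_LOG_HZ * np.exp(LOGSTEP * (m - MIN_LOG_MEL))
    return np.where(m >= MIN_LOG_MEL, log_part, F_SP * m)


def kaldi_style_mel_banks(num_bins=80, sample_rate=16000, n_fft=512, low_freq=20.0):
    # Triangles are linear in the MEL domain; the Nyquist bin is left out.
    num_fft_bins = n_fft // 2
    bin_mel = hz_to_mel_kaldi(np.arange(num_fft_bins) * sample_rate / n_fft)
    edges = np.linspace(hz_to_mel_kaldi(low_freq),
                        hz_to_mel_kaldi(sample_rate / 2), num_bins + 2)
    left, center, right = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (bin_mel - left) / (center - left)
    falling = (right - bin_mel) / (right - center)
    return np.maximum(0.0, np.minimum(rising, falling))  # peak weight is at most 1


def slaney_style_mel_banks(n_mels=80, sample_rate=16000, n_fft=400):
    # Triangles are linear in HZ between edges equally spaced on the Slaney scale.
    fft_hz = np.linspace(0.0, sample_rate / 2, 1 + n_fft // 2)
    mel_max = hz_to_mel_slaney(sample_rate / 2)
    edges_hz = mel_to_hz_slaney(np.linspace(0.0, mel_max, n_mels + 2))
    widths = np.diff(edges_hz)
    ramps = edges_hz[:, None] - fft_hz[None, :]
    rising = -ramps[:-2] / widths[:-1, None]
    falling = ramps[2:] / widths[1:, None]
    weights = np.maximum(0.0, np.minimum(rising, falling))
    # Area normalization: each filter is scaled by 2 / (its bandwidth in Hz).
    return weights * (2.0 / (edges_hz[2:] - edges_hz[:-2]))[:, None]


if __name__ == "__main__":
    print(kaldi_style_mel_banks().shape, slaney_style_mel_banks().shape)
```

Even before pre-emphasis, windowing, and log scaling are considered, the two matrices differ in shape, in where each triangle sits, and in the magnitude of the weights.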