Open sangeet2020 opened 3 days ago
There are many many differences.
For instance, the function for converting hz to mel is different.
Please have a look at the code to find them out by yourself.
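To make the Hz-to-mel point concrete, here is a minimal sketch of the two conversions usually involved. Kaldi's mel scale is the HTK-style formula 1127 * ln(1 + f / 700). librosa.filters.mel() defaults to the Slaney-style scale (htk=False), which is linear below 1 kHz and logarithmic above. The function names below are made up for illustration and are not taken from either code base.

```python
import math


def hz_to_mel_htk(freq_hz):
    """HTK-style mel scale (the formula Kaldi uses): 1127 * ln(1 + f / 700)."""
    return 1127.0 * math.log(1.0 + freq_hz / 700.0)


def hz_to_mel_slaney(freq_hz):
    """Slaney-style mel scale (librosa's default when htk=False)."""
    f_sp = 200.0 / 3.0                # Hz per mel in the linear region
    min_log_hz = 1000.0               # the log region starts at 1 kHz
    min_log_mel = min_log_hz / f_sp   # 15 mel at 1 kHz
    logstep = math.log(6.4) / 27.0    # step size in the log region
    if freq_hz < min_log_hz:
        return freq_hz / f_sp
    return min_log_mel + math.log(freq_hz / min_log_hz) / logstep


# Compare the two scales at a few frequencies.
for f in (100.0, 1000.0, 4000.0, 8000.0):
    print(f, hz_to_mel_htk(f), hz_to_mel_slaney(f))
```

The two scales use different units and a different shape, so the filter center frequencies do not line up even when the number of mel bins and the frequency range are the same.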
I went through it, and indeed there were several minute differences:
would there be more?
Is there pre-emphasis?
How is the filter bank matrix computed?
oh yeah, pre-emphasis, missed writing that.
How is the filter bank matrix computed?
Kaldi uses a dot product between the spectrum magnitude and mel_banks, while Whisper uses a matrix multiplication between the STFT magnitudes and mel_filters.
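For what it's worth, a per-filter dot product and a matrix multiplication give the same numbers, so this by itself is not a feature difference. A minimal sketch in plain Python, with made-up filter weights and magnitudes (none of these values or helper names come from Kaldi or Whisper):

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def per_frame_dot(filters, frames):
    """For each frame, take one dot product per filter row."""
    return [[dot(row, frame) for row in filters] for frame in frames]


def matmul(a, b):
    """Plain matrix product of a (n x k) and b (k x m)."""
    cols = list(zip(*b))
    return [[dot(row, col) for col in cols] for row in a]


# Hypothetical toy data: 2 filters over 4 FFT bins, and 2 frames of magnitudes.
filters = [[0.5, 1.0, 0.5, 0.0],
           [0.0, 0.5, 1.0, 0.5]]
frames = [[2.0, 4.0, 6.0, 8.0],
          [1.0, 3.0, 5.0, 7.0]]

by_dot = per_frame_dot(filters, frames)
filters_t = [list(col) for col in zip(*filters)]   # transpose to 4 x 2
by_matmul = matmul(frames, filters_t)              # (2 x 4) @ (4 x 2)

print(by_dot == by_matmul)
```

What can differ between the two toolkits is the content of the filter matrix and what gets fed into it, not the way the product is taken.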
I am afraid you don't know the details of the mel filter bank matrix.
or you don't know how the matrix is computed.
Whisper uses librosa.filters.mel() to generate the mel filter bank matrix, while kaldifeat's fbank computes mel filter banks by creating triangular filters and applying them to the FFT bins. I am not sure if there is any more to it... perhaps I need to study the code closely.
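As a rough illustration of "creating triangular filters and applying them to the FFT bins", here is a simplified sketch that places triangles evenly on an HTK-style mel axis and weights each FFT bin by its position inside the triangle. It is not the code of either library: the function name, the parameters and the treatment of the edge bins are assumptions made for this example. Note that librosa.filters.mel() also produces triangular filters, but by default it uses the Slaney mel scale, interpolates on the Hz axis and applies Slaney area normalization (norm="slaney"), which is where the resulting matrices diverge.

```python
import math


def hz_to_mel(freq_hz):
    """HTK-style mel scale, as in the Kaldi formula."""
    return 1127.0 * math.log(1.0 + freq_hz / 700.0)


def triangular_mel_filters(num_filters, n_fft, sample_rate,
                           low_hz=0.0, high_hz=None):
    """Return num_filters rows of n_fft // 2 + 1 weights (hypothetical helper)."""
    if high_hz is None:
        high_hz = sample_rate / 2.0
    mel_low = hz_to_mel(low_hz)
    mel_high = hz_to_mel(high_hz)
    # Filter edges are equally spaced on the mel axis.
    mel_step = (mel_high - mel_low) / (num_filters + 1)
    num_bins = n_fft // 2 + 1
    # Mel value of the center frequency of every FFT bin.
    bin_mels = [hz_to_mel(k * sample_rate / n_fft) for k in range(num_bins)]

    filters = []
    for i in range(num_filters):
        left = mel_low + i * mel_step
        center = left + mel_step
        right = center + mel_step
        row = []
        for m in bin_mels:
            if left < m <= center:
                weight = (m - left) / (center - left)    # rising slope
            elif center < m < right:
                weight = (right - m) / (right - center)  # falling slope
            else:
                weight = 0.0
            row.append(weight)
        filters.append(row)
    return filters


# Hypothetical settings, chosen only to exercise the function.
filters = triangular_mel_filters(num_filters=10, n_fft=512, sample_rate=16000)
print(len(filters), len(filters[0]))
```

Each row of the result is one triangular filter, and the mel energies for a frame are the products of these rows with that frame's spectrum.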
Hi, I would like to understand the differences between Whisper's fbank features and Kaldi's fbank features. I get that conceptually both are derived from mel filterbank energies, but then what makes them different?
thank you.