python speech features fbanks

janjanusek commented 7 months ago

Hello, how can I reproduce the fbank features from https://github.com/jameslyons/python_speech_features? I checked the wiki but couldn't find an fbank calculation, although something tells me this library is more than capable of computing these features.

Thank you

ar1st0crat commented 7 months ago

Hi, here's what you're looking for: https://github.com/ar1st0crat/NWaves/wiki/MFCC-and-Mel-Spectrogram#nwaves-and-python_speech_features. There's also a detailed video (linked at the top of that wiki page).

FilterBank = PsfFilterbank(samplingRate, melCount, fftSize),

// ...

using System;
using System.Linq;

/// <summary>
/// Generates a filterbank with weights identical to those of python_speech_features.
/// </summary>
float[][] PsfFilterbank(int samplingRate, int filterbankSize, int fftSize, double lowFreq = 0, double highFreq = 0)
{
    var filterbank = new float[filterbankSize][];

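    // highFreq = 0 (the default) plays the role of psf's highfreq=None, i.e. samplingRate / 2
    if (highFreq <= lowFreq)
    {
        highFreq = samplingRate / 2.0;
    }

    var low = NWaves.Utils.Scale.HerzToMel(lowFreq);
    var high = NWaves.Utils.Scale.HerzToMel(highFreq);

    // FFT bin of frequency f is floor((fftSize + 1) * f / samplingRate), as in python_speech_features
    var res = (fftSize + 1) / (float)samplingRate;

    // filterbankSize + 2 points equally spaced on the mel scale, converted to bin indices
    var bins = Enumerable
                  .Range(0, filterbankSize + 2)
                  .Select(i => (float)Math.Floor(res * NWaves.Utils.Scale.MelToHerz(low + i * (high - low) / (filterbankSize + 1))))
                  .ToArray();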

    for (var i = 0; i < filterbankSize; i++)
    {
        filterbank[i] = new float[fftSize / 2 + 1];

        // rising slope of triangle i: from bins[i] up to bins[i + 1]
        for (var j = (int)bins[i]; j < (int)bins[i + 1]; j++)
        {
            filterbank[i][j] = (j - bins[i]) / (bins[i + 1] - bins[i]);
        }
        // falling slope of triangle i: from bins[i + 1] down to bins[i + 2]
        for (var j = (int)bins[i + 1]; j < (int)bins[i + 2]; j++)
        {
            filterbank[i][j] = (bins[i + 2] - j) / (bins[i + 2] - bins[i + 1]);
        }
    }

    return filterbank;
}
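If you want to check the C# weights against Python without installing python_speech_features, here is a minimal pure-Python sketch that mirrors PsfFilterbank above step by step. The function names (hz_to_mel, mel_to_hz, psf_filterbank) are hypothetical; the mel formula 2595 * log10(1 + f/700) is the one python_speech_features uses, and the bin mapping floor((nfft + 1) * f / samplerate) is taken from PsfFilterbank.

```python
import math


def hz_to_mel(hz):
    # mel formula used by python_speech_features
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    # inverse of hz_to_mel
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def psf_filterbank(samplerate, nfilt, nfft, lowfreq=0.0, highfreq=None):
    """Hypothetical port: nfilt triangular filters, each with nfft // 2 + 1 weights."""
    if highfreq is None:
        highfreq = samplerate / 2.0
    low_mel, high_mel = hz_to_mel(lowfreq), hz_to_mel(highfreq)
    # nfilt + 2 points equally spaced in mel, mapped to FFT bin indices:
    # bin = floor((nfft + 1) * hz / samplerate)
    bins = []
    for i in range(nfilt + 2):
        hz = mel_to_hz(low_mel + i * (high_mel - low_mel) / (nfilt + 1))
        bins.append(math.floor((nfft + 1) * hz / samplerate))
    bank = [[0.0] * (nfft // 2 + 1) for _ in range(nfilt)]
    for m in range(nfilt):
        left, center, right = bins[m], bins[m + 1], bins[m + 2]
        for k in range(left, center):   # rising slope, 0 at the left edge
            bank[m][k] = (k - left) / (center - left)
        for k in range(center, right):  # falling slope, 1 at the center bin
            bank[m][k] = (right - k) / (right - center)
    return bank
```

With the defaults used in this thread (16 kHz, 26 filters, nfft = 512) this gives a 26 x 257 matrix whose first filter starts 0, 0.5, 1.0, 0.5, so PsfFilterbank(16000, 26, 512)[0] should start the same way.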

ar1st0crat commented 7 months ago

UPD: fbank is essentially a filterbank extractor, so

def fbank(signal, samplerate=16000, winlen=0.025, winstep=0.01,
          nfilt=26, nfft=512, lowfreq=0, highfreq=None, preemph=0.97,
          winfunc=lambda x: numpy.ones((x,))):

is equivalent to:

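var fbankExtractor = new FilterbankExtractor(
   new FilterbankOptions
   {
       SamplingRate = 16000,
       FrameDuration = 0.025,     // winlen
       FftSize = 512,             // nfft
       HopDuration = 0.01,        // winstep
       Window = WindowType.Hann,  // psf's default winfunc is rectangular (numpy.ones); pass winfunc=numpy.hanning in Python for a like-for-like comparison
       PreEmphasis = 0.97,        // preemph
       FilterBank = PsfFilterbank(16000, 26, 512, 0)  // nfilt = 26, lowfreq = 0; highFreq defaults to samplingRate / 2
   });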
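The steps fbank performs around the filterbank can also be sketched in plain Python, which may help when tracking down small numeric differences between the two libraries. This is a sketch based on my reading of python_speech_features' defaults (pre-emphasis, zero-padded last frame, rectangular window, power spectrum |FFT|^2 / nfft, exact zeros replaced by machine epsilon); the function name fbank_sketch is hypothetical, and it uses a naive DFT for readability rather than speed.

```python
import cmath
import math
import sys


def fbank_sketch(signal, samplerate, filterbank, winlen=0.025, winstep=0.01,
                 nfft=512, preemph=0.97, window=None):
    """Hypothetical fbank-style pipeline.

    Returns (features, energy): the filterbank energies of each frame and
    the total power of each frame.
    """
    eps = sys.float_info.epsilon  # same value as numpy.finfo(float).eps
    # 1. Pre-emphasis: y[0] = x[0], y[n] = x[n] - preemph * x[n - 1]
    x = [signal[0]] + [signal[n] - preemph * signal[n - 1]
                       for n in range(1, len(signal))]
    # 2. Framing in samples; zero-pad so the last frame is complete
    frame_len = int(round(winlen * samplerate))
    frame_step = int(round(winstep * samplerate))
    if len(x) <= frame_len:
        num_frames = 1
    else:
        num_frames = 1 + math.ceil((len(x) - frame_len) / frame_step)
    padded = x + [0.0] * ((num_frames - 1) * frame_step + frame_len - len(x))
    # Rectangular window unless one of length frame_len is supplied
    win = window if window is not None else [1.0] * frame_len

    features, energy = [], []
    for f in range(num_frames):
        start = f * frame_step
        frame = [padded[start + n] * win[n] for n in range(frame_len)]
        # 3. Zero-pad or truncate to nfft, then power spectrum |X[k]|^2 / nfft
        frame = (frame + [0.0] * nfft)[:nfft]
        power = []
        for k in range(nfft // 2 + 1):
            # naive DFT of bin k: fine for a sketch, far too slow for real use
            s = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / nfft)
                    for n in range(nfft))
            power.append(abs(s) ** 2 / nfft)
        # 4. Weighted sum per filter; exact zeros become eps so a later log() is safe
        features.append([sum(w * p for w, p in zip(row, power)) or eps
                         for row in filterbank])
        energy.append(sum(power) or eps)
    return features, energy
```

The filterbank argument is expected to have the same nfilt x (nfft // 2 + 1) shape that PsfFilterbank returns. Window choice, how the last frame is padded, and the spectrum scaling are all places where two implementations can drift apart slightly even with identical filter weights.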

janjanusek commented 7 months ago

Thank you very much, I really appreciate your effort 👍 I'll test it later today.

janjanusek commented 7 months ago

Works! The features aren't identical to the Python output (they're slightly different, which is totally normal given some implementation differences), but overall it works just as I needed. Thanks!

ar1st0crat/NWaves, issue #86