chiachunfu / speech

TensorFlow on mobile with speech-to-text DL models.

MFCC window size #10

Open mpindado opened 3 years ago

mpindado commented 3 years ago

Hi, I came across this repo because I needed to port an MFCC calculation from librosa to Java. I found your class very useful, although I ran into a minor problem with the window size. My pretrained models did not use the default window size (equal to n_fft), so I made a small change to MFCC.java so that it produces the same results as the original librosa. I'm sharing this tweak in case someone needs it in the future:

    // Marcos: non-default window size (in samples); expected to be <= n_fft
    private final static int n_win = 1600;
...
    private double[] getWindow(){
        // Return a Hann window of length n_win (n_win replaces n_fft from the original code),
        // zero-padded to n_fft when n_win is smaller.
        // The Hann window is a taper formed by using a raised cosine or sine-squared
        // with ends that touch zero.
        double[] win = new double[n_win];
        for (int i = 0; i < n_win; i++){
            win[i] = 0.5 - 0.5 * Math.cos(2.0*Math.PI*i/n_win);
        }

        // Marcos: center-pad win to n_fft (see librosa spectrum.py)
        if (n_win < n_fft) {
            // New Java arrays are zero-filled, so only the window samples need copying.
            double[] padded_win = new double[n_fft];
            int lpad = (n_fft - n_win) / 2;
            System.arraycopy(win, 0, padded_win, lpad, n_win);
            return padded_win;
        }
        else return win;
    }
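
For anyone who wants to try the padding on its own, here is a minimal standalone sketch of the same idea. The class name `PaddedHannWindow`, the method `paddedHann`, and its parameters are hypothetical and not part of MFCC.java. The sizes used in `main` are only illustrative, and the sketch assumes the window is never longer than the FFT size:

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of the original MFCC.java.
class PaddedHannWindow {

    // Builds a Hann window of length nWin using the same formula as getWindow(),
    // then zero-pads it, centered, to length nFft.
    // Assumption: 0 < nWin <= nFft; anything else is rejected.
    static double[] paddedHann(int nWin, int nFft) {
        if (nWin <= 0 || nWin > nFft) {
            throw new IllegalArgumentException("need 0 < nWin <= nFft");
        }
        // Java zero-initializes new arrays, so the padding needs no explicit loop.
        double[] padded = new double[nFft];
        int lpad = (nFft - nWin) / 2; // left padding; the remainder goes on the right
        for (int i = 0; i < nWin; i++) {
            padded[lpad + i] = 0.5 - 0.5 * Math.cos(2.0 * Math.PI * i / nWin);
        }
        return padded;
    }

    public static void main(String[] args) {
        // Small sizes so the result is easy to inspect by eye.
        double[] w = paddedHann(4, 8);
        System.out.println(Arrays.toString(w));
    }
}
```

With `nWin = 1600` and an FFT size of 2048 (an assumed value here), this puts 224 zeros on each side of the window.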