ZitengWang / python_kaldi_features

Python code to extract MFCC and FBANK speech features for Kaldi
MIT License

Different Features Except dither=0.0 #2

Closed: zlinsmile closed this issue 5 years ago

zlinsmile commented 5 years ago

Hi, thanks for sharing this, it helps me a lot.

I get almost exactly the same results using `logfbank(sig2, nfilt=23, lowfreq=20, dither=0, wintype='povey')` as with `compute-fbank-feats --dither=0.0` in Kaldi. But when I change something like dither or nfilt, the features no longer match what Kaldi produces; the closest I can get is with dither=0.0.
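For reference, here is a minimal sketch of the comparison I am doing. The file name, the WAV loading, and the import path are my assumptions (the repo is a fork of python_speech_features); only the logfbank keyword arguments above are taken verbatim:

```python
# Illustrative only: 'test.wav' and the import path are assumptions;
# the logfbank arguments are the ones from the comparison above.
from scipy.io import wavfile
from python_speech_features import logfbank

rate, sig2 = wavfile.read('test.wav')   # 16 kHz mono speech
feat = logfbank(sig2, nfilt=23, lowfreq=20, dither=0, wintype='povey')

# Kaldi reference, run separately in a shell:
#   compute-fbank-feats --dither=0.0 --num-mel-bins=23 --low-freq=20 \
#       scp:wav.scp ark,t:fbank.txt
# With dither=0.0 the two feature matrices agree to about the 6th decimal place.
```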

So I thought maybe something was different in the DCT or the lifter. I checked both: for the DCT, you use DCT-II and Kaldi uses DCT-II too; for the lifter, your equation is the same as Kaldi's. But the decimal parts of the outputs still differ.
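To double-check the lifter, I compared against this formula, which is what I believe Kaldi's ComputeLifterCoeffs implements (the helper name is mine, not from the repo):

```python
import numpy as np

def kaldi_lifter_coeffs(num_ceps, Q=22.0):
    """Liftering weights 1 + (Q/2) * sin(pi * n / Q), with Q = --cepstral-lifter (default 22)."""
    n = np.arange(num_ceps)
    return 1.0 + 0.5 * Q * np.sin(np.pi * n / Q)

# Applied elementwise to each row of the MFCC matrix:
#   mfcc_liftered = mfcc * kaldi_lifter_coeffs(mfcc.shape[1])
```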

Could you please tell me what makes the difference, or am I doing something wrong? Is it just numerical error? Thank you :)

Kaldi uses DCT-II. The following is ComputeDctMatrix from Kaldi, which I think is different from your dct:

```cpp
template<typename Real>
void ComputeDctMatrix(Matrix<Real> *M) {
  //KALDI_ASSERT(M->NumRows() == M->NumCols());
  MatrixIndexT K = M->NumRows();
  MatrixIndexT N = M->NumCols();

  KALDI_ASSERT(K > 0);
  KALDI_ASSERT(N > 0);
  Real normalizer = std::sqrt(1.0 / static_cast<Real>(N));  // normalizer for X_0
  for (MatrixIndexT j = 0; j < N; j++)
    (*M)(0, j) = normalizer;
  normalizer = std::sqrt(2.0 / static_cast<Real>(N));  // normalizer for other elements
  for (MatrixIndexT k = 1; k < K; k++)
    for (MatrixIndexT n = 0; n < N; n++)
      (*M)(k, n) = normalizer
          * std::cos(static_cast<double>(M_PI) / N * (n + 0.5) * k);
}
```
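For comparison, this is how I reproduce that matrix in numpy (the function name is mine; it should also match scipy's orthonormal DCT-II):

```python
import numpy as np

def make_kaldi_dct_matrix(num_ceps, num_filters):
    """Orthonormal DCT-II matrix of shape (num_ceps, num_filters),
    mirroring Kaldi's ComputeDctMatrix."""
    M = np.zeros((num_ceps, num_filters))
    M[0, :] = np.sqrt(1.0 / num_filters)   # normalizer for X_0
    n = np.arange(num_filters)
    for k in range(1, num_ceps):
        M[k, :] = np.sqrt(2.0 / num_filters) * np.cos(np.pi / num_filters * (n + 0.5) * k)
    return M

# Should give the same result as scipy's orthonormal DCT-II on the filterbank energies:
#   scipy.fftpack.dct(fbank_energies, type=2, axis=1, norm='ortho')[:, :num_ceps]
```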

zlinsmile commented 5 years ago

Oh, I just realized that maybe it's because of the random numbers in dithering:

```python
def do_dither(signal, dither_value=1.0):
    signal += numpy.random.normal(size=signal.shape) * dither_value
    return signal
```

We get different random numbers, so we get something different?
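A quick sanity check (toy snippet, not from the repo) shows that with dither enabled two runs can never match sample for sample:

```python
import numpy as np

sig = np.zeros(16000)  # a silent 1-second signal at 16 kHz, just for illustration

a = sig + np.random.normal(size=sig.shape) * 1.0   # first run
b = sig + np.random.normal(size=sig.shape) * 1.0   # second run, different random draw

print(np.max(np.abs(a - b)))   # > 0, so every downstream frame and feature differs too
```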

ZitengWang commented 5 years ago

> Oh, I just realized that maybe it's because of the random numbers in dithering... we get different random numbers, so we get something different?

Yes, the closest we can get is with dither=0.0

zlinsmile commented 5 years ago

> Oh, I just realized that maybe it's because of the random numbers in dithering... we get different random numbers, so we get something different?
>
> Yes, the closest we can get is with dither=0.0

Thank you ^^ So you do the same thing as Kaldi, but why are the MFCCs still slightly different (e.g. -7.41751318 vs. -7.417507)? Is it just numerical error between Python and C++?

ZitengWang commented 5 years ago

@zlinsmile I think the results differ because of the different FFT implementations in Python and C++.
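For scale, the two numbers you quoted differ by less than 1e-6 in relative terms, which is comparable to single-precision rounding (a quick, illustrative check):

```python
import numpy as np

a, b = -7.41751318, -7.417507
print(abs(a - b) / abs(a))        # ~8.3e-07 relative difference
print(np.finfo(np.float32).eps)   # ~1.2e-07, float32 machine epsilon
```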

zlinsmile commented 5 years ago

@ZitengWang Thank you^^