Closed · Xinghui-Wu closed this issue 3 years ago
This project supports various feature configurations, but the default model uses MFCCs. Specifically:
```
--use-energy=false
--num-mel-bins=40
--num-ceps=40
--low-freq=20
--high-freq=-400
--sample-frequency=16000.0
```
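For reference, the parameters above can be reproduced with a plain-NumPy MFCC sketch. This is not this repository's implementation; the FFT size, 25 ms window, and 10 ms hop are my own assumptions. Note that Kaldi-style frontends interpret a negative `--high-freq` as an offset from the Nyquist frequency, so `-400` at 16 kHz means an upper cutoff of 7600 Hz:

```python
import numpy as np

SAMPLE_RATE = 16000
NUM_MEL_BINS = 40
NUM_CEPS = 40
LOW_FREQ = 20.0
HIGH_FREQ = SAMPLE_RATE / 2 - 400.0  # Kaldi: negative --high-freq offsets from Nyquist

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_fft=512):
    # Triangular mel filters spanning [LOW_FREQ, HIGH_FREQ]
    mel_points = np.linspace(hz_to_mel(LOW_FREQ), hz_to_mel(HIGH_FREQ),
                             NUM_MEL_BINS + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / SAMPLE_RATE).astype(int)
    fbank = np.zeros((NUM_MEL_BINS, n_fft // 2 + 1))
    for m in range(1, NUM_MEL_BINS + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def dct_ii(x, num_ceps):
    # Type-II DCT over the mel axis, keeping the first num_ceps coefficients
    n = x.shape[-1]
    basis = np.cos(np.pi / n * (np.arange(n) + 0.5)[None, :]
                   * np.arange(num_ceps)[:, None])
    return x @ basis.T

def mfcc(signal, n_fft=512, win_len=400, hop=160):
    # Frame the signal (25 ms windows, 10 ms hop at 16 kHz), apply a Hamming window
    num_frames = 1 + (len(signal) - win_len) // hop
    idx = np.arange(win_len)[None, :] + hop * np.arange(num_frames)[:, None]
    frames = signal[idx] * np.hamming(win_len)
    # Power spectrum -> mel filterbank energies -> log -> DCT (no energy term,
    # matching --use-energy=false)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    mel_energies = np.maximum(power @ mel_filterbank(n_fft).T, 1e-10)
    return dct_ii(np.log(mel_energies), NUM_CEPS)
```

With a 1-second 16 kHz input this yields a `(98, 40)` feature matrix, i.e. 40 cepstral coefficients per 10 ms frame.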
The Baidu papers on DeepSpeech 1 and 2 state that they used spectrograms of power-normalized audio clips as input features to the system. Mozilla's TensorFlow-based DeepSpeech implementation, on the other hand, uses MFCC features instead. I am curious which features your implementation uses, as my research compares different feature extraction algorithms. I would appreciate it if you could clarify this. Thank you.