Closed · Xinghui-Wu closed this issue 3 years ago
This project supports various feature configurations, but the default model uses MFCCs. Specifically:
```
--use-energy=false
--num-mel-bins=40
--num-ceps=40
--low-freq=20
--high-freq=-400
--sample-frequency=16000.0
```
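For reference, the parameters above can be reproduced with a plain-NumPy MFCC sketch. This is not this repository's implementation; the FFT size, 25 ms window, and 10 ms hop are my own assumptions. Note that Kaldi-style frontends interpret a negative `--high-freq` as an offset from the Nyquist frequency, so `-400` at 16 kHz means an upper cutoff of 7600 Hz:

```python
import numpy as np

SAMPLE_RATE = 16000
NUM_MEL_BINS = 40
NUM_CEPS = 40
LOW_FREQ = 20.0
HIGH_FREQ = SAMPLE_RATE / 2 - 400.0  # Kaldi: negative --high-freq offsets from Nyquist

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_fft=512):
    # Triangular mel filters spanning [LOW_FREQ, HIGH_FREQ]
    mel_points = np.linspace(hz_to_mel(LOW_FREQ), hz_to_mel(HIGH_FREQ),
                             NUM_MEL_BINS + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / SAMPLE_RATE).astype(int)
    fbank = np.zeros((NUM_MEL_BINS, n_fft // 2 + 1))
    for m in range(1, NUM_MEL_BINS + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def dct_ii(x, num_ceps):
    # Type-II DCT over the mel axis, keeping the first num_ceps coefficients
    n = x.shape[-1]
    basis = np.cos(np.pi / n * (np.arange(n) + 0.5)[None, :]
                   * np.arange(num_ceps)[:, None])
    return x @ basis.T

def mfcc(signal, n_fft=512, win_len=400, hop=160):
    # Frame the signal (25 ms windows, 10 ms hop at 16 kHz), apply a Hamming window
    num_frames = 1 + (len(signal) - win_len) // hop
    idx = np.arange(win_len)[None, :] + hop * np.arange(num_frames)[:, None]
    frames = signal[idx] * np.hamming(win_len)
    # Power spectrum -> mel filterbank energies -> log -> DCT (no energy term,
    # matching --use-energy=false)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    mel_energies = np.maximum(power @ mel_filterbank(n_fft).T, 1e-10)
    return dct_ii(np.log(mel_energies), NUM_CEPS)
```

With a 1-second 16 kHz input this yields a `(98, 40)` feature matrix, i.e. 40 cepstral coefficients per 10 ms frame.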
The Baidu papers on DeepSpeech 1 and 2 state that they used spectrograms of power-normalized audio clips as input features to the system. Mozilla's TensorFlow-based DeepSpeech implementation, on the other hand, uses MFCC features instead. I am curious which features your implementation uses, as my research compares different feature extraction algorithms. I would appreciate it if you could clarify this. Thank you.