The DeepSpeech 1 and 2 papers from Baidu state that they use spectrograms of power-normalized audio clips as the input features to the system.
The Mozilla DeepSpeech implementation, based on TensorFlow, instead uses MFCC features.
I am curious which features your implementation uses, since my research compares various feature extraction algorithms. I would appreciate it if you could clarify this. Thank you.
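For context on the distinction I am asking about, here is a minimal NumPy sketch contrasting the two feature types: a power spectrogram (the DeepSpeech-paper-style input) versus MFCCs (log mel energies followed by a DCT, the kind of features Mozilla DeepSpeech uses). All parameter choices here (512-point FFT, 160-sample hop, 26 mel filters, 13 cepstra) are illustrative assumptions, not values taken from any of the implementations discussed.

```python
import numpy as np
from scipy.fft import dct

def power_spectrogram(signal, n_fft=512, hop=160):
    # Frame the signal, apply a Hann window, and take the power spectrum
    # of each frame (spectrogram-style features, as in the Baidu papers).
    window = np.hanning(n_fft)
    frames = [signal[s:s + n_fft] * window
              for s in range(0, len(signal) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), n=n_fft)) ** 2

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    # Triangular filters spaced evenly on the mel scale.
    hz_to_mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel_to_hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    mel_points = np.linspace(0, hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fbank

def mfcc(signal, n_ceps=13, sr=16000):
    # MFCC = log mel-filterbank energies decorrelated with a DCT.
    spec = power_spectrogram(signal)
    log_mel = np.log(spec @ mel_filterbank(sr=sr).T + 1e-10)
    return dct(log_mel, type=2, axis=1, norm='ortho')[:, :n_ceps]

# Example: one second of a 440 Hz tone sampled at 16 kHz.
t = np.arange(16000) / 16000
audio = np.sin(2 * np.pi * 440 * t)
spec = power_spectrogram(audio)   # shape (97, 257): frames x FFT bins
feats = mfcc(audio)               # shape (97, 13): frames x cepstra
print(spec.shape, feats.shape)
```

The practical difference is dimensionality and decorrelation: the spectrogram keeps all 257 frequency bins per frame, while the MFCC pipeline compresses each frame to a handful of decorrelated coefficients.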
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.