Closed: meherabhi closed this 4 years ago
You are right: in DeepSpeech, the output of the final layer is fed to a softmax function, which yields a probability distribution over characters; those probabilities are then mapped to characters from a dictionary. We do not use that text output. Instead, we directly use the output of the final FC layer of DeepSpeech as the feature vector. Long story short, this repository contains the entire training code, including the audio encoding. For details, please look at utils/audio_handler.py to see how the speech signal is fed to DeepSpeech and how the output is used.
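To make the distinction concrete, here is a minimal toy sketch. It is not the code from utils/audio_handler.py and it does not call DeepSpeech: the alphabet, the logit values, and the function names are all made up for illustration, and the CTC-style greedy decoding is only an assumed stand-in for the text path. It shows that the same per-frame outputs of a final FC layer can either be pushed through softmax and decoded into text, or kept as they are and used as per-frame feature vectors.

```python
import math

# Hypothetical toy alphabet; the extra last class index is an assumed CTC blank.
ALPHABET = [" ", "a", "b", "c"]
BLANK = len(ALPHABET)


def softmax(logits):
    """Turn one frame of raw FC outputs into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_transcript(frame_logits):
    """Text path: softmax, argmax per frame, collapse repeats, drop blanks."""
    chars = []
    prev = None
    for logits in frame_logits:
        probs = softmax(logits)
        idx = max(range(len(probs)), key=probs.__getitem__)
        if idx != prev and idx != BLANK:
            chars.append(ALPHABET[idx])
        prev = idx
    return "".join(chars)


def audio_features(frame_logits):
    """Feature path: keep the raw pre-softmax vectors, one per audio frame."""
    return [list(logits) for logits in frame_logits]


# Made-up outputs of a final FC layer: 4 frames x 5 classes (4 chars + blank).
toy_logits = [
    [0.1, 2.0, 0.3, 0.0, 0.5],
    [0.1, 2.5, 0.3, 0.0, 0.5],
    [0.0, 0.1, 0.2, 0.1, 3.0],
    [0.2, 0.1, 3.0, 0.4, 0.1],
]

print(greedy_transcript(toy_logits))                            # text path
features = audio_features(toy_logits)
print(len(features), "frames of", len(features[0]), "values")   # feature path
```

In this sketch the feature path simply skips the softmax and decoding steps, so each audio frame keeps a full real-valued vector instead of collapsing to a single character.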
Thank you... that cleared my doubt.
I am trying to implement your work. Could you help me with the procedure for getting the feature vectors from the DeepSpeech model? As far as I know, the DeepSpeech model's output is a text transcript.