How to get the deep speech windows

meherabhi commented 4 years ago

I am trying to implement your work. Could you help me with the procedure for getting the feature vectors from the DeepSpeech model? From what I know, the DeepSpeech model's output is a text transcript.

TimoBolkart commented 4 years ago

You are right: the final layer of DeepSpeech is fed to a softmax function, whose output is a probability distribution over characters, and that distribution is then decoded into characters from a dictionary. We directly use the output of the final FC layer of DeepSpeech (the input to the softmax) as the feature vector. Long story short, this repository contains the entire training code, including the audio encoding. For details, please look at utils/audio_handler.py to see how the speech signal is fed to DeepSpeech and how the output is used.
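
To make the distinction concrete, here is a minimal sketch (not the code from utils/audio_handler.py) of treating the pre-softmax outputs as per-frame features and grouping them into overlapping windows. The random array stands in for real DeepSpeech outputs, and the feature size of 29, the window size of 16, the zero padding, and the names `softmax` and `make_windows` are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(logits):
    """Turn unnormalized per-frame outputs into character probabilities."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def make_windows(features, window_size=16):
    """Build one zero-padded window of frames around each input frame.

    features: array of shape (num_frames, num_features)
    returns:  array of shape (num_frames, window_size, num_features)
    """
    num_frames, num_features = features.shape
    half = window_size // 2
    pad = np.zeros((half, num_features), dtype=features.dtype)
    padded = np.concatenate([pad, features, pad], axis=0)
    # Window i covers padded[i : i + window_size], so frame i sits at index `half`.
    return np.stack([padded[i:i + window_size] for i in range(num_frames)])


# Stand-in for the final FC layer output: 50 frames, 29 values per frame (assumed sizes).
rng = np.random.default_rng(0)
logits = rng.standard_normal((50, 29))

# What the transcript path would use: a probability distribution per frame.
char_probs = softmax(logits)

# What the feature path uses in this sketch: the raw outputs, windowed.
windows = make_windows(logits, window_size=16)
print(windows.shape)
```

The point of the sketch is only that the features are taken before the softmax and decoding step, so no transcript is ever produced; the actual preprocessing in the repository may differ in window size, padding, and resampling.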

meherabhi commented 4 years ago

Thank you... that cleared my doubt.

TimoBolkart / voca