TimoBolkart / voca

This codebase demonstrates how to synthesize realistic 3D character animations given an arbitrary speech signal and a static character mesh.
https://voca.is.tue.mpg.de/en

How to get the deep speech windows #52

Closed meherabhi closed 4 years ago

meherabhi commented 4 years ago

I am trying to implement your work. Could you help me with the procedure for getting the feature vectors from the DeepSpeech model? From what I know, DeepSpeech's output is a text transcript.

TimoBolkart commented 4 years ago

You are right: the final layer of DeepSpeech is fed to a softmax whose output is a probability distribution over characters, from which the transcript is decoded. We instead use the output of the final FC layer of DeepSpeech directly as the feature vector. Long story short, this repository contains the entire training code, including the audio encoding. For details, please look at utils/audio_handler.py to see how the speech signal is fed to DeepSpeech and how its output is used.
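To illustrate the windowing part of this, here is a minimal sketch of how per-frame DeepSpeech features (the final FC-layer outputs, one vector per audio frame) could be grouped into fixed-size windows, one window per animation frame. The function name, the default window size, and the zero-padding at the sequence boundaries are assumptions for illustration, not the repository's exact implementation; see utils/audio_handler.py for the real code.

```python
import numpy as np

def make_feature_windows(features, window_size=16):
    """Stack per-frame features into centered, fixed-size windows.

    features: (T, D) array, one D-dim DeepSpeech FC-layer output per frame.
    Returns: (T, window_size, D) array; each frame t gets a window of
    window_size consecutive feature vectors, zero-padded at the edges.
    Note: the window size and padding scheme here are illustrative.
    """
    T, D = features.shape
    half = window_size // 2
    # Zero-pad so every frame, including the first and last, has a full window.
    padded = np.concatenate(
        [np.zeros((half, D)), features, np.zeros((half, D))], axis=0
    )
    # One slice of window_size frames per original frame.
    return np.stack([padded[t:t + window_size] for t in range(T)])
```

These windows would then be the per-frame audio input to the animation model, rather than a single feature vector per frame.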

meherabhi commented 4 years ago

Thank you... that cleared my doubt.