Rudrabha / Lip2Wav

This is the repository containing the code for our CVPR 2020 paper titled "Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis"
MIT License
695 stars, 153 forks

How Can I Get Generated Text? #4

Closed: erkankaracakan closed this issue 4 years ago

erkankaracakan commented 4 years ago

First of all, thank you for the project.

As I understand it, the project performs lip reading to produce text first and then runs text-to-speech with Tacotron. I'm trying to get the text generated by the lip-reading step. Is that possible?

Also, do I need transcripts of the speech in the videos to train on my own data?

Thank you.

Rudrabha commented 4 years ago

Our model generates speech (mel-spectrograms followed by raw speech) from lip movements. Our network uses Tacotron 2's decoder and a 3D-CNN-based encoder that takes video frames as input. The output of the network is a mel-spectrogram, which is then converted to raw speech using a vocoder. There is no text involved anywhere in the pipeline.
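
To make the data flow described above concrete, here is a rough shape-level sketch of the stages (video frames, 3D CNN encoder, Tacotron 2 style decoder, vocoder). It is not the repository's code: `PipelineConfig`, `trace_shapes`, and every numeric value (frame rate, sample rate, hop length, mel channels, embedding size) are hypothetical names and assumptions chosen only for illustration, and the one-embedding-per-frame encoder output is likewise assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    # All values are illustrative assumptions, not the repository's settings.
    fps: int = 25             # video frames per second
    sample_rate: int = 16000  # audio samples per second
    hop_length: int = 200     # audio samples between consecutive mel frames
    n_mels: int = 80          # mel channels per spectrogram frame
    embed_dim: int = 512      # size of each per-frame encoder embedding


def trace_shapes(num_frames, height, width, cfg=PipelineConfig()):
    """Return (stage name, array shape) pairs for each stage of the pipeline.

    Note that no stage produces text: the decoder emits mel frames directly
    from the visual embeddings.
    """
    duration_s = num_frames / cfg.fps
    mel_frames = int(duration_s * cfg.sample_rate) // cfg.hop_length
    return [
        ("video_frames", (num_frames, height, width, 3)),    # RGB face crops
        ("encoder_3d_cnn", (num_frames, cfg.embed_dim)),     # assumed: one embedding per frame
        ("decoder_mel", (mel_frames, cfg.n_mels)),           # predicted mel-spectrogram
        ("vocoder_waveform", (mel_frames * cfg.hop_length,)),  # raw audio samples
    ]


if __name__ == "__main__":
    # A hypothetical 3-second clip of 96x96 face crops.
    for name, shape in trace_shapes(75, 96, 96):
        print(f"{name:18s} {shape}")
```

Under these assumptions the only intermediate representations are visual embeddings and a mel-spectrogram, so there is no transcript to extract at inference time.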