Rudrabha / Lip2Wav

This is the repository containing the code for our CVPR 2020 paper "Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis"
MIT License

how to generate voice from video? #20

Closed · molo32 closed this issue 3 years ago

molo32 commented 3 years ago

I have an MP4 video of a person speaking with choppy sound. Can you tell me where to put the MP4 and which script to run to generate sound for the silent parts? Thanks in advance.

prajwalkr commented 3 years ago

The master branch contains speaker-specific models, which cannot be used for arbitrary speakers. The multi-speaker branch can generate isolated words at best. You can modify complete_test_generate.py to run inference on real videos. You might need to preprocess the video data similarly to how it is done in https://github.com/joonson/syncnet_python.
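
(For context: preprocessing "similar to syncnet_python" generally means re-encoding the video at a fixed 25 fps, detecting the speaker's face in every frame, and cropping and resizing the face region before it is fed to the model. The sketch below only illustrates that idea; the `detect_face` callback and the 96x96 crop size are placeholders, not the repository's actual pipeline.)

```python
# Rough sketch of syncnet_python-style preprocessing (assumptions noted above).
import subprocess
import cv2

def extract_frames(video_path, out_path="video_25fps.mp4", fps=25):
    # Re-encode the input video at a fixed frame rate with ffmpeg.
    subprocess.run(["ffmpeg", "-y", "-i", video_path, "-r", str(fps), out_path],
                   check=True)
    cap = cv2.VideoCapture(out_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return frames

def crop_faces(frames, detect_face, size=96):
    # `detect_face` is a placeholder for whatever face detector you use
    # (syncnet_python uses S3FD); it should return (x1, y1, x2, y2).
    crops = []
    for frame in frames:
        x1, y1, x2, y2 = detect_face(frame)
        face = cv2.resize(frame[y1:y2, x1:x2], (size, size))
        crops.append(face)
    return crops
```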

DomhnallBoyle commented 3 years ago

Hi, I think I need to call the following function to generate a spectrogram, and therefore a wav file, for a new video:

def synthesize_spectrograms(self, faces, embeddings, return_alignments=False):

Is the embeddings param above from the face encoder?

Does the result of the function below give me the face embeddings needed above?

def embed_frames_batch(frames_batch):

Thanks
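
(A rough illustration of how these two calls might be chained, assuming `embed_frames_batch` lives in the encoder module and `synthesize_spectrograms` is a method of a synthesizer object, as the signatures above suggest. The averaging of per-frame embeddings and the `spectrogram_to_wav` vocoder step are assumptions for the sketch, not confirmed behavior of the codebase.)

```python
# Sketch only: module structure and the vocoder step are assumptions,
# not the repository's documented API.
import numpy as np

def video_to_wav(faces, encoder, synthesizer, spectrogram_to_wav):
    # `faces`: list/array of cropped face frames for one utterance.
    # `encoder.embed_frames_batch`: the function quoted above; assumed here
    # to return one embedding per frame, which we average into a single
    # embedding for the speaker.
    frame_embeddings = encoder.embed_frames_batch(np.asarray(faces))
    embedding = frame_embeddings.mean(axis=0)

    # `synthesizer.synthesize_spectrograms`: the function quoted above;
    # it takes the faces and one embedding per input item.
    specs = synthesizer.synthesize_spectrograms([faces], [embedding])

    # The predicted mel-spectrogram still has to go through whatever
    # vocoder or Griffin-Lim routine the codebase provides -- placeholder.
    return spectrogram_to_wav(specs[0])
```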