Closed molo32 closed 4 years ago
The master branch contains speaker-specific models, which cannot be used for arbitrary speakers. The multi-speaker branch can generate individual words at most. You can modify complete_test_generate.py
to run inference on real videos. You might need to preprocess the video data similarly to how it is done in https://github.com/joonson/syncnet_python.
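For reference, the syncnet_python-style preprocessing usually boils down to two ffmpeg passes: resampling the video to 25 fps frames and extracting 16 kHz mono audio. The sketch below only builds those command strings (the exact flags are an assumption; check that repository's run_pipeline.py for the authoritative steps):

```python
import shlex

# Hedged sketch: construct the two ffmpeg invocations typically used for
# syncnet_python-style preprocessing. The flag set is an assumption, not
# copied from that repository.
def preprocessing_commands(video_path, frames_dir, wav_path):
    # Pass 1: extract frames at a fixed 25 fps.
    frames_cmd = (
        f"ffmpeg -y -i {shlex.quote(video_path)} "
        f"-qscale:v 2 -r 25 {frames_dir}/%06d.jpg"
    )
    # Pass 2: extract 16 kHz mono audio, dropping the video stream.
    audio_cmd = (
        f"ffmpeg -y -i {shlex.quote(video_path)} "
        f"-ac 1 -ar 16000 -vn {shlex.quote(wav_path)}"
    )
    return frames_cmd, audio_cmd

for cmd in preprocessing_commands("input.mp4", "frames", "audio.wav"):
    print(cmd)
```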
Hi, I think I need to call the following function to generate a spectrogram (and from it a wav file) for a new video:
def synthesize_spectrograms(self, faces, embeddings, return_alignments=False):
Is the embeddings param above from the face encoder?
Does the following function return the face embeddings needed above?
def embed_frames_batch(frames_batch):
Thanks
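For what it's worth, the call order being asked about can be sketched with dummy stand-ins. Both function bodies below are placeholders with the signatures quoted above, not the repository's real implementations, and the frame/embedding shapes are assumptions:

```python
import numpy as np

def embed_frames_batch(frames_batch):
    # Stand-in: collapse each (H, W, 3) face crop to a small embedding vector.
    return np.stack([f.mean(axis=(0, 1)) for f in frames_batch])

def synthesize_spectrograms(faces, embeddings, return_alignments=False):
    # Stand-in: emit one (n_mels, T) mel spectrogram per input, conditioned
    # on its embedding.
    n_mels, timesteps = 80, 100
    specs = [np.zeros((n_mels, timesteps)) + e.mean() for e in embeddings]
    return (specs, None) if return_alignments else specs

frames = [np.random.rand(96, 96, 3) for _ in range(5)]  # dummy face crops
embeddings = embed_frames_batch(frames)        # assumed: one embedding per frame
specs = synthesize_spectrograms(frames, embeddings)
print(len(specs), specs[0].shape)
```

The point of the sketch is only the assumed data flow: frames go through the encoder's embed_frames_batch first, and its output is what gets passed as the embeddings parameter of synthesize_spectrograms.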
I have an mp4 video of a person speaking with choppy sound. Can you tell me where to put the mp4 and which script to run to generate sound in the silent parts? Thanks in advance.