Rudrabha / LipGAN

This repository contains the code for LipGAN. LipGAN was published as part of the paper "Towards Automatic Face-to-Face Translation".
http://cvit.iiit.ac.in/research/projects/cvit-projects/facetoface-translation
MIT License

Generated Video Rambles & Stops (Bad Lip Sync) #9

Closed ExponentialML closed 4 years ago

ExponentialML commented 4 years ago

Thanks for developing this software; it works very well! There's an issue, though: if the audio clip contains pauses, the generated lip movements continue to "ramble" right through them.

It's very easy to reproduce: record some audio and try it. The model will ramble right through the silences, and there will be an elongated pause at the end of the video with the person frozen. If I were to guess, maybe something with the mfcc (I'm unfamiliar with this as a whole) isn't working properly, or an interpolation method needs to be implemented to sync the video with the audio. I'm currently using the pretrained models from your Google Drive link. Any insight is appreciated!
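The alignment question raised above can be sketched as follows. This is a simplified model of how audio chunks map to video frames, not code from the repository; the 25 fps rate is an assumption, and `frame_index` only mirrors the indexing expression that appears later in this thread's traceback (`idx = 0 if args.static else i % len(frames)`):

```python
# Hedged sketch of audio/video alignment. Assumes one MFCC chunk is
# generated per output video frame at a hypothetical 25 fps; the real
# LipGAN code may use different constants.
def expected_mfcc_chunks(audio_seconds, fps=25.0):
    # The chunk count tracks the *audio* length, so the output video
    # length follows the audio, looping the input frames as needed.
    return int(audio_seconds * fps)

def frame_index(chunk_idx, num_frames, static=False):
    # Mirrors the indexing seen in batch_inference.py's datagen():
    # a static input always reuses frame 0, otherwise frames loop.
    # Note: if num_frames == 0, this raises ZeroDivisionError.
    return 0 if static else chunk_idx % num_frames
```

Under this model, audio longer than the video simply wraps around the available frames, which would produce the "rambling through silences" effect described above.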

prajwalkr commented 4 years ago

Please provide the audio and video file where this happens. Did you try on static faces first with the same audio? What happens then?

ak9250 commented 4 years ago

@prajwalkr any plans to provide a google colab for this, that would solve most issues that are related to os?

prajwalkr commented 4 years ago

> @prajwalkr any plans to provide a google colab for this, that would solve most issues that are related to os?

We have no plans for this at the moment, primarily because of the MATLAB dependency.

ExponentialML commented 4 years ago

> Please provide the audio and video file where this happens. Did you try on static faces first with the same audio? What happens then?

I've attached the file below. If you notice, it stops halfway through with subpar lip sync. The same error occurs when using different faces, different audio, or any mix of the two. Also, I have to use a workaround of rendering a video of the static face: if I use any type of image (jpg, png, bmp) with any type of face (the most perfect frontal face you can think of), the face will not be detected. Only mp4 files work.

```
Number of frames in the input video: 1
Number of frames to be used for inference: 0
Length of mfcc chunks: 1804
nan%| | 0/8 [00:00<?, ?it/s]
nan [00:00, ?it/s]
Traceback (most recent call last):
  File "batch_inference.py", line 230, in <module>
    main()
  File "batch_inference.py", line 203, in main
    total=int(np.ceil(float(len(mfcc_chunks))/batch_size)))):
  File "/home/user/.local/lib/python3.6/site-packages/tqdm/_tqdm.py", line 897, in __iter__
    for obj in iterable:
  File "batch_inference.py", line 99, in datagen
    idx = 0 if args.static else i%len(frames)
ZeroDivisionError: integer division or modulo by zero
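For reference, the `ZeroDivisionError` in the traceback fires because "Number of frames to be used for inference: 0" means face detection produced no usable frames, so `len(frames)` is 0 when `datagen` computes `i % len(frames)`. A defensive guard like the following (an assumed helper, not code from the repository) would fail early with a clearer message:

```python
# Hedged sketch of a guard for the len(frames) == 0 failure mode seen
# in the traceback. `frames` mimics the variable name in
# batch_inference.py; the check itself is an assumption.
def safe_frame_index(i, frames, static=False):
    if not frames:
        raise ValueError(
            "Face detection returned no usable frames; "
            "check the input for a clear, frontal face."
        )
    return 0 if static else i % len(frames)
```

With this guard, the run would stop at load time with an actionable error instead of crashing mid-iteration inside `tqdm`.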

result_voice.mp4.zip

prajwalkr commented 4 years ago

Please also provide the exact command you executed and the complete output.

And try with some face that was used in the demo video. Like this one: https://www.biography.com/.image/t_share/MTE5NTU2MzE2Mjk4OTcwNjM1/paul-mccartney-9390850-1-402.jpg

Also, it should work for single images, so try the above image in single-image mode rather than converting it to an mp4.

Finally, the output video you mentioned above has decent lip sync until it freezes. The slight random jitter of the lips during silences is a known issue with the proposed model itself and is a direction for improvement in future work.

I will, however, help with the freezing problem once you provide the details requested above.

ExponentialML commented 4 years ago

I've decided to close this issue even though it's not really solved. I feel that future improvements to this project will address the issues that come up, so there's no need to troubleshoot any further.