Rudrabha / LipGAN

This repository contains the code for LipGAN. LipGAN was published as part of the paper "Towards Automatic Face-to-Face Translation".
http://cvit.iiit.ac.in/research/projects/cvit-projects/facetoface-translation
MIT License

melspectrogram tensor dimensions mismatch #29

Closed: nikitadurasov closed this issue 4 years ago

nikitadurasov commented 4 years ago

Hey, thanks for the great paper and repo! I have only one issue with running the Python implementation:

1. I downloaded a random YouTube video of a talking person in .mp4 format.
2. With ffmpeg I extracted a .wav file to go with the .mp4.
3. I then ran something like:

python batch_inference.py --checkpoint_path checkpoint.h5 --model residual --face youtube_video.mp4 --fps 30 --audio youtube_video.wav --results_dir result_dir

This fails with a melspectrogram tensor mismatch: the melspectrogram has shape [80, ...], but input_audio of the pretrained model expects [12, ...], and I have no idea how to resolve it.

The .wav file was generated with: ffmpeg -i youtube_video.mp4 youtube_video.wav

Could you please provide a proper example of how to use batch_inference.py for an arbitrary .mp4 video? Or do you have any idea what causes this dimension mismatch?
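
For reference, here is a minimal librosa sketch that reproduces the two shapes I am talking about (the sample rate and STFT parameters below are my own guesses, not necessarily what the repo uses):

import librosa

# assumed sample rate; the repo's actual audio settings may differ
wav, sr = librosa.load("youtube_video.wav", sr=16000)

# 80 mel bins, like the tensor the pythonic pipeline produces
mel = librosa.feature.melspectrogram(y=wav, sr=sr, n_fft=800, hop_length=200, n_mels=80)

# 12 coefficients, matching the [12, ...] input of the pretrained model I downloaded
mfcc = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=12)

print(mel.shape)   # (80, T)
print(mfcc.shape)  # (12, T')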

Thanks in advance!

prajwalkr commented 4 years ago

You might be on the wrong branch. Run the following and then try the inference again.

git checkout fully_pythonic

Also, make sure you have downloaded the correct pre-trained model from the fully_pythonic branch. The shape [12, ...] you are seeing corresponds to the model in the master branch.
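
If you want to double-check which checkpoint you have, something like the sketch below should print the model's expected input shapes (assuming the .h5 file stores the full Keras model rather than just the weights; custom layers may additionally need custom_objects):

from keras.models import load_model

# load without compiling, since we only want to inspect the input shapes
model = load_model("checkpoint.h5", compile=False)

for inp in model.inputs:
    print(inp.name, inp.shape)

# the audio input should report 80 bins for the fully_pythonic checkpoint
# and 12 for the master-branch one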

nikitadurasov commented 4 years ago

Nice, thanks! The issue seems to be solved.