Open weizmann opened 1 year ago
I changed the logfbank call by adding winstep=0.02 (the default is 0.01; I just hardcoded it here), which brings the number of audio frames to 250.
But the output is still not correct either: the lip sync is off and the frames lag.
https://user-images.githubusercontent.com/2306111/234510806-dabae7ed-e9b5-4d85-94e9-d16f813595ca.mp4
Hi, the problem is that the expected sampling rate of the audio file is 16 kHz, but your audio file is 44.1 kHz. I recommend downsampling your audio file to 16 kHz. Also, since the LRS2 videos are 25 fps, I set the output fps to 25. I will modify the code so that it automatically matches the fps of the input video.
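For anyone hitting the same mismatch, here is a minimal sketch of the downsampling step, driving ffmpeg from Python. The helper name `build_resample_cmd` and the file names are hypothetical; `-ar` (audio sample rate) and `-ac` (channel count) are standard ffmpeg options. Converting to mono is an assumption, not something stated in this thread, so it is off by default.

```python
import subprocess


def build_resample_cmd(src, dst, sample_rate=16000, mono=False):
    """Build an ffmpeg command that resamples src to sample_rate Hz.

    Hypothetical helper for illustration; it is not part of the project.
    """
    # -y overwrites dst if it exists, -ar sets the output audio sample rate.
    cmd = ["ffmpeg", "-y", "-i", src, "-ar", str(sample_rate)]
    if mono:
        # Assumption: downmix to one channel only when explicitly requested.
        cmd += ["-ac", "1"]
    cmd.append(dst)
    return cmd


cmd = build_resample_cmd("input.wav", "input_16k.wav")
# subprocess.run(cmd, check=True)  # uncomment to actually invoke ffmpeg
```

The resulting file would then be passed as `--wav_path` in place of the original 44.1 kHz file.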
Sorry, my code only works with 25 fps videos, because the audio encoder outputs audio embeddings at 25 fps.
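The two constraints mentioned so far (16 kHz audio, 25 fps video) can be checked before running inference. The sketch below uses only the standard-library `wave` module; the function names and constants are hypothetical and not part of the project. It computes how many video frames the audio duration corresponds to, which can be compared with the frame count of the input video.

```python
import wave

REQUIRED_SAMPLE_RATE = 16000  # Hz, per the maintainer's comment
REQUIRED_FPS = 25             # fps, per the maintainer's comment


def expected_video_frames(num_samples, sample_rate, fps=REQUIRED_FPS):
    """Number of video frames that span the same duration as the audio."""
    return round(num_samples / sample_rate * fps)


def wav_info(path):
    """Return (sample_rate, num_samples_per_channel) from a WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnframes()


def check_audio(path):
    """Raise if the WAV is not 16 kHz; otherwise return the expected frame count."""
    sample_rate, num_samples = wav_info(path)
    if sample_rate != REQUIRED_SAMPLE_RATE:
        raise ValueError(
            f"{path}: sample rate is {sample_rate} Hz, expected {REQUIRED_SAMPLE_RATE} Hz"
        )
    return expected_video_frames(num_samples, sample_rate)


# 10 s of audio at 16 kHz corresponds to 250 video frames at 25 fps.
frames_10s = expected_video_frames(160000, 16000)
```

If the value returned by `check_audio` differs noticeably from the number of frames in the input video, the audio and video durations do not match, independent of any padding behaviour inside the model code.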
The video is 25 fps and the audio is 16 kHz, but the output video still has more frames than the input.
Could you provide details of the problem you are facing, such as the input video and the output video?
You could refer to #12.
I changed the logfbank call by adding winstep=0.02 (the default is 0.01; I just hardcoded it here), which brings the number of audio frames to 250.
But the output is still not correct either: the lip sync is off and the frames lag.
talklip.mp4
The frame lag is a bug too. The padding step is unreasonable and needs revising. I will update it later.
I have tried inf_demo.py, but I found that the frame count of the output video was doubled. The input video file is 10 s / 25 fps / 250 frames, but the output video file is 20 s / 25 fps / 501 frames.
I found that the length of the audio feature array is 501.
Maybe the audio and video frames are not aligned in my case. I am not sure whether the project has any fps or sample-rate constraints.
Waiting for your reply, thank you.
You can find my input/output video and audio files at the following link:
talklip-issue.zip
I ran inf_demo.py with the following command:
python inf_demo.py --video_path ./input.mp4 --wav_path ./input.wav --ckpt_path ./checkpoints/global_contrastive.pth --avhubert_root /root/workspace/av_hubert
ffmpeg version is 4.2.3:
some debug logs: