ZiqiaoPeng / SyncTalk

[CVPR 2024] This is the official source for our paper "SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis"
https://ziqiaopeng.github.io/synctalk/

Hubert Inference #129

Open schxnhxlz opened 1 month ago

schxnhxlz commented 1 month ago

Hi there,

I trained on a video with HuBERT, and everything looked good so far. But when I run inference with audio (converted to .npy with python data_utils/hubert.py --wav data/<name>.wav # save to data/<name>_hu.npy), it produces a 1-minute video without audio and with wrong lip movements. Is there anything I missed here?

cheers

antipon commented 1 month ago

I did the same thing with deepspeech and I got the same result as you.

schxnhxlz commented 1 month ago

> I did the same thing with deepspeech and I got the same result as you.

Did you try it with deepspeech as well? I had the same issue there :/

zhouzhenneng commented 1 month ago

Did you process the data with hubert before that? Like this: CUDA_VISIBLE_DEVICES=3 python data_utils/process.py data/.mp4 --asr hubert

schxnhxlz commented 1 month ago

Yes. Still the same issue. I'm now trying to train longer.

schxnhxlz commented 1 month ago

> I did the same thing with deepspeech and I got the same result as you.

Check the sampling rate of your audio file. Mine was 48000 Hz. I converted it to 16000 Hz and it worked.
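
For anyone hitting the same thing, here is a rough sketch of how the sampling-rate check and conversion could be scripted before running data_utils/hubert.py. It is not part of the SyncTalk repo. The helper names (wav_sample_rate, ffmpeg_resample_command, ensure_sample_rate) are made up for illustration, and it assumes a PCM .wav input and that ffmpeg is installed and on your PATH.

```python
import subprocess
import wave

# 16000 Hz is the rate reported to work in this thread.
TARGET_RATE = 16000


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def ffmpeg_resample_command(src, dst, rate=TARGET_RATE):
    """Build (but do not run) an ffmpeg command that resamples src into dst."""
    # -y overwrites dst if it exists, -ar sets the output audio sample rate.
    return ["ffmpeg", "-y", "-i", src, "-ar", str(rate), dst]


def ensure_sample_rate(src, dst, rate=TARGET_RATE):
    """Return a path to audio at `rate`, resampling src into dst only if needed."""
    if wav_sample_rate(src) == rate:
        return src  # already at the target rate, nothing to do
    # Assumption: ffmpeg is available; check=True raises if the conversion fails.
    subprocess.run(ffmpeg_resample_command(src, dst, rate), check=True)
    return dst
```

Usage would look something like ensure_sample_rate("data/<name>.wav", "data/<name>_16k.wav"), where the _16k suffix is just a placeholder name, and then passing the returned path to --wav when extracting features.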