ZiqiaoPeng / SyncTalk

[CVPR 2024] This is the official source for our paper "SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis"
https://ziqiaopeng.github.io/synctalk/

The results inferred with the 'may' model file do not contain any faces in the video. #203

Open yyjjww opened 3 months ago

yyjjww commented 3 months ago

https://github.com/user-attachments/assets/67064749-10c7-4ec2-867e-41016dd6dfd8

https://github.com/user-attachments/assets/f25dfcaa-505d-4327-8777-c5f1f4aea8f2

The only difference between the two runs above is whether the `--portrait` flag is used.

@ZiqiaoPeng help me, please!

AyushUnleashed commented 3 months ago

I also had the same issue. In my case the problem was that I wasn't doing all the steps, and my input video was not 25 fps (you can use an online converter for that).

There are 3 steps.

  1. Process the video
  2. Train the model
  3. Run inference

I was confusing processing with training, and that's why I was having this problem.

Example from my case: I created a folder named 'head_sara' with the video head_sara.mp4 (25 fps, 24 seconds, 1080x1080 video of a face). I think a 512x512 video might give a better result.

If your video is not 25 fps, you'll get an error in the training step.
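One common way to convert a clip to 25 fps before processing is with ffmpeg (the input and output file names here are just placeholders for your own files):

```shell
# Re-encode the video at 25 fps; copy the audio stream unchanged.
ffmpeg -i head_sara_raw.mp4 -r 25 -c:a copy head_sara.mp4
```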


Process the video

python data_utils/process.py data/head_sara/head_sara.mp4 --asr ave

Train the model from the processed video

python main.py data/head_sara --workspace model/head_sara -O --iters 60000 --asr_model ave
python main.py data/head_sara --workspace model/head_sara -O --iters 100000 --finetune_lips --patch_size 64 --asr_model ave

Inference with audio

python main.py data/head_sara --workspace model/head_sara -O --test --test_train --asr_model ave --portrait --aud data/head_sara/head_sara.wav
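Since the original problem was a skipped or incomplete processing step, it can help to confirm that `data/<name>` actually contains processed output before launching training. A minimal sketch (the default file names below are hypothetical placeholders, not SyncTalk's actual output names; replace them with the files you see after a successful `process.py` run):

```python
from pathlib import Path

def check_processed(data_dir, required=("aud.npy", "transforms_train.json")):
    """Return the list of expected files missing from data_dir.

    `required` is a hypothetical example list; substitute the file
    names your own process.py run actually produces.
    """
    root = Path(data_dir)
    return [name for name in required if not (root / name).exists()]

missing = check_processed("data/head_sara")
if missing:
    print("Processing looks incomplete, missing:", missing)
```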