ZiqiaoPeng / SyncTalk

[CVPR 2024] This is the official source for our paper "SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis"
https://ziqiaopeng.github.io/synctalk/

Unstable training using private data #92

Closed UdonDa closed 1 month ago

UdonDa commented 2 months ago

Hi,

I have a problem with unstable training on my private data. Judging from the plot of training losses, something is going wrong during training, and I would like to find out what is causing it. Has anyone successfully trained on their own private data?

# Pre-processing for data
python data_utils/process.py "data/tanaka_moe/tanaka_moe"

# Train
export CUDA_VISIBLE_DEVICES=0
dataset="tanaka_moe"
asr_model="ave"
workspace="results/${dataset}_${asr_model}"
python main.py data/$dataset --workspace $workspace -O --iters 60000 --asr_model $asr_model --preload 1

Epoch 4

ngp_ep0004_0075_rgb

Epoch 16

ngp_ep0016_0004_rgb

Screenshot 2024-05-06 21 21 23

The training video is below, https://github.com/ZiqiaoPeng/SyncTalk/assets/25411643/c2e10fa5-78be-44b0-a42d-bf697829062d

Akatukiaoki commented 2 months ago

tmpBA4 How can I solve this? Every time an epoch finishes, I get the error "No faces were detected."

UdonDa commented 2 months ago

@Akatukiaoki Could you use English? I'm not Chinese. Are you posting the same question over and over in my issue? I think your error happens because the correct faces are not generated at the evaluation stage, probably because the training has a problem, as in my case.

Akatukiaoki commented 2 months ago

@UdonDa This problem seems very common. I found a solution on other forums, and it can now run. May I ask what software or code you used to plot the training? Your chart looks cool. My English is very poor.

UdonDa commented 2 months ago

@Akatukiaoki

I found a solution on other forums and it can now run.

Where? Could you share it with me?

The chart is plotted with TensorBoard. The event file is at run/ngp/events.out.tfevents.....; to visualize it, run tensorboard --logdir .
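
For example, a minimal sketch (the log directory below follows the workspace layout from my training command above; adjust it to your own workspace):

# Point TensorBoard at the directory containing the events.out.tfevents... file
tensorboard --logdir results/tanaka_moe_ave/run/ngp --port 6006
# then open http://localhost:6006 in a browser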

Akatukiaoki commented 2 months ago

@UdonDa I have only figured out the problems I encountered myself, and I don't understand your problem or why this situation occurred. I am just a beginner in this field.

UdonDa commented 2 months ago

@Akatukiaoki Were you able to successfully train a model with your private data? If you don't mind, could you please share the video file you are using? There might be something wrong with my video.

Akatukiaoki commented 2 months ago

@UdonDa You can try retraining with May. I have encountered a problem: there is no ngp.pth file in my folder, and I am wondering how to obtain it.

UdonDa commented 2 months ago

I re-preprocessed May and re-trained SyncTalk. As discussed in other issues (#77), pre-processing with mediapipe causes quality degradation; according to my training log, PSNR should be over 37, but this run only reaches 32.

Anyway, I cannot understand why training with my private video failed while training with May succeeded.

Screenshot 2024-05-07 12 12 18

Tensorboard plot

Screenshot 2024-05-07 12 11 41
UdonDa commented 2 months ago

@Akatukiaoki You should look in <result_dir>/checkpoints. ngp.pth is just a weight file.

Akatukiaoki commented 2 months ago

This is my folder; it doesn't have ngp.pth. tmp3DB

UdonDa commented 2 months ago

Please try renaming one of the checkpoint files in <result_dir>/checkpoints to ngp.pth.
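
For example, a minimal sketch of what I mean (the ngp_ep*.pth filename pattern is an assumption based on the default checkpoint naming; adjust it to whatever files you actually see):

# Copy the most recently written checkpoint to ngp.pth
cd <result_dir>/checkpoints        # replace <result_dir> with your workspace path
latest=$(ls -t ngp_ep*.pth | head -n 1)
cp "$latest" ngp.pth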

G-force78 commented 2 months ago

Bad results here too, what is best practice for data prep?

https://github.com/ZiqiaoPeng/SyncTalk/assets/114336644/bc84ce97-ceac-4e6d-97a6-c9bb793ba02b

UdonDa commented 2 months ago

I switched to hubert and ER-NeRF's pre-processing, i.e., OpenFace. Training works well, but the lip sync is not good enough.

https://github.com/ZiqiaoPeng/SyncTalk/assets/25411643/1eeaf494-caf7-4c8d-8c89-ea83bbe03fd0
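
For reference, the training command is the same as my earlier one with only the audio feature model swapped (a sketch, assuming --asr_model also accepts "hubert"; the OpenFace / ER-NeRF pre-processing was run separately and is not shown here):

export CUDA_VISIBLE_DEVICES=0
dataset="tanaka_moe"
asr_model="hubert"
workspace="results/${dataset}_${asr_model}"
python main.py data/$dataset --workspace $workspace -O --iters 60000 --asr_model $asr_model --preload 1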

Akatukiaoki commented 2 months ago

@UdonDa Can increasing the number of training iterations improve the stability of the mouth? Yesterday, I trained on the Obama video with 60,000 iterations for the head and 90,000 for the mouth, but the mouth is still blurry and flickers slightly.

HenryKang1 commented 2 months ago

For best practice: good preprocessing. There can be many failure cases (I saw some artifacts because the video failed to produce an accurate mask). Also, good sync between video and audio in the original. Finally, a good blending algorithm. Those three are the key points for good video generation in general.

UdonDa commented 1 month ago

@HenryKang1 Thanks for the advice! I agree with you: BEST PRE-PROCESSING IS ALL YOU NEED.

good preprocessing

How do you pre-process your private video to crop the facial region? I follow CelebV-HQ's pre-processing, which is the same as FirstOrderModel's, with increase=0.4.
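
Roughly, the cropping step looks like the sketch below (illustrative only; the function and variable names are mine, not from the CelebV-HQ or FirstOrderModel scripts):

# Expand a detected face box by a fixed ratio (the increase=0.4 above) before cropping.
def expand_bbox(left, top, right, bottom, increase=0.4, frame_w=None, frame_h=None):
    w, h = right - left, bottom - top
    pad_w, pad_h = int(w * increase), int(h * increase)
    left, top = left - pad_w, top - pad_h
    right, bottom = right + pad_w, bottom + pad_h
    # Clamp to the frame so the crop never goes out of bounds
    if frame_w is not None:
        left, right = max(0, left), min(frame_w, right)
    if frame_h is not None:
        top, bottom = max(0, top), min(frame_h, bottom)
    return left, top, right, bottom

# Usage: crop = frame[top:bottom, left:right] after expanding the detector's box.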

ZiqiaoPeng commented 1 month ago

tmpBA4 How can I solve this? Every time an epoch finishes, I get the error "No faces were detected."

It has been fixed.

HenryKang1 commented 1 month ago

@HenryKang1 Thanks for the advice! I agree with you: BEST PRE-PROCESSING IS ALL YOU NEED.

good preprocessing

How do you pre-process your private video to crop the facial region? I follow CelebV-HQ's pre-processing, which is the same as FirstOrderModel's, with increase=0.4.

I do similar things. I have my own custom module with padding and resizing functions. Try to crop so the entire face, neck, and hair are included, or at least the neck. Also, training on a short video does not give good results; try to train on a video longer than 1 minute. Finally, apply post-processing to remove artifacts.
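
A rough sketch of the padding-and-resizing idea (illustrative; the 512x512 target size is an assumption, not the exact setting of my module):

import cv2
import numpy as np

# Pad a face crop to a square with black borders, then resize to the training resolution.
def pad_and_resize(crop, size=512):
    h, w = crop.shape[:2]
    side = max(h, w)
    top, left = (side - h) // 2, (side - w) // 2
    square = np.zeros((side, side, 3), dtype=crop.dtype)
    square[top:top + h, left:left + w] = crop
    return cv2.resize(square, (size, size), interpolation=cv2.INTER_AREA)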

ZiqiaoPeng commented 1 month ago

You can try the latest code.

Zhiyuan624 commented 1 month ago

@Akatukiaoki Hello, I also ran into the "No faces were detected." error. How did you solve it? Thank you very much!

shounakthevoicing commented 5 days ago

https://github.com/ZiqiaoPeng/SyncTalk/assets/173535969/1ef93280-8775-4073-894c-eea3a537af20

@UdonDa I am getting bad results after training for 200k steps in total. Could you give a link or guide me on how you used ER-NeRF's pre-processing, i.e., OpenFace?