Fictionarry / TalkingGaussian

[ECCV'24] TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting
https://fictionarry.github.io/TalkingGaussian/
272 stars 34 forks

Training Results Not Matching Demo Quality – Possible Overclaim? #38

Open caliber1313 opened 2 months ago

caliber1313 commented 2 months ago

Hello, unfortunately, my results are nowhere close to the demo clip shown at https://www.youtube.com/watch?v=c5VG7HkDs8I.

Could you provide clarification on:

  1. The exact hyperparameters (such as seeds) and the dataset you used to train the model in the demo?
  2. Any additional pre-processing steps or adjustments that might help improve the quality?
  3. Whether the demo clip was further enhanced or fine-tuned in ways not covered by the training script?

Here's my result (DeepSpeech). Is there anything I did wrong? I'm sure that I followed all the instructions: https://drive.google.com/file/d/1MC9O9c5Rtk5_GyTKUL0ak6wHt6qVZbb-/view

Fictionarry commented 2 months ago

https://github.com/user-attachments/assets/dd0a93f4-9da3-4f2b-9633-82252fdfe6b9

https://github.com/user-attachments/assets/ca620e19-41ad-46b8-96c8-af469f8eb1c6

Hi, here are two models I just trained entirely with the code in this repo using DeepSpeech. Both look more reasonable than the results you provided, so I think there must be some problem in your reproduction process. Please double-check it.

The experiments in the paper are based on the code in this repo, and all processes and hyperparameters are given. The few adjustments made since then can be seen in the commit history. They improve robustness on a wider range of data and should not lower the performance.

jarun-title commented 2 months ago

@Fictionarry Can you provide the trained .pth files and all the preprocessed data so I can reproduce the above result? I adjusted the environment to work with CUDA 12.1, and I'm not sure whether that is the cause of the unexpected result.

Fictionarry commented 2 months ago

Here are the checkpoint and the estimated camera poses for May. The full preprocessed data is rather large, so I'm afraid it would be troublesome to upload. You can first check whether the performance is reproduced with the provided checkpoint, to find out whether the problem lies in the data preprocessing or in the training stage.

I have tried the code with CUDA 11.7. In that setup there seems to be no problem as long as PyTorch is installed with the matching version (I used 1.13.1 built for CUDA 11.7).

https://drive.google.com/drive/folders/14oKQz113I0jCGfbyq0SIJ4eVRddoO02D?usp=drive_link
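For anyone comparing their setup against the CUDA 11.7 / PyTorch 1.13.1 combination mentioned above, a quick environment check could look like the sketch below. It is not part of the repo: the helper names `cuda_build_matches` and `describe_env` are made up for illustration, and the default expected versions are simply the ones quoted in this thread. It relies only on `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()`.

```python
import importlib.util
from typing import Optional


def cuda_build_matches(torch_version: str,
                       torch_cuda: Optional[str],
                       expected_torch: str = "1.13.1",
                       expected_cuda: str = "11.7") -> bool:
    """Return True if the version strings match the expected combination.

    torch_version may carry a local build suffix such as "+cu117",
    which is stripped before comparing.
    """
    base_version = torch_version.split("+")[0]
    return base_version == expected_torch and torch_cuda == expected_cuda


def describe_env() -> Optional[dict]:
    """Collect PyTorch/CUDA info, or return None if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    return {
        "torch": torch.__version__,            # e.g. "1.13.1+cu117"
        "cuda_build": torch.version.cuda,      # CUDA version torch was built with, or None
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    env = describe_env()
    if env is None:
        print("PyTorch is not installed in this environment.")
    else:
        print(env)
        print("matches 1.13.1 / CUDA 11.7:",
              cuda_build_matches(env["torch"], env["cuda_build"]))
```

A mismatch here does not prove the environment is the cause of poor results. It only shows whether your setup differs from the one reported to work.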

jarun-title commented 2 months ago

Thanks! I'll try it.

sstzal commented 1 month ago

@caliber1313 @jarun-title
Hi, I have encountered the same problem: the mouth of the generated face barely moves. Have you solved it? Could you share your experience with this?

Thank you very much!

Fictionarry commented 3 weeks ago

Hi, sorry for the late reply. Does the problem still exist? If it can be reproduced on all of our provided video samples, I would guess it is an environment problem. Otherwise, it may be caused by the initialization. You can try the code at this version, https://github.com/Fictionarry/TalkingGaussian/tree/98aa6f729ec4e4dd0551fa8b389b375cafddd13f, and decrease select_interval in train_face.py if necessary. However, I have not encountered this problem before and could not reproduce it on two servers, so I'm not sure whether this tip will work.