Fictionarry / TalkingGaussian

[ECCV'24] TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting
https://fictionarry.github.io/TalkingGaussian/
247 stars 33 forks

Training Results Not Matching Demo Quality – Possible Overclaim? #38

Open caliber1313 opened 1 month ago

caliber1313 commented 1 month ago

Hello. Unfortunately, my results are nowhere close to the demo clip shown at https://www.youtube.com/watch?v=c5VG7HkDs8I.

Could you provide clarification on:

  1. The exact hyperparameters (such as seeds) and the dataset you used to train the model in the demo?
  2. Any additional pre-processing steps or adjustments that might help improve the quality?
  3. Whether the demo clip was further enhanced or fine-tuned in ways not covered by the training script?

Here's my result (DeepSpeech). Is there anything I did wrong? I'm sure that I followed all the instructions: https://drive.google.com/file/d/1MC9O9c5Rtk5_GyTKUL0ak6wHt6qVZbb-/view

Fictionarry commented 1 month ago

https://github.com/user-attachments/assets/dd0a93f4-9da3-4f2b-9633-82252fdfe6b9

https://github.com/user-attachments/assets/ca620e19-41ad-46b8-96c8-af469f8eb1c6

Hi, these are two models I just trained entirely with the code in this repo using DeepSpeech. Both give more reasonable results than the ones you provided, so I think there must be a problem in your reproduction process. Please double-check it.

The experiments in the paper are based on the code in this repo, and all processes and hyperparameters are given. The only few adjustments can be seen in the commit history. They are meant to improve robustness on a wider range of data and would not lower the performance.

jarun-title commented 1 month ago

@Fictionarry Can you provide the trained .pth files and all the preprocessed data so I can reproduce the above results? I adjusted the environment to work with CUDA 12.1, and I'm not sure whether that is the cause of the unexpected result.

Fictionarry commented 1 month ago

Here are the checkpoint and the estimated camera poses for May. The entire preprocessed dataset is rather large, so uploading it would be troublesome. You can first check whether the performance is reproduced with the provided checkpoint, to find out whether the problem lies in the data preprocessing or in the training stage.

I have tried the code with CUDA 11.7. In that setup there seems to be no problem, as long as PyTorch is installed with the matching version (I used 1.13.1 built for CUDA 11.7).
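
For anyone comparing their setup against the one above, here is a minimal sketch of a version check. It is not part of the repo: the helper names `matches_expected` and `describe_environment` are made up for illustration, and the only values taken from this thread are PyTorch 1.13.1 and CUDA 11.7. A mismatch does not prove that the environment is the cause of poor results; it only flags a difference worth ruling out.

```python
import importlib.util

# Versions reported as working in this thread; everything else is illustrative.
EXPECTED_TORCH = "1.13.1"
EXPECTED_CUDA = "11.7"


def matches_expected(torch_version, cuda_version,
                     expected_torch=EXPECTED_TORCH,
                     expected_cuda=EXPECTED_CUDA):
    """Return True if the given build matches the expected versions."""
    # torch.__version__ may carry a local build suffix, e.g. "1.13.1+cu117".
    base_version = torch_version.split("+")[0]
    # torch.version.cuda is None for CPU-only builds, so this is False then.
    return base_version == expected_torch and cuda_version == expected_cuda


def describe_environment():
    """Summarize the local PyTorch build without assuming it is installed."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment."
    import torch

    same = matches_expected(torch.__version__, torch.version.cuda)
    status = "matches" if same else "differs from"
    return (
        f"torch {torch.__version__}, built for CUDA {torch.version.cuda}, "
        f"GPU available: {torch.cuda.is_available()}; this {status} "
        f"torch {EXPECTED_TORCH} / CUDA {EXPECTED_CUDA}."
    )


if __name__ == "__main__":
    print(describe_environment())
```

Running the script in the training environment prints one summary line. With a CUDA 12.1 build it would report a difference from the tested configuration.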

https://drive.google.com/drive/folders/14oKQz113I0jCGfbyq0SIJ4eVRddoO02D?usp=drive_link

jarun-title commented 1 month ago

Thanks! I'll try it.

sstzal commented 4 days ago

@caliber1313 @jarun-title
Hi, I have encountered the same problem: the mouth of the generated face barely moves. Have you solved it? Could you share your experience with this?

Thank you very much!