YuanxunLu / LiveSpeechPortraits

Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation (SIGGRAPH Asia 2021)
MIT License

Two questions about `smooth_loss` in `audio2headpose_model` #54

Closed · DreamtaleCore closed this 2 years ago

DreamtaleCore commented 2 years ago

Hi, I'm trying to reproduce the training of the audio2headpose module. I have two questions about the implementation.

  1. Does `mu_gen=Sample_GMM ...` (line 103) in `audio2headpose_model` benefit performance? I also found "We also tried with a Gaussian Mixture Model but found no obvious improvement" in the paper, but I am a little confused: are these the same thing? It seems `Sample_GMM` is the implementation of Eq. (8) (please correct me if I am wrong).
  2. The computational efficiency of `Sample_GMM` is rather low. When it is enabled (by setting `smooth_loss` > 0), one epoch takes ~2 hours. I see that it contains many for-loops (line 99) and CPU operations. Are there any alternatives?
YuanxunLu commented 2 years ago

Training the audio2headpose module with the smooth loss was one of my earlier experiments. In the end I didn't use it; I only use the probabilistic loss described in the paper. As I remember, the smooth loss brought no obvious improvement, so it was deprecated.

Sorry that I didn't fully clean up the training-related code; I can see how it confused your training.

The Gaussian Mixture Model loss is just a multi-Gaussian version of the loss. I use only one Gaussian, so it degrades to a single Gaussian distribution; I describe this in the code comments of the GMM loss function.
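
For reference, here is a minimal sketch (not the repository's actual code) of how a GMM negative log-likelihood collapses to a plain Gaussian NLL when only one component is used; the names `pi`, `mu`, and `sigma` are illustrative assumptions for the network outputs:

```python
import torch

def gmm_nll(pi, mu, sigma, target):
    """Negative log-likelihood of `target` under a diagonal GMM.

    pi:     (B, K)    mixture weights (softmax-normalized)
    mu:     (B, K, D) component means
    sigma:  (B, K, D) component standard deviations (positive)
    target: (B, D)    ground-truth values
    """
    dist = torch.distributions.Normal(mu, sigma)            # per-dimension Gaussians
    log_prob = dist.log_prob(target.unsqueeze(1)).sum(-1)   # (B, K) joint log-density
    log_mix = torch.logsumexp(torch.log(pi) + log_prob, dim=1)
    return -log_mix.mean()

# With K = 1, pi is all ones and log(pi) = 0, so the loss reduces to the
# single-Gaussian probabilistic loss described in the paper.
```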

I don't know of any alternative to this loss; this is simply my implementation. You can look around online or write your own version to speed it up.
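
If it helps, one common way to avoid per-timestep Python loops is to sample all frames in a single batched call on the GPU. A hedged sketch under the same assumed `pi`/`mu`/`sigma` outputs, not the repository's code:

```python
import torch

def sample_gmm_batched(pi, mu, sigma):
    """Draw one sample per row from a diagonal GMM, fully vectorized.

    pi:    (N, K)    mixture weights
    mu:    (N, K, D) component means
    sigma: (N, K, D) component standard deviations
    returns: (N, D) samples
    """
    # Pick a mixture component per row in one call instead of a Python loop.
    comp = torch.multinomial(pi, num_samples=1)             # (N, 1)
    idx = comp.unsqueeze(-1).expand(-1, -1, mu.size(-1))    # (N, 1, D)
    mu_sel = torch.gather(mu, 1, idx).squeeze(1)            # (N, D)
    sigma_sel = torch.gather(sigma, 1, idx).squeeze(1)      # (N, D)
    # Reparameterized draw: mu + sigma * eps, with eps ~ N(0, I).
    return mu_sel + sigma_sel * torch.randn_like(sigma_sel)
```

With K = 1 the `multinomial`/`gather` steps are trivial, so sampling is just `mu + sigma * torch.randn_like(sigma)` over the whole sequence at once, which removes the per-frame loop entirely.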

DreamtaleCore commented 2 years ago

Thanks!