fudan-generative-vision / hallo

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
https://fudan-generative-vision.github.io/hallo/
MIT License

Question about reduction=mean when computing loss in train stage 2 ? #164

Open progrobe opened 2 months ago

progrobe commented 2 months ago

Hi authors, thanks for your great work. I have a question about the loss computation here: https://github.com/fudan-generative-vision/hallo/blob/83dd4ebf52baa27de737045773d4fc4163d7c820/scripts/train_stage2.py#L857 Why is reduction set to mean when computing the loss in training stage 2? It seems reduction should be set to none instead of mean, following the setting in stage 1. Setting it to mean collapses the loss to a single scalar before the weights are applied, which makes mse_loss_weights meaningless: the total loss is simply multiplied by sum(mse_loss_weights)/train_batch_size, which is a number less than or equal to 1.

If the above is correct, this probably results in an unstable training process and a failure to converge.
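To make the arithmetic above concrete, here is a minimal sketch in plain Python (no PyTorch, so it runs anywhere). It is not the code from train_stage2.py. The helper names, the toy values, and the assumption that the weights are applied per sample and then averaged over the batch are all illustrative.

```python
# Toy comparison of the two reduction orders (hypothetical helpers, not repo code).

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def squared_errors(pred, target):
    # Element-wise squared error, one row per sample in the batch.
    return [[(p - t) ** 2 for p, t in zip(p_row, t_row)]
            for p_row, t_row in zip(pred, target)]

def loss_reduction_none(pred, target, weights):
    # Mimics reduction="none": keep one loss per sample, weight it, then average.
    per_sample = [mean(row) for row in squared_errors(pred, target)]
    return mean(l * w for l, w in zip(per_sample, weights))

def loss_reduction_mean(pred, target, weights):
    # Mimics reduction="mean": the loss is already a scalar when weights are applied,
    # so every weight multiplies the same number.
    scalar = mean(v for row in squared_errors(pred, target) for v in row)
    return mean(scalar * w for w in weights)

# Assumed toy batch of two samples; weights play the role of mse_loss_weights.
pred = [[1.0, 1.0], [3.0, 3.0]]
target = [[0.0, 0.0], [0.0, 0.0]]
weights = [1.0, 0.5]

print(loss_reduction_none(pred, target, weights))  # per-sample weighting is effective
print(loss_reduction_mean(pred, target, weights))  # plain MSE times sum(weights)/batch_size
```

In this toy batch the second sample has the larger error and the smaller weight. Only the reduction=none path down-weights that sample specifically. The reduction=mean path rescales the unweighted MSE by a constant factor, sum(weights)/batch_size.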