fudan-generative-vision / hallo2

Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation

https://fudan-generative-vision.github.io/hallo2

MIT License

question about stage1 #36

Open wangjue-wzq opened 3 weeks ago

wangjue-wzq commented 3 weeks ago

In the first stage of training, target_img and ref_img are randomly selected, and face_emb is taken from a frame of the video. The three may not come from the same frame. Is training stability still guaranteed in that case?
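
For readers following the thread, the sampling described in this question can be sketched roughly as below. This is only an illustration, not the hallo2 training code: the helper name `sample_stage1_indices`, the dictionary keys, and the choice of frame 0 as the source of the face embedding are all assumptions made for the example.

```python
import random


def sample_stage1_indices(num_frames, rng=None):
    """Pick frame indices for one stage-1 sample (hypothetical helper).

    target and ref are drawn independently at random, so they usually
    differ; the face embedding index is held fixed per video.
    """
    rng = rng or random.Random()
    target_idx = rng.randrange(num_frames)  # frame the model should reconstruct
    ref_idx = rng.randrange(num_frames)     # frame supplying reference appearance
    face_emb_idx = 0                        # assumption: one fixed frame per video
    return {"target": target_idx, "ref": ref_idx, "face_emb": face_emb_idx}


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [sample_stage1_indices(100, rng) for _ in range(1000)]
    # Count how often all three indices happen to coincide.
    same = sum(s["target"] == s["ref"] == s["face_emb"] for s in samples)
    print(f"all three from the same frame in {same} of {len(samples)} samples")
```

Under these assumptions the three inputs almost never come from the same frame, which is the situation the question is asking about.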

cuijh26 commented 3 weeks ago

Yes, it can be guaranteed, because face_emb only represents the facial features, and each training video contains only one identity.

wangjue-wzq commented 3 weeks ago

> It could be guaranteed, because face emb only represents the facial feature, and there is only one id in the training videos.

face_emb is fixed for a given video, but the target image varies in angle, expression, and even background during training.