Closed — johndpope closed 7 months ago
Thanks for your attention.
In fact, we use Stable Video Diffusion (SVD) as the baseline. Its fine-grained structure and powerful pre-training can be treated as a base model + ReferenceNet. We only added a ControlNet to control the pose, acting as a pose guider. In theory, the ControlNet input could be replaced by audio to achieve the EMO effect. However, we do not plan to open-source the training code, but you can refer to what I wrote in the README to train your own EMO.
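The conditioning scheme described above can be sketched in PyTorch. This is a hypothetical minimal pose-guider: a small convolutional encoder whose output would be added as a residual to the base model's latent input, following the ControlNet convention of zero-initializing the final layer so training starts from the unmodified base model. The module name, channel counts, and shapes are illustrative assumptions, not taken from the authors' code.

```python
import torch
import torch.nn as nn

class PoseGuider(nn.Module):
    """Encode a pose image into a residual matching the latent shape.

    Illustrative sketch only: real pose guiders (e.g. in AnimateAnyone-style
    pipelines) use more capacity, but the principle is the same.
    """
    def __init__(self, cond_channels: int = 3, latent_channels: int = 4):
        super().__init__()
        # Three stride-2 convs give the 8x spatial downsampling typical of
        # a VAE latent space (512x512 image -> 64x64 latent).
        self.encoder = nn.Sequential(
            nn.Conv2d(cond_channels, 16, 3, stride=2, padding=1),
            nn.SiLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),
            nn.SiLU(),
            nn.Conv2d(32, latent_channels, 3, stride=2, padding=1),
        )
        # Zero-init the last conv so the residual is zero at the start of
        # training, as in ControlNet: the base model's behavior is preserved.
        nn.init.zeros_(self.encoder[-1].weight)
        nn.init.zeros_(self.encoder[-1].bias)

    def forward(self, pose_image: torch.Tensor) -> torch.Tensor:
        return self.encoder(pose_image)

guider = PoseGuider()
pose = torch.randn(1, 3, 512, 512)
residual = guider(pose)  # shape (1, 4, 64, 64); all zeros at initialization
```

Swapping the pose image for encoded audio features at this conditioning input is, in principle, how the EMO-style effect mentioned above would be reached.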
This is another implementation of the paper with training code (n.b. MooreThreads is a competitor to Nvidia): https://github.com/MooreThreads/Moore-AnimateAnyone/blob/master/train_stage_1.py
I pulled apart their architecture and it's very complex; I like how succinct this code is by comparison.
I want to implement the EMO (Emote Portrait Alive) paper using the CelebV-HQ dataset: https://github.com/johndpope/emote-hack
My progress is slow. I think you have removed a lot of complexity by using diffusers, though it could be extended with a pose guider and other models to help steer the output. Please consider releasing the training code, and/or name a price to open-source it.