bendanzzc / AnimateAnyone-reproduction

reproduction of AnimateAnyone

The model architecture seems to lack poseguider + backbone + referencenet #19

Closed johndpope closed 7 months ago

johndpope commented 7 months ago

This is another implementation of the paper with training code (n.b. MooreThreads is a competitor to Nvidia): https://github.com/MooreThreads/Moore-AnimateAnyone/blob/master/train_stage_1.py

I pulled apart their architecture and it's very complex. I like how this code is very succinct.

I want to implement the EMO (Emote Portrait Alive) paper using the CelebV-HQ dataset. https://github.com/johndpope/emote-hack

My progress is slow. I think you have reduced a lot of complexity by using diffusers, though it could be extended with a PoseGuider and other models to help steer the output. Please consider releasing the training code, and/or name a price to open-source it.
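
For context, this is roughly what the plain SVD baseline looks like when driven through diffusers. It is only a sketch: the checkpoint ID is the standard Stability release and the file names and parameters are illustrative, not taken from this repo.

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Load the public SVD image-to-video checkpoint (illustrative; this repo may
# use different weights or a fine-tuned variant).
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")

# A single reference image drives the whole generated clip.
image = load_image("reference.png").resize((1024, 576))

frames = pipe(image, decode_chunk_size=4).frames[0]
export_to_video(frames, "baseline.mp4", fps=7)
```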

bendanzzc commented 7 months ago

Thanks for your attention.

In fact, we use Stable Video Diffusion (SVD) as the baseline. Its fine-grained structure and powerful pre-training can be treated as the base model + ReferenceNet combined, so we only added a ControlNet to control the pose, playing the role of the PoseGuider. In theory, the ControlNet input could be replaced by audio to achieve the EMO effect. However, we do not plan to open-source the training code, but you can refer to what I wrote in the README to train your own EMO.
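
To make the conditioning path concrete, here is a minimal, hypothetical PyTorch sketch of a ControlNet/PoseGuider-style encoder: a small convolutional network that downsamples a pose map to the latent resolution and emits a zero-initialized residual that can be added to the UNet's latent input. The module name, channel widths, and shapes are illustrative assumptions, not the code used in this repo.

```python
import torch
import torch.nn as nn


class PoseGuiderSketch(nn.Module):
    """Illustrative ControlNet/PoseGuider-style pose encoder (not this repo's code).

    Maps a pose map (B, 3, H, W) to a residual at the latent resolution
    (B, latent_channels, H/8, W/8). The final projection is zero-initialized,
    ControlNet-style, so the pretrained UNet is undisturbed at the start of
    training.
    """

    def __init__(self, latent_channels: int = 4, widths=(16, 32, 96)):
        super().__init__()
        layers = []
        in_ch = 3
        for out_ch in widths:
            layers += [
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.SiLU(),
                nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, stride=2),
                nn.SiLU(),
            ]
            in_ch = out_ch
        self.encoder = nn.Sequential(*layers)
        # Zero-initialized projection: the residual starts as all zeros.
        self.proj = nn.Conv2d(in_ch, latent_channels, kernel_size=1)
        nn.init.zeros_(self.proj.weight)
        nn.init.zeros_(self.proj.bias)

    def forward(self, pose_map: torch.Tensor) -> torch.Tensor:
        return self.proj(self.encoder(pose_map))


if __name__ == "__main__":
    guider = PoseGuiderSketch()
    pose = torch.randn(2, 3, 512, 512)   # e.g. rendered skeleton frames
    residual = guider(pose)               # -> (2, 4, 64, 64)
    print(residual.shape)
    # During training, this residual would be added to the noisy latent of the
    # matching frame before it enters the denoising UNet.
```

Swapping the pose map for audio-derived features at the same interface is, in principle, how the EMO-style conditioning mentioned above could be attempted.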