felixrosberg / FaceDancer

not an issue - question on tiktok filter - photo animation #22

Closed johndpope closed 1 year ago

johndpope commented 1 year ago

https://www.tiktok.com/@rtmikesbich/video/6981298757153443078?embed_source=121331973%2C120811592%2C120810756%3Bnull%3Bembed_blank&refer=embed&referer_url=www.popbuzz.com%2F&referer_video_id=6981298757153443078

TikTok has a filter called "photo animation" that animates a static photo. If you bring up a selection of photos, they all come to life, and it happens in real time. There is a pretrained sequence of smiling and eye winking, but it works on any photo. Is there a paper that comes to mind, or code you've seen, that does this? FaceDancer needs a photo and training, and then 10 seconds later on a 3090 card I get the warped video result. The filter operates in real time with no training. I was thinking a talking-head model plus motion-capture replay might be similar.

felixrosberg commented 1 year ago

I am not too familiar with how these filters work, but some kind of attribute-controllable GAN makes sense. One example is StyleMC (https://github.com/catlab-team/stylemc), which finds attribute directions in the latent space. Such a direction could be, for example, 'open eyes' -> 'closed eyes'. Together with a GAN projector and blending, you should be able to edit images, and animate an image by interpolating along that direction. I would not be surprised if TikTok uses a generator with a built-in encoder that preserves spatial information (such as pose) better than projecting into the GAN latent space (maybe something similar to StarGAN-v2). These models are probably also significantly distilled/compressed and optimized to run in real time on a phone.
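
To make the interpolation idea concrete, here is a minimal sketch of how the per-frame latent codes could be computed. It assumes a latent code `w` already obtained from some GAN projector and an attribute `direction` found by a method such as StyleMC; neither is implemented here, and the function names `edit_latent` and `animation_latents` are hypothetical, not part of StyleMC or FaceDancer. Each returned latent would still have to be decoded by a generator and blended back into the photo.

```python
from typing import List, Sequence

Vector = List[float]


def edit_latent(w: Sequence[float], direction: Sequence[float], alpha: float) -> Vector:
    """Shift latent code w by alpha along an attribute direction."""
    if len(w) != len(direction):
        raise ValueError("latent code and direction must have the same length")
    return [wi + alpha * di for wi, di in zip(w, direction)]


def animation_latents(
    w: Sequence[float],
    direction: Sequence[float],
    max_alpha: float,
    n_frames: int,
) -> List[Vector]:
    """Latent codes for a there-and-back animation, e.g. open -> closed -> open eyes.

    The edit strength follows a triangle wave: 0 at the first frame,
    max_alpha at the middle of the sequence, and 0 again at the last frame.
    """
    if n_frames < 2:
        raise ValueError("need at least two frames")
    frames = []
    for i in range(n_frames):
        t = i / (n_frames - 1)             # position in the sequence, 0..1
        strength = 1.0 - abs(2.0 * t - 1)  # triangle wave, peaks at t = 0.5
        frames.append(edit_latent(w, direction, max_alpha * strength))
    return frames


if __name__ == "__main__":
    # Toy 2-D latent; real latent codes have hundreds of dimensions.
    w = [0.0, 1.0]
    closed_eyes_direction = [1.0, 0.0]  # assumed to come from an attribute-discovery method
    for latent in animation_latents(w, closed_eyes_direction, max_alpha=2.0, n_frames=5):
        print(latent)
```

Because the edit is only a vector addition per frame, the cost of such an animation would be dominated by the generator's forward pass, which is why distillation and compression matter for real-time use.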

Other than that, perhaps TikTok just uses some kind of rendering together with a face detector. As I said, I have no idea.

If they use a GAN, then they also need to do training, and most talking-head papers I have seen use some kind of GAN.

FaceDancer needs just 2 images once it is done training, and it is already rather fast. I would not be surprised if it could achieve real time on a CPU if properly distilled, quantized, and optimized.

johndpope commented 1 year ago

Thanks 🙏