Example 1 | Example 2 | Example 3 |
---|---|---|
If you use any components of our work, please cite it.
@article{wang2024animatelcm,
title={AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning},
author={Wang, Fu-Yun and Huang, Zhaoyang and Shi, Xiaoyu and Bian, Weikang and Song, Guanglu and Liu, Yu and Li, Hongsheng},
journal={arXiv preprint arXiv:2402.00769},
year={2024}
}
Here is a screen recording of usage. Prompt:"river reflecting mountain"
Animate-LCM is a pioneer work and exploratory on fast animation generation following the consistency models, being able to generate animations in good quality with 4 inference steps.
It relies on the decoupled learning paradigm, firstly learning image generation prior and then learning the temporal generation prior for fast sampling, greatly boosting the training efficiency.
The High-level workflow of AnimateLCM can be
We have launched lots of demo videos generated by Animate-LCM on the Project Page. Generally speaking, AnimateLCM works for fast, text-to-video, control-to-video, image-to-video, video-to-video stylization, and longer video generation.
So far, we have released three models for usage
Animate-LCM-T2V: A spatial LoRA weight and a motion module for personalized video generation. Some trying from the community point out that the motion module is also compatible with many personalized models tuned for LCM, for example Dreamshaper-LCM.
AnimateLCM-SVD-xt. I provide AnimateLCM-SVD-xt and AnimateLCM-SVD-xt 1.1, which are tuned from SVD-xt and SVD-xt 1.1 respectively. They work for high-resolution image animation with 25 frames with 1~8 steps. You can try it with the Hugging Face Demo. Thanks to the Hugging Face team for providing the GPU grants.
AnimateLCM-I2V. A spatial LoRA weight and a motion module with an additional image encoder for personalized image animation. It is our trying to directly train an image animation model for fast sampling without any teacher models. It can generate animations with a personalized image with 2~4 steps. Yet due to the training resources is very limited, it is not as stable as I would like (Just like most I2V models built on Stable-Diffusion-v1-5, they generally not very stable for generation).
We split the animatelcm_sd15 and animatelcm_svd into two folders. They are based on different environments. Please refer to README_animatelcm_sd15 and README_animatelcm_svd for instructions.
AnimateLCM-T2V:
AnimateLCM-I2V:
2-4 steps should work for personalized image animation.
In most cases, the model does not need CFG values. Just set the CFG=1 to reduce inference cost.
I additionally set a motion scale
hyper-parameter. Set it to 0.8 as the default choice. If you set it to 0.0, you should always obtain static animations. You can increase the motion scale for larger motions, but that will sometimes cause generation failure.
The typical workflow can be:
AnimateLCM-SVD:
CFG_min
and CFG_max
. By default, CFG_min
is set to 1. Slightly adjusting CFG_max
between [1, 1.5] will obtain good results. Again, just setting it to 1 to reduce the inference cost.Screen recording of AnimateLCM-T2V. Prompt: "dog with sunglasses".
I am open to collaboration, but not to a full-time intern. If you find some of my work interesting and hope for collaboration/discussion in any format, please do not hesitate to contact me.
📧 Email: fywang@link.cuhk.edu.hk
I would thank AK for broadcasting our work and the hugging face team for providing help in building the gradio demo and storing the models. Would thank the Dhruv Nair for providing help in diffusers.