This repository contains the implementation of the following paper:
FreeInit: Bridging Initialization Gap in Video Diffusion Models
Tianxing Wu, Chenyang Si, Yuming Jiang, Ziqi Huang, Ziwei Liu
From MMLab@NTU affiliated with S-Lab, Nanyang Technological University
We propose FreeInit, a concise yet effective method to improve temporal consistency of videos generated by diffusion models. FreeInit requires no additional training and introduces no learnable parameters, and can be easily incorporated into arbitrary video diffusion models at inference time.
In this repository, we use AnimateDiff as an example to demonstrate how to integrate FreeInit into current text-to-video inference pipelines.
In pipeline_animation.py, we define a class AnimationFreeInitPipeline inherited from AnimationPipeline, showing how to modify the original pipeline.
In freeinit_utils.py, we provide frequency filtering code for Noise Reinitialization.
An example inference script is provided at animate_with_freeinit.py.
Please refer to the above scripts as a reference when integrating FreeInit into other video diffusion models.
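As a rough illustration of the frequency filtering behind Noise Reinitialization, the sketch below keeps the low-frequency part of one latent and takes the high-frequency part from fresh noise. It uses NumPy rather than the repository's own code, and the function name mix_low_high and the (channels, frames, height, width) layout are assumptions made for this example, not the API of freeinit_utils.py.

```python
import numpy as np

def mix_low_high(latent, noise, low_pass):
    """Combine low frequencies of `latent` with high frequencies of `noise`.

    latent, noise: arrays shaped (..., frames, height, width)  [assumed layout]
    low_pass:      mask in [0, 1] broadcastable to the last three axes,
                   with the zero frequency at the center of each axis.
    """
    axes = (-3, -2, -1)  # filter jointly over time and space

    # Move to the frequency domain and center the zero frequency.
    latent_freq = np.fft.fftshift(np.fft.fftn(latent, axes=axes), axes=axes)
    noise_freq = np.fft.fftshift(np.fft.fftn(noise, axes=axes), axes=axes)

    # Low-pass the latent, high-pass the noise, then add them together.
    mixed_freq = latent_freq * low_pass + noise_freq * (1.0 - low_pass)

    # Back to the spatio-temporal domain; drop the residual imaginary part.
    mixed = np.fft.ifftn(np.fft.ifftshift(mixed_freq, axes=axes), axes=axes)
    return mixed.real
```

With a mask of all ones the function returns the latent unchanged, and with a mask of all zeros it returns the noise, which is a quick sanity check when porting the idea to another pipeline.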
git clone https://github.com/TianxingWu/FreeInit.git
cd FreeInit
cd examples/AnimateDiff
conda env create -f environment.yaml
conda activate animatediff
Please refer to the official repo of AnimateDiff. The setup guide is listed here.
After downloading the base model, motion module, and personalized T2I checkpoints, run the following command to generate animations with FreeInit. The generated results are then saved to the outputs folder.
python -m scripts.animate_with_freeinit \
--config "configs/prompts/freeinit_examples/RealisticVision_v2.yaml" \
--num_iters 5 \
--save_intermediate \
--use_fp16
where num_iters is the number of FreeInit iterations. We recommend 3-5 iterations for a balance between quality and efficiency. For faster inference, the argument use_fast_sampling can be enabled to use the Coarse-to-Fine Sampling strategy, which may lead to inferior results.
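To make the role of num_iters concrete, here is a minimal sketch of the iteration structure as we understand it: each iteration after the first diffuses the previous result back to noise, refreshes that noise, and samples again. The function freeinit_loop and its callable arguments (denoise, diffuse, reinit) are hypothetical placeholders for this example, not functions from this repository.

```python
def freeinit_loop(init_noise, denoise, diffuse, reinit, num_iters=3):
    """Run `num_iters` rounds of sampling with refined initial noise.

    denoise(noise) -> clean latent   (a full sampling pass; placeholder)
    diffuse(clean) -> noisy latent   (forward diffusion to the last step; placeholder)
    reinit(noisy)  -> refined noise  (e.g. frequency mixing with fresh noise; placeholder)
    """
    noise = init_noise
    result = None
    for i in range(num_iters):
        if i > 0:
            # Re-noise the previous result, then refresh its high frequencies.
            noise = reinit(diffuse(result))
        result = denoise(noise)
    return result
```

Each iteration costs one full sampling pass, so runtime grows roughly linearly with num_iters.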
You can change the text prompts in the config file. To tune the frequency filter parameters for better results, please change the filter_params settings in the config file. The 'butterworth' filter with n=4, d_s=d_t=0.25 is set as default. For base models with larger temporal inconsistencies, please consider using the 'gaussian' filter.
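To give a feel for what n, d_s and d_t control, the following NumPy sketch builds a 3D Butterworth low-pass mask over (frames, height, width). It is an illustration under our own assumptions (normalized frequencies in [-1, 1) with zero frequency at the center, d_s and d_t as spatial and temporal cutoffs); the function name butterworth_low_pass is hypothetical and the code is not copied from freeinit_utils.py.

```python
import numpy as np

def butterworth_low_pass(shape, n=4, d_s=0.25, d_t=0.25):
    """Return a (frames, height, width) mask with values in [0, 1].

    n:   filter order; larger n gives a sharper cutoff
    d_s: normalized spatial cutoff frequency
    d_t: normalized temporal cutoff frequency
    """
    T, H, W = shape
    if d_s == 0 or d_t == 0:
        return np.zeros(shape)  # nothing passes

    # Normalized frequency coordinates in [-1, 1), zero at the center.
    # The temporal axis is rescaled so that d_t acts as its cutoff.
    t = (d_s / d_t) * (2 * np.arange(T) / T - 1)
    h = 2 * np.arange(H) / H - 1
    w = 2 * np.arange(W) / W - 1

    # Squared distance from the zero frequency, built by broadcasting.
    d_square = t[:, None, None] ** 2 + h[None, :, None] ** 2 + w[None, None, :] ** 2

    # Butterworth response: 1 at the center, 0.5 at the cutoff radius.
    return 1.0 / (1.0 + (d_square / d_s ** 2) ** n)
```

Lowering d_s or d_t narrows the pass band, so less of the previous latent is kept and more is replaced by fresh noise.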
More .yaml files with different motion module / personalized T2I settings are also provided for testing.
We also provide a Gradio demo with a UI to demonstrate our method. Running the following command will launch the demo. Feel free to play around with the parameters to improve generation quality.
python app.py
Alternatively, you can try the online demo hosted on Hugging Face: [demo link].
Please refer to our project page for more visual comparisons.
If you find our repo useful for your research, please consider citing our paper:
@article{wu2023freeinit,
title={FreeInit: Bridging Initialization Gap in Video Diffusion Models},
author={Wu, Tianxing and Si, Chenyang and Jiang, Yuming and Huang, Ziqi and Liu, Ziwei},
journal={arXiv preprint arXiv:2312.07537},
year={2023}
}
This project is distributed under the MIT License. See LICENSE for more information.
The example code is built upon AnimateDiff. Thanks to the team for their impressive work!