This repository is the official implementation of VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide, led by
Dohun Lee*, Bryan S Kim*, Geon Yeong Park, Jong Chul Ye
VideoGuide enhances the temporal quality of video diffusion models without any additional training or fine-tuning by leveraging a pretrained model as a guide. During inference, the guiding model provides a temporally consistent sample, which is interpolated with the sampling model's output to improve consistency.
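The interpolation idea can be illustrated with a minimal sketch. The function name, signature, and the blending weight `alpha` below are illustrative assumptions, not the repository's actual API:

```python
import torch

def videoguide_step(x_t, t, sampling_model, guiding_model, alpha=0.3):
    """Hypothetical sketch of one guided denoising step: the guiding model's
    temporally consistent estimate is blended with the sampling model's
    estimate before reverse diffusion continues."""
    with torch.no_grad():
        pred_sample = sampling_model(x_t, t)  # sampling model's denoised estimate
        pred_guide = guiding_model(x_t, t)    # pretrained guide's denoised estimate
    # Interpolate the two estimates; alpha controls how strongly the guide is applied
    return (1.0 - alpha) * pred_sample + alpha * pred_guide
```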
First, create your environment. We recommend using the following commands.
```bash
git clone https://github.com/DoHunLee1/VideoGuide.git
cd VideoGuide
conda create -n videoguide python=3.10
conda activate videoguide
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
pip install xformers==0.0.22.post4 --index-url https://download.pytorch.org/whl/cu118
```
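To confirm the environment is set up correctly, a quick check such as the following can be run (the expected values assume the pinned CUDA 11.8 builds installed above):

```python
import torch, torchvision

# Versions should match the pinned installs above
print(torch.__version__)        # expected: 2.1.0 (+cu118 build)
print(torchvision.__version__)  # expected: 0.16.0
# CUDA should be available when running on a GPU machine
print(torch.cuda.is_available())
```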
| Models | Checkpoints |
|---|---|
| VideoCrafter2 | Hugging Face |
| AnimateDiff | Hugging Face |
| RealisticVision | Hugging Face |
| Stable Diffusion v1.5 | Hugging Face |
Please refer to the official repositories of AnimateDiff and VideoCrafter for detailed explanations and setup guides for each model. We thank them for sharing their impressive work!
An example of using VideoGuide is provided in the `inference.sh` script (e.g., run `bash inference.sh` after setting up the environment and downloading the checkpoints).
If you find our method useful, please cite it as below or star this repository.
```bibtex
@misc{lee2024videoguideimprovingvideodiffusion,
  title={VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide},
  author={Dohun Lee and Bryan S Kim and Geon Yeong Park and Jong Chul Ye},
  year={2024},
  eprint={2410.04364},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2410.04364},
}
```
We thank the authors of AnimateDiff, VideoCrafter, and Stable Diffusion for sharing their awesome work. We also thank the CivitAI community for sharing their impressive T2I models!
> [!NOTE]
> This work is currently in the preprint stage, and there may be some changes to the code.