
VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide

This repository is the official implementation of VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide, led by

Dohun Lee*, Bryan S Kim*, Geon Yeong Park, Jong Chul Ye

main figure

Project Website arXiv


🔥 Summary

VideoGuide 🚀 enhances the temporal quality of video diffusion models, without any additional training or fine-tuning, by leveraging a pretrained video diffusion model as a guide. During inference, the guiding model provides a temporally consistent denoised sample, which is interpolated with the sampling model's own estimate to improve consistency (see the sketch after the list below). VideoGuide offers the following advantages:

  1. Improved temporal consistency with preserved imaging quality and motion smoothness
  2. Fast inference, since applying guidance only to the early sampling steps proves sufficient
  3. Prior distillation: the guiding model's learned prior can be transferred to the sampling model
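A minimal sketch of the guided step, under our own assumptions: `sampling_model`, `guiding_model`, and `ddim_step` are hypothetical stand-ins for the actual components, and `alpha` and `guided_steps` are illustrative values, not the repository's defaults.

```python
# Sketch of VideoGuide-style guided sampling (hypothetical names, not the repo's API):
#   sampling_model(latents, t) -> denoised (x0) estimate of the base video model
#   guiding_model(latents, t)  -> temporally consistent estimate from the guide
#   ddim_step(latents, x0, t)  -> advances the latent to the next timestep

def videoguide_sample(latents, timesteps, sampling_model, guiding_model,
                      ddim_step, guided_steps=5, alpha=0.5):
    for i, t in enumerate(timesteps):
        x0 = sampling_model(latents, t)
        if i < guided_steps:  # guidance only during the early steps suffices
            x0_guide = guiding_model(latents, t)
            # Interpolate the guide's temporally consistent sample with the
            # sampling model's own estimate.
            x0 = (1 - alpha) * x0 + alpha * x0_guide
        latents = ddim_step(latents, x0, t)
    return latents
```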

🗓️ News

🛠️ Setup

First, create your environment. We recommend using the following commands.

```bash
git clone https://github.com/DoHunLee1/VideoGuide.git
cd VideoGuide

conda create -n videoguide python=3.10
conda activate videoguide
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
pip install xformers==0.0.22.post4 --index-url https://download.pytorch.org/whl/cu118
```
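Optionally, you can sanity-check the environment; the expected versions below simply mirror the install commands above.

```python
# Quick environment check for the setup above.
import torch
import torchvision
import xformers

print("torch:", torch.__version__)              # should report 2.1.0 (cu118 build)
print("torchvision:", torchvision.__version__)  # should report 0.16.0
print("xformers:", xformers.__version__)        # should report 0.0.22.post4
print("CUDA available:", torch.cuda.is_available())
```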

⏳ Models

| Model | Checkpoint |
| :--- | :--- |
| VideoCrafter2 | Hugging Face |
| AnimateDiff | Hugging Face |
| RealisticVision | Hugging Face |
| Stable Diffusion v1.5 | Hugging Face |

Please refer to the official repositories of AnimateDiff and VideoCrafter for a detailed explanation and setup guide for each model. We thank them for sharing their impressive work!
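If you prefer downloading checkpoints programmatically, a sketch using `huggingface_hub` follows; `repo_id` and `filename` are placeholders to be replaced with the values from the Hugging Face pages linked in the table above.

```python
# Hypothetical checkpoint download via huggingface_hub; repo_id and
# filename are placeholders, not the actual repository values.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="<org>/<model-repo>",
    filename="<checkpoint-file>",
)
print(ckpt_path)  # local path to the cached checkpoint
```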

🌄 Example

An example of using VideoGuide is provided in the inference.sh script.

📝 Citation

If you find our method useful, please cite it as below or leave a star on this repository.

```bibtex
@misc{lee2024videoguideimprovingvideodiffusion,
  title={VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide},
  author={Dohun Lee and Bryan S Kim and Geon Yeong Park and Jong Chul Ye},
  year={2024},
  eprint={2410.04364},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2410.04364},
}
```

🤗 Acknowledgements

We thank the authors of AnimateDiff, VideoCrafter, and Stable Diffusion for sharing their awesome work. We also thank the CivitAI community for sharing their impressive T2I models!

> [!NOTE]
> This work is currently in the preprint stage, and there may be some changes to the code.