Vchitect / VEnhancer

Official code of VEnhancer: Generative Space-Time Enhancement for Video Generation
https://vchitect.github.io/VEnhancer-project/

VEnhancer: Generative Space-Time Enhancement
for Video Generation

Jingwen He,  Tianfan Xue,  Dongyang Liu,  Xinqi Lin, 
Peng Gao,  Dahua Lin,  Yu Qiao,  Wanli Ouyang,  Ziwei Liu
The Chinese University of Hong Kong, Shanghai Artificial Intelligence Laboratory, 
S-Lab, Nanyang Technological University 

VEnhancer is an all-in-one generative video enhancement model that achieves spatial super-resolution, temporal super-resolution, and video refinement for AI-generated videos.
(Teaser comparison: AIGC video vs. AIGC video + VEnhancer)
:open_book: For more visual results, check out our project page.

:fire: Update

:astonished: Gallery

| Inputs & Results | Model Version |
| :--- | :--- |
| Prompt: A close-up shot of a woman standing in a dimly lit room. she is wearing a traditional chinese outfit, which includes a red and gold dress with intricate designs and a matching headpiece. (from Open-Sora) | v2 |
| Prompt: Einstein plays guitar. (from Kling) | v2 |
| Prompt: A girl eating noodles. (from Kling) | v2 |
| Prompt: A little brick man visiting an art gallery. (from Kling) | v1 |
<!-- Prompt: A detailed wooden toy ship with intricately carved masts and sails is seen gliding smoothly over a plush, blue carpet that mimics the waves of the sea.
from CogVideoX
v2
-->

:clapper: Overview

VEnhancer achieves spatial super-resolution, temporal super-resolution (i.e., frame interpolation), and video refinement in a single model. It flexibly adapts to different upsampling factors (e.g., 1x~8x) for either spatial or temporal super-resolution, and it also provides control over the refinement strength to handle diverse video artifacts.
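
As a rough illustration of what these upsampling factors mean for a clip's shape, the small sketch below computes the output resolution and frame count from chosen spatial and temporal factors. The (n - 1) * t + 1 frame-count rule is a common frame-interpolation convention and is only an assumption here, not a claim about VEnhancer's exact behaviour.

```python
# Illustrative only: how spatial/temporal upsampling factors change a clip's shape.
# The frame-count rule below is a common interpolation convention, assumed here.
def enhanced_shape(height, width, n_frames, spatial_up=4, temporal_up=2):
    out_h, out_w = height * spatial_up, width * spatial_up
    out_frames = (n_frames - 1) * temporal_up + 1  # insert frames between each pair
    return out_h, out_w, out_frames

print(enhanced_shape(180, 320, 16))  # (720, 1280, 31)
```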

Following ControlNet, it copies the architectures and weights of the multi-frame encoder and middle block of a pretrained video diffusion model to build a trainable condition network. This video ControlNet accepts both low-resolution key frames and full frames of noisy latents as inputs. In addition to the timestep $t$ and text prompt $c_{text}$, the noise level $\sigma$ used for noise augmentation and the downscaling factor $s$ serve as extra conditions through our proposed video-aware conditioning.
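
The PyTorch sketch below illustrates this design at a toy scale: a trainable copy of a stand-in multi-frame encoder receives the noisy latents together with the low-resolution key-frame latents, and the scalar conditions $t$, $\sigma$, and $s$ are embedded and summed before being injected. All module names, shapes, and the way the two inputs are fused are assumptions made for illustration (text-prompt conditioning is omitted for brevity); this is not the actual VEnhancer implementation.

```python
# Toy sketch of a ControlNet-style condition network with video-aware
# conditioning. Shapes, module names, and the input-fusion scheme are
# illustrative assumptions, not VEnhancer's actual code.
import copy
import math

import torch
import torch.nn as nn


def scalar_embedding(x: torch.Tensor, dim: int = 256) -> torch.Tensor:
    """Sinusoidal embedding for a scalar condition (t, sigma, or s)."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    args = x.float()[:, None] * freqs[None]
    return torch.cat([args.sin(), args.cos()], dim=-1)


class TinyVideoEncoder(nn.Module):
    """Stand-in for the multi-frame encoder of a pretrained video diffusion model."""

    def __init__(self, latent_ch: int = 4, width: int = 64, emb_dim: int = 256):
        super().__init__()
        self.conv = nn.Conv3d(latent_ch, width, kernel_size=3, padding=1)
        self.emb_proj = nn.Linear(emb_dim, width)

    def forward(self, x: torch.Tensor, emb: torch.Tensor) -> torch.Tensor:
        h = self.conv(x)  # (B, width, T, H, W)
        return h + self.emb_proj(emb)[:, :, None, None, None]


class VideoControlNetSketch(nn.Module):
    """Trainable copy of the pretrained encoder, conditioned on t, sigma, and s."""

    def __init__(self, pretrained_encoder: TinyVideoEncoder,
                 latent_ch: int = 4, width: int = 64, emb_dim: int = 256):
        super().__init__()
        self.emb_dim = emb_dim
        # Copy architecture and weights of the pretrained encoder (ControlNet-style).
        self.control_encoder = copy.deepcopy(pretrained_encoder)
        # Zero-initialized conv that folds the low-res key-frame latents into
        # the noisy latents (one plausible way of feeding both inputs).
        self.hint_conv = nn.Conv3d(latent_ch, latent_ch, kernel_size=1)
        nn.init.zeros_(self.hint_conv.weight)
        nn.init.zeros_(self.hint_conv.bias)
        # Zero-initialized output projection, so training starts from the
        # unmodified pretrained behaviour.
        self.zero_proj = nn.Conv3d(width, width, kernel_size=1)
        nn.init.zeros_(self.zero_proj.weight)
        nn.init.zeros_(self.zero_proj.bias)

    def forward(self, noisy_latents, lr_key_frames, t, sigma, s):
        # Video-aware conditioning: embed timestep t, augmentation noise level
        # sigma, and downscaling factor s, then sum the embeddings.
        emb = (scalar_embedding(t, self.emb_dim)
               + scalar_embedding(sigma, self.emb_dim)
               + scalar_embedding(s, self.emb_dim))
        x = noisy_latents + self.hint_conv(lr_key_frames)  # (B, C, T, H, W)
        feats = self.control_encoder(x, emb)
        # Residual features that would be injected into the frozen diffusion UNet.
        return self.zero_proj(feats)


# Toy usage: 4 latent channels, 8 frames, 32x32 latent resolution.
pretrained = TinyVideoEncoder()
control = VideoControlNetSketch(pretrained)
out = control(torch.randn(1, 4, 8, 32, 32), torch.randn(1, 4, 8, 32, 32),
              t=torch.tensor([500.0]), sigma=torch.tensor([0.1]), s=torch.tensor([4.0]))
print(out.shape)  # torch.Size([1, 64, 8, 32, 32])
```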

:gear: Installation

# clone this repo
git clone https://github.com/Vchitect/VEnhancer.git
cd VEnhancer

# create environment
conda create -n venhancer python=3.10
conda activate venhancer
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install -r requirements.txt

Note that the ffmpeg command must be available. If you have sudo access, you can install it with the following command:

sudo apt-get update && sudo apt-get install ffmpeg libsm6 libxext6 -y
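
If you do not have sudo access, installing ffmpeg into the conda environment is a common alternative (not part of the original instructions), e.g. conda install -c conda-forge ffmpeg.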

:dna: Pretrained Models

| Model Name | Description | HuggingFace | BaiduNetdisk |
| :--- | :--- | :--- | :--- |
| venhancer_paper.pth | Very creative, strong refinement, but sometimes over-smooths edges and texture details. | download | download |
| venhancer_v2.pth | Less creative, but generates better texture details and has better identity preservation. | download | download |
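
If you prefer to fetch a checkpoint programmatically instead of through the links above, the following is a minimal sketch using huggingface_hub; the repository id shown is a placeholder and must be replaced with the actual Hugging Face repo that hosts the weights.

```python
# Minimal sketch: download a VEnhancer checkpoint with huggingface_hub.
# NOTE: "your-hf-namespace/VEnhancer" is a placeholder, not the real repo id.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="your-hf-namespace/VEnhancer",  # replace with the actual repo id
    filename="venhancer_paper.pth",         # or "venhancer_v2.pth"
    local_dir="ckpts",                      # matches the VEnhancer/ckpts layout
)
print(ckpt_path)
```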

:dizzy: Inference

1. Download the VEnhancer checkpoint and put it in the VEnhancer/ckpts directory (optional, as the download can be done automatically).
2. Run one of the following commands:

  bash run_VEnhancer.sh

for single-GPU inference (at least one A100 80G is required), or

  bash run_VEnhancer_MultiGPU.sh

for multi-GPU inference.

In run_VEnhancer.sh or run_VEnhancer_MultiGPU.sh, you can adjust the inference arguments, such as the model version and the spatial/temporal upsampling settings.

Gradio

The same functionality is also available as a Gradio demo. Please follow the previous guidelines and specify the model version (v1 or v2).

python gradio_app.py --version v1

BibTeX

If you use our work in your research, please cite our publication:

@article{he2024venhancer,
  title={VEnhancer: Generative Space-Time Enhancement for Video Generation},
  author={He, Jingwen and Xue, Tianfan and Liu, Dongyang and Lin, Xinqi and Gao, Peng and Lin, Dahua and Qiao, Yu and Ouyang, Wanli and Liu, Ziwei},
  journal={arXiv preprint arXiv:2407.07667},
  year={2024}
}

:hugs: Acknowledgements

Our codebase builds on modelscope. Thanks to the authors for sharing their awesome codebase!

:email: Contact

If you have any questions, please feel free to reach out to us at hejingwenhejingwen@outlook.com.