CV-VAE: A Compatible Video VAE for Latent Generative Video Models

[Sijie Zhao](https://scholar.google.com/citations?user=tZ3dS3MAAAAJ) · [Yong Zhang*](https://yzhang2016.github.io/) · [Xiaodong Cun](https://vinthony.github.io/academic/) · [Shaoshu Yang]() · [Muyao Niu]() · [Xiaoyu Li](https://xiaoyu258.github.io/) · [Wenbo Hu](https://wbhu.github.io/) · [Ying Shan](https://scholar.google.com/citations?user=4oXBp9UAAAAJ&hl=en)

*Corresponding Authors

TL;DR: A video VAE for latent generative video models that is compatible with pretrained image and video models, e.g., SD 2.1 and SVD.

News

[x] 2024-10-14 :hugs: We have updated the training code of CV-VAE.
[x] 2024-10-14 We have released the inference code and model weights of CV-VAE-SD3.
[x] 2024-10-14 We have updated CV-VAE for better performance; please check cv-vae-v1-1 of CV-VAE-SD3.
[x] 2024-09-25 CV-VAE has been accepted to NeurIPS 2024.
[x] 2024-06-03 We have released the inference code and model weights of CV-VAE.
[x] 2024-05-30 We have updated the arXiv preprint.

Usage

Dependencies

Python >= 3.8 (Anaconda recommended)
PyTorch >= 1.13.0
NVIDIA GPU + CUDA
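
The requirements above can be checked before running anything. The following is a minimal sketch, not part of the CV-VAE repository: the helper names `parse_version` and `check_environment` are hypothetical, and the only library calls assumed are `torch.__version__` and `torch.cuda.is_available()`.

```python
import re
import sys

# Minimum versions taken from the dependency list above.
MIN_PYTHON = (3, 8)
MIN_TORCH = (1, 13, 0)


def parse_version(text):
    """Turn a version string such as '2.1.0+cu118' into a tuple like (2, 1, 0).

    Hypothetical helper: only the leading digits of the first three
    dot-separated fields are kept; local build tags after '+' are ignored.
    """
    parts = []
    for piece in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def check_environment():
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python >= 3.8 is required")
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems
    if parse_version(torch.__version__) < MIN_TORCH:
        problems.append("PyTorch >= 1.13.0 is required, found " + torch.__version__)
    if not torch.cuda.is_available():
        problems.append("no CUDA device is visible to PyTorch")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks OK"]:
        print(line)
```

Running the file directly prints one line per unmet requirement, or a single confirmation line when all three checks pass.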

Video reconstruction

Download the model weights from Hugging Face, then run the reconstruction script:

python3 cvvae_inference_video.py \
  --vae_path MODEL_PATH \
  --video_path INPUT_VIDEO_PATH \
  --save_path VIDEO_SAVE_PATH \
  --height HEIGHT \
  --width WIDTH
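
For batch jobs, the same command can be driven from Python. This is a sketch under stated assumptions: it only wraps the script and flags shown above, the helper names `build_command` and `reconstruct` are hypothetical, and the paths and the 576x1024 resolution in the usage example are placeholders rather than recommended values.

```python
import subprocess
import sys


def build_command(vae_path, video_path, save_path, height, width,
                  script="cvvae_inference_video.py"):
    """Assemble the argument list for the reconstruction script.

    Hypothetical helper: it passes exactly the five flags from the
    command above and nothing else.
    """
    return [
        sys.executable, script,
        "--vae_path", str(vae_path),
        "--video_path", str(video_path),
        "--save_path", str(save_path),
        "--height", str(height),
        "--width", str(width),
    ]


def reconstruct(vae_path, video_path, save_path, height, width):
    """Run the script and raise CalledProcessError if it exits non-zero."""
    cmd = build_command(vae_path, video_path, save_path, height, width)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder paths and resolution; replace with your own values.
    print(" ".join(build_command("MODEL_PATH", "input.mp4", "output.mp4", 576, 1024)))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces. Call `reconstruct(...)` from the repository root so that the script path resolves.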

😉 Citation

@article{zhao2024cvvae,
  title={CV-VAE: A Compatible Video VAE for Latent Generative Video Models},
  author={Zhao, Sijie and Zhang, Yong and Cun, Xiaodong and Yang, Shaoshu and Niu, Muyao and Li, Xiaoyu and Hu, Wenbo and Shan, Ying},
  journal={arXiv preprint arXiv:2405.20279},
  year={2024}
}