GongyeLiu / StyleCrafter

[SIGGRAPH Asia 2024 (Journal Track)]StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter
https://gongyeliu.github.io/StyleCrafter.github.io/
Apache License 2.0

StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter

🔥🔥🔥 StyleCrafter on SDXL for stylized image generation is available! It enables higher resolution (1024×1024) and more visually pleasing results!

                 
_**[GongyeLiu](https://github.com/GongyeLiu), [Menghan Xia*](https://menghanxia.github.io/), [Yong Zhang](https://yzhang2016.github.io), [Haoxin Chen](https://scholar.google.com/citations?user=6UPJSvwAAAAJ&hl=zh-CN&oi=ao), [Jinbo Xing](https://doubiiu.github.io/),
[Xintao Wang](https://xinntao.github.io/), [Yujiu Yang*](https://scholar.google.com/citations?user=4gH3sxsAAAAJ&hl=zh-CN&oi=ao), [Ying Shan](https://scholar.google.com/citations?hl=en&user=4oXBp9UAAAAJ&view_op=list_works&sortby=pubdate)**_

(* corresponding authors) From Tsinghua University and Tencent AI Lab.

🔆 Introduction

TL;DR: We propose StyleCrafter, a generic method that enhances pre-trained T2V models with style control, supporting Style-Guided Text-to-Image Generation and Style-Guided Text-to-Video Generation.

1. ⭐⭐ Style-Guided Text-to-Video Generation.

Style-guided text-to-video results. Resolution: 320 x 512; Frames: 16. (Compressed)

2. Style-Guided Text-to-Image Generation.

Style-guided text-to-image results. Resolution: 512 x 512. (Compressed)

📝 Changelog

🧰 Models

| Base Model | Gen Type | Resolution | Checkpoint | How to run |
|---|---|---|---|---|
| VideoCrafter | Image/Video | 320x512 | Hugging Face | StyleCrafter on VideoCrafter |
| SDXL | Image | 1024x1024 | Hugging Face | StyleCrafter on SDXL |

It takes approximately 5 seconds to generate a 512×512 image and 85 seconds to generate a 320×512 video with 16 frames using a single NVIDIA A100 (40 GB) GPU. A GPU with at least 16 GB of memory is required for inference.
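As a rough pre-flight check for the 16G memory requirement stated above, the sketch below compares a GPU's total memory against a threshold. The helper names `has_enough_memory` and `detect_gpu_memory_bytes` are hypothetical and not part of StyleCrafter. The default threshold of 15 GiB is also an assumption: cards marketed as 16 GB typically report slightly less than 16 GiB, so adjust it to taste. Detection relies on PyTorch's `torch.cuda.get_device_properties` when PyTorch is importable and returns `None` otherwise.

```python
from typing import Optional

GIB = 1024 ** 3  # bytes per gibibyte


def has_enough_memory(total_bytes: int, required_gib: float = 15.0) -> bool:
    """Return True if total_bytes is at least required_gib GiB.

    The 15 GiB default is an assumed stand-in for the README's "16G".
    """
    return total_bytes >= required_gib * GIB


def detect_gpu_memory_bytes(device_index: int = 0) -> Optional[int]:
    """Return the total memory of a CUDA device in bytes, or None if unavailable."""
    try:
        import torch  # imported lazily so the pure check works without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(device_index).total_memory


if __name__ == "__main__":
    total = detect_gpu_memory_bytes()
    if total is None:
        print("No CUDA device detected (or PyTorch is not installed).")
    elif has_enough_memory(total):
        print(f"GPU 0 reports {total / GIB:.1f} GiB: likely enough for inference.")
    else:
        print(f"GPU 0 reports only {total / GIB:.1f} GiB: inference may run out of memory.")
```

Running the file directly prints a one-line verdict for device 0. The check only looks at total memory, so it cannot account for memory already in use by other processes.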

⚙️ Setup

conda create -n stylecrafter python=3.8.5
conda activate stylecrafter
pip install -r requirements.txt

💫 Inference

1) Download all checkpoints according to the instructions.

2) Run the commands in a terminal.

# style-guided text-to-image generation
sh scripts/run_infer_image.sh

# style-guided text-to-video generation
sh scripts/run_infer_video.sh

3) (Optional) Run inference on your own data according to the instructions.
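If you prefer to launch the two inference scripts from Python, for example inside a larger pipeline, a thin wrapper along these lines may help. This is only a sketch: the function names `build_command` and `run_inference` are hypothetical, and it assumes the scripts are invoked with `sh` from the repository root exactly as in the commands above, with all checkpoints already downloaded.

```python
import subprocess
from pathlib import Path
from typing import List

# Script paths as listed in this README, relative to the repository root.
SCRIPTS = {
    "image": "scripts/run_infer_image.sh",
    "video": "scripts/run_infer_video.sh",
}


def build_command(mode: str) -> List[str]:
    """Return the shell command for the given mode ('image' or 'video')."""
    if mode not in SCRIPTS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(SCRIPTS)}")
    return ["sh", SCRIPTS[mode]]


def run_inference(mode: str, repo_root: str = ".") -> int:
    """Run the inference script from repo_root and return its exit code."""
    root = Path(repo_root)
    cmd = build_command(mode)
    if not (root / cmd[1]).is_file():
        raise FileNotFoundError(f"{cmd[1]} not found under {root.resolve()}")
    # check=True raises CalledProcessError if the script exits with an error.
    return subprocess.run(cmd, cwd=root, check=True).returncode


if __name__ == "__main__":
    run_inference("image")
```

Calling `run_inference("video")` instead runs the text-to-video script. The wrapper does not change any generation settings; those still live in the shell scripts themselves.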

👨‍👩‍👧‍👦 Crafter Family

VideoCrafter1: Framework for high-quality text-to-video generation.

ScaleCrafter: Tuning-free method for high-resolution image/video generation.

TaleCrafter: An interactive story visualization tool that supports multiple characters.

LongerCrafter: Tuning-free method for longer high-quality video generation.

DynamiCrafter: Animate open-domain still images to high-quality videos.

📢 Disclaimer

We develop this repository for RESEARCH purposes, so it can only be used for personal/research/non-commercial purposes.


🙏 Acknowledgements

We would like to thank AK (@_akhaliq) for helping set up the online demo.

📭 Contact

If you have any comments or questions, feel free to contact lgy22@mails.tsinghua.edu.cn.

BibTeX

@article{liu2023stylecrafter,
  title={StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter},
  author={Liu, Gongye and Xia, Menghan and Zhang, Yong and Chen, Haoxin and Xing, Jinbo and Wang, Xintao and Yang, Yujiu and Shan, Ying},
  journal={arXiv preprint arXiv:2312.00330},
  year={2023}
}