Ozgur Kara, Bariscan Kurtkaya, Hidir Yesiltepe, James M. Rehg, Pinar Yanardag
(Note that the videos on GitHub are heavily compressed. The full videos are available on the project webpage.)
TL;DR: RAVE is a zero-shot, lightweight, and fast framework for text-guided video editing that supports videos of any length by utilizing pretrained text-to-image diffusion models.
Please install our environment using the requirements.txt file:
conda create -n rave python=3.8
conda activate rave
conda install pip
pip cache purge
pip install -r requirements.txt
Then, install PyTorch and xformers to complete the Conda environment setup:
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
pip install xformers==0.0.20
Our code was tested on Linux with the following versions:
timm==0.6.7 torch==2.0.1+cu118 xformers==0.0.20 diffusers==0.18.2 torch.version.cuda==11.8 python==3.8.0
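To sanity-check the environment against these pinned versions, a small helper (our own sketch, not part of the repository) can report which packages are importable and at what version:

```python
import importlib
import importlib.util

def report_versions(packages=("torch", "xformers", "diffusers", "timm")):
    """Map each package name to its installed version string,
    or None if the package is not installed."""
    versions = {}
    for name in packages:
        if importlib.util.find_spec(name) is None:
            versions[name] = None  # package not installed
        else:
            module = importlib.import_module(name)
            versions[name] = getattr(module, "__version__", "unknown")
    return versions

if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version}")
```

Comparing the printed versions against the list above can catch mismatches before running any experiments.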
To run our Gradio-based web demo, run the following command:
python webui.py
Then, specify your configurations and perform editing.
To run RAVE, please follow these steps:
1- Put the video you want to edit under the data/mp4_videos directory as an MP4 file. Note that we suggest using videos with a size of 512x512 or 512x320.
2- Prepare a config file under the configs directory, and set the video_name parameter to the name of the MP4 file. You can find detailed descriptions of the parameters and example configurations there.
3- Run the following command:
python scripts/run_experiment.py [PATH OF CONFIG FILE]
4- The results will be generated under the results directory. The latents and controls are also saved under the generated directory to speed up editing the same video with different prompts.
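Since the suggested input sizes are 512x512 and 512x320, a small helper can pick the closer target for a given clip. The aspect-ratio heuristic below is our own sketch, not part of RAVE:

```python
def suggest_target_size(width, height):
    """Pick whichever suggested RAVE resolution (512x512 or 512x320)
    is closest in aspect ratio to the input video."""
    candidates = [(512, 512), (512, 320)]
    aspect = width / height
    return min(candidates, key=lambda wh: abs(wh[0] / wh[1] - aspect))

# A landscape 1920x1080 clip (aspect ~1.78) is closer to 512x320 (aspect 1.6)
# than to 512x512 (aspect 1.0).
print(suggest_target_size(1920, 1080))
```

The actual resizing can then be done with any tool of your choice (e.g., ffmpeg's scale filter) before placing the MP4 under data/mp4_videos.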
Note that the names of the available preprocessors can be found in utils/constants.py.
Our code can run any customized model from CIVIT AI. To use such a model, follow these steps:
1- Determine which model you want to use on CIVIT AI and obtain its version id. (e.g., the id for RealisticVision V5.1 is 130072; it appears in the model's URL as the modelVersionId parameter, e.g. https://civitai.com/models/4201?modelVersionId=130072)
2- In the current directory, run the following command. It downloads the model in safetensors format and converts it to the '.bin' format compatible with diffusers.
bash CIVIT_AI/civit_ai.sh 130072
3- Copy the path of the converted model, $CWD/CIVIT_AI/diffusers_models/[CUSTOMIZED MODEL] (e.g. CIVIT_AI/diffusers_models/realisticVisionV60B1_v51VAE for 130072), and use that path in the config file.
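As a convenience, the version id in step 1 can be extracted from a CIVIT AI model URL with Python's standard library. This helper is our own sketch, not part of the repository:

```python
from urllib.parse import urlparse, parse_qs

def extract_model_version_id(url):
    """Return the modelVersionId query parameter from a CIVIT AI
    model URL, or None if the URL does not carry one."""
    query = parse_qs(urlparse(url).query)
    values = query.get("modelVersionId")
    return values[0] if values else None

print(extract_model_version_id("https://civitai.com/models/4201?modelVersionId=130072"))  # 130072
```

The returned id is what gets passed to CIVIT_AI/civit_ai.sh in step 2.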
The dataset will be released soon.
Editing types: 1- Local Editing | 2- Visual Style Editing | 3- Background Editing | 4- Shape/Attribute Editing | 5- Extreme Shape Editing
Motion types: 1- Exo-motion | 2- Ego-motion | 3- Ego-exo motion | 4- Occlusions | 5- Multiple objects with appearance/disappearance
@inproceedings{kara2024rave,
title={RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models},
author={Ozgur Kara and Bariscan Kurtkaya and Hidir Yesiltepe and James M. Rehg and Pinar Yanardag},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2024}
}
This is the official repository for RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models. For any questions or discussions, feel free to contact Ozgur Kara.