MyNiuuu / MOFA-Video

[ECCV 2024] MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model.
https://myniuuu.github.io/MOFA_Video

🦄️ MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model (ECCV 2024)

Muyao Niu 1,2   Xiaodong Cun2,*   Xintao Wang2   Yong Zhang2   Ying Shan2   Yinqiang Zheng1,*  
1 The University of Tokyo   2 Tencent AI Lab   * Corresponding Author  

In European Conference on Computer Vision (ECCV) 2024

     

🔥🔥🔥 New Features/Updates

📰 CODE RELEASE

TL;DR

Image 🏞️ + Hybrid Controls 🕹️ = Videos 🎬🍿




Trajectory + Landmark Control




Trajectory Control





Landmark Control

Check the gallery on our project page for more visual results!

Introduction

We introduce MOFA-Video, a method designed to adapt motions from different domains to a frozen video diffusion model. By employing sparse-to-dense (S2D) motion generation and flow-based motion adaptation, MOFA-Video can effectively animate a single image using various types of control signals, including trajectories, keypoint sequences, and their combinations.

During the training stage, we generate sparse control signals through sparse motion sampling and then train different MOFA-Adapters to generate video through the pre-trained Stable Video Diffusion (SVD) model. During the inference stage, different MOFA-Adapters can be combined to jointly control the frozen SVD.
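To make the idea of sparse motion sampling concrete, here is a minimal NumPy sketch that keeps the motion vectors at a few randomly chosen pixels of a dense flow field and zeroes out the rest. It is an illustration only: the function name `sample_sparse_motion`, the uniform random sampling, and the `(H, W, 2)` flow layout are assumptions made for this example, and the sampling strategy used in the actual training code may differ.

```python
import numpy as np


def sample_sparse_motion(dense_flow, num_points, rng=None):
    """Keep the flow vectors at `num_points` random pixels and zero out the rest.

    dense_flow: array of shape (H, W, 2) holding per-pixel (dx, dy) displacements.
    Returns (sparse_flow, mask); mask has shape (H, W) and is 1 at sampled pixels.
    NOTE: uniform random sampling is an assumption made for this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, _ = dense_flow.shape
    # Choose distinct pixel indices in the flattened H*W grid.
    flat_idx = rng.choice(h * w, size=num_points, replace=False)
    mask = np.zeros(h * w, dtype=np.float32)
    mask[flat_idx] = 1.0
    mask = mask.reshape(h, w)
    # Broadcast the mask over the two flow channels.
    sparse_flow = dense_flow * mask[..., None]
    return sparse_flow, mask


# Toy example: a random "dense flow" standing in for real optical flow.
dense = np.random.default_rng(0).normal(size=(64, 64, 2)).astype(np.float32)
sparse, mask = sample_sparse_motion(dense, num_points=16, rng=np.random.default_rng(1))
```

In this picture, the sparse flow and its mask play the role of the sparse control signal, and a sparse-to-dense network would be trained to recover a dense motion field from them.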

🕹️ Image Animation with Hybrid Controls

1. Clone the Repository

git clone https://github.com/MyNiuuu/MOFA-Video.git
cd ./MOFA-Video

2. Environment Setup

The demo has been tested with CUDA 11.7.

cd ./MOFA-Video-Hybrid
conda create -n mofa python==3.10
conda activate mofa
pip install -r requirements.txt
pip install opencv-python-headless
pip install "git+https://github.com/facebookresearch/pytorch3d.git"

IMPORTANT: ⚠️⚠️⚠️ Use exactly the Gradio version pinned in requirements.txt (4.5.0); other versions may cause errors.
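If you want to confirm the pinned Gradio version before launching the demo, a small check along these lines may help. This is a sketch and not part of the repository: the helper name `check_package_version` is hypothetical, and it only relies on the standard-library `importlib.metadata` module.

```python
from importlib import metadata

REQUIRED_GRADIO = "4.5.0"  # version named in the note above


def check_package_version(package, required):
    """Return (ok, installed); installed is None if the package is missing."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False, None
    return installed == required, installed


ok, installed = check_package_version("gradio", REQUIRED_GRADIO)
if ok:
    print(f"gradio {installed} matches the pinned version")
else:
    print(f"gradio is {installed!r}, expected {REQUIRED_GRADIO}")
```

Run it inside the activated `mofa` environment so that it inspects the packages installed there.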

3. Downloading Checkpoints

  1. Download the checkpoint of CMP from here and put it into ./MOFA-Video-Hybrid/models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints.

  2. Download the ckpts folder, which contains the necessary pretrained checkpoints, from the Hugging Face repo and put it under ./MOFA-Video-Hybrid. You may use git lfs to download the entire ckpts folder:

    1) Download git lfs from https://git-lfs.github.com. It is commonly used for cloning repositories with large model checkpoints on HuggingFace.
    2) Execute git clone https://huggingface.co/MyNiuuu/MOFA-Video-Hybrid to download the complete HuggingFace repository, which currently only includes the ckpts folder.
    3) Copy or move the ckpts folder to the GitHub repository.

    NOTE: If you encounter the error git: 'lfs' is not a git command on Linux, you can try this solution, which worked in my case.

    Finally, the checkpoints should be organized as described in ./MOFA-Video-Hybrid/ckpt_tree.md.
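As a quick sanity check after downloading, the following sketch reports which of the two checkpoint directories named in the steps above are missing or empty. It is not part of the repository: the helper `missing_checkpoint_dirs` is hypothetical, and it does not verify the full layout listed in ckpt_tree.md, only that the two top-level locations exist and contain something.

```python
from pathlib import Path

# Directories named in the steps above, relative to the repository root.
EXPECTED_DIRS = [
    "MOFA-Video-Hybrid/models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints",
    "MOFA-Video-Hybrid/ckpts",
]


def missing_checkpoint_dirs(repo_root):
    """Return the expected directories that are absent or empty under repo_root."""
    root = Path(repo_root)
    missing = []
    for rel in EXPECTED_DIRS:
        directory = root / rel
        if not directory.is_dir() or not any(directory.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    # Assumes the script is run from the root of the cloned MOFA-Video repository.
    for rel in missing_checkpoint_dirs("."):
        print(f"missing or empty: {rel}")
```

An empty result only means both directories are populated; compare against ckpt_tree.md for the exact files.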

4. Run Gradio Demo

Using audio to animate the facial part

cd ./MOFA-Video-Hybrid
python run_gradio_audio_driven.py

🪄🪄🪄 The Gradio interface is shown below. Please follow the instructions in the Gradio interface during inference!

Using reference video to animate the facial part

cd ./MOFA-Video-Hybrid
python run_gradio_video_driven.py

🪄🪄🪄 The Gradio interface is shown below. Please follow the instructions in the Gradio interface during inference!

💫 Trajectory-based Image Animation

Please refer to here for instructions.

Training your own MOFA-Adapter

Please refer to here for more instructions.

Citation

@article{niu2024mofa,
  title={MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model},
  author={Niu, Muyao and Cun, Xiaodong and Wang, Xintao and Zhang, Yong and Shan, Ying and Zheng, Yinqiang},
  journal={arXiv preprint arXiv:2405.20222},
  year={2024}
}

Acknowledgements

We sincerely appreciate the code release of the following projects: DragNUWA, SadTalker, AniPortrait, Diffusers, SVD_Xtend, Conditional-Motion-Propagation, and Unimatch.