Official PyTorch implementation for MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model.

Project page: https://myniuuu.github.io/MOFA_Video

πŸ¦„οΈ MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model

Muyao Niu 1,2   Xiaodong Cun2,*   Xintao Wang2   Yong Zhang2   Ying Shan2   Yinqiang Zheng1,*  
1 The University of Tokyo   2 Tencent AI Lab   * Corresponding Author  


     

πŸ”₯πŸ”₯πŸ”₯ New Features/Updates

We have released the Gradio inference code and the checkpoints for Hybrid Controls! Please refer to Here for more instructions.

Stay tuned. Feel free to raise issues for bug reports or any questions!

Free online demo via HuggingFace Spaces will be coming soon!

If you find this work interesting, please do not hesitate to give a ⭐!

πŸ“° CODE RELEASE

TL;DR

Image 🏞️ + Hybrid Controls πŸ•ΉοΈ = Videos 🎬🍿




Demo videos: Trajectory + Landmark Control · Trajectory Control · Landmark Control

Check the gallery of our project page for more visual results!

Introduction

We introduce MOFA-Video, a method designed to adapt motions from different domains to the frozen Video Diffusion Model. By employing sparse-to-dense (S2D) motion generation and flow-based motion adaptation, MOFA-Video can effectively animate a single image using various types of control signals, including trajectories, keypoint sequences, and their combinations.

During the training stage, we generate sparse control signals through sparse motion sampling and then train different MOFA-Adapters to generate video via pre-trained SVD. During the inference stage, different MOFA-Adapters can be combined to jointly control the frozen SVD.
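To make the inference-stage combination concrete, here is a purely illustrative sketch of how motion fields from two MOFA-Adapters (e.g. trajectory-based and landmark-based) could be fused per pixel before conditioning the frozen SVD. The real adapters produce dense flow through a sparse-to-dense network; the toy flow fields, `fuse_flows` helper, and masked-override rule below are simplified stand-ins, not the actual MOFA-Video implementation.

```python
def fuse_flows(flow_a, flow_b, mask_b):
    """Per-pixel fusion: where mask_b is set (e.g. the facial region),
    the landmark adapter's flow overrides the trajectory adapter's."""
    fused = []
    for row_a, row_b, row_m in zip(flow_a, flow_b, mask_b):
        fused.append([fb if m else fa for fa, fb, m in zip(row_a, row_b, row_m)])
    return fused

# 2x2 toy flow fields: each entry is a (dx, dy) displacement.
trajectory_flow = [[(1, 0), (1, 0)], [(1, 0), (1, 0)]]
landmark_flow = [[(0, 2), (0, 2)], [(0, 2), (0, 2)]]
face_mask = [[0, 1], [0, 0]]  # landmark control only at the top-right pixel

print(fuse_flows(trajectory_flow, landmark_flow, face_mask))
# [[(1, 0), (0, 2)], [(1, 0), (1, 0)]]
```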

πŸ•ΉοΈ Image Animation with Hybrid Controls

1. Clone the Repository

git clone https://github.com/MyNiuuu/MOFA-Video.git
cd ./MOFA-Video

2. Environment Setup

The demo has been tested with CUDA 11.7.

cd ./MOFA-Video-Hybrid
conda create -n mofa python==3.10
conda activate mofa
pip install -r requirements.txt
pip install opencv-python-headless
pip install "git+https://github.com/facebookresearch/pytorch3d.git"

IMPORTANT: ⚠️⚠️⚠️ The Gradio version pinned in requirements.txt (4.5.0) must be strictly followed, since other versions may cause errors.
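Since a wrong Gradio version is the most common cause of errors, you may want to verify the installed version against the pin before launching the demo. The sketch below, using only the standard library, does that; `parse_pin` and `check_pin` are hypothetical helpers, not part of MOFA-Video.

```python
# Check that an installed package matches a "name==version" pin,
# e.g. the gradio==4.5.0 pin from requirements.txt.
from importlib import metadata


def parse_pin(line: str):
    """Split a 'name==version' requirement line into (name, version)."""
    name, _, version = line.strip().partition("==")
    return name, version


def check_pin(line: str) -> bool:
    """Return True only if the pinned package is installed at the pinned version."""
    name, version = parse_pin(line)
    try:
        return metadata.version(name) == version
    except metadata.PackageNotFoundError:
        return False


if __name__ == "__main__":
    print("gradio pin satisfied:", check_pin("gradio==4.5.0"))
```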

3. Downloading Checkpoints

  1. Download the checkpoint of CMP from here and put it into ./MOFA-Video-Hybrid/models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints.

  2. Download the ckpts folder from the huggingface repo which contains necessary pretrained checkpoints and put it under ./MOFA-Video-Hybrid. You may use git lfs to download the entire ckpts folder:

    1. Download git lfs from https://git-lfs.github.com. It is commonly used for cloning repositories with large model checkpoints on HuggingFace.
    2. Execute `git clone https://huggingface.co/MyNiuuu/MOFA-Video-Hybrid` to download the complete HuggingFace repository, which currently only includes the ckpts folder.
    3. Copy or move the ckpts folder to the GitHub repository.

    NOTE: If you encounter the error git: 'lfs' is not a git command on Linux, you can try this solution that has worked well for my case.

    Finally, the checkpoints should be organized as shown in ./MOFA-Video-Hybrid/ckpt_tree.md.

4. Run Gradio Demo

Using audio to animate the facial part

cd ./MOFA-Video-Hybrid
python run_gradio_audio_driven.py

πŸͺ„πŸͺ„πŸͺ„ The Gradio interface is displayed below. Please follow the instructions shown on the interface during inference!

Using reference video to animate the facial part

cd ./MOFA-Video-Hybrid
python run_gradio_video_driven.py

πŸͺ„πŸͺ„πŸͺ„ The Gradio interface is displayed below. Please follow the instructions shown on the interface during inference!

πŸ’« Trajectory-based Image Animation

Please refer to Here for instructions.

Citation

@article{niu2024mofa,
  title={MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model},
  author={Niu, Muyao and Cun, Xiaodong and Wang, Xintao and Zhang, Yong and Shan, Ying and Zheng, Yinqiang},
  journal={arXiv preprint arXiv:2405.20222},
  year={2024}
}

Acknowledgements

We sincerely appreciate the code release of the following projects: DragNUWA, SadTalker, AniPortrait, Diffusers, SVD_Xtend, Conditional-Motion-Propagation, and Unimatch.