In European Conference on Computer Vision (ECCV) 2024
(2024.08.07) We have released the inference script for keypoint-based facial image animation! Please refer to Here for more instructions.
(2024.07.15) We have released the training code for trajectory-based image animation! Please refer to Here for more instructions.
MOFA-Video will appear in ECCV 2024! 🇮🇹🇮🇹🇮🇹
We have released the Gradio inference code and the checkpoints for Hybrid Controls! Please refer to Here for more instructions.
Free online demo via HuggingFace Spaces will be coming soon!
If you find this work interesting, please do not hesitate to give a ⭐!
[Demo galleries: Trajectory + Landmark Control · Trajectory Control · Landmark Control]
We introduce MOFA-Video, a method designed to adapt motions from different domains to a frozen Video Diffusion Model. By employing sparse-to-dense (S2D) motion generation and flow-based motion adaptation, MOFA-Video can effectively animate a single image using various types of control signals, including trajectories, keypoint sequences, and their combinations.
During the training stage, we generate sparse control signals through sparse motion sampling and then train different MOFA-Adapters to generate video via pre-trained SVD. During the inference stage, different MOFA-Adapters can be combined to jointly control the frozen SVD.
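The sparse-to-dense idea above can be illustrated with a minimal NumPy sketch: a few sparse motion hints (point + displacement) are spread into a dense flow field. Note that the Gaussian-weighted interpolation here is only an illustrative stand-in, not the learned S2D network the paper actually trains.

```python
import numpy as np

def sparse_to_dense_flow(points, vectors, shape, sigma=8.0):
    """Densify sparse motion hints into an (H, W, 2) flow field by
    Gaussian-weighted interpolation around each hint point.
    Illustrative only; MOFA-Video uses a learned S2D network."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    flow = np.zeros((h, w, 2), dtype=np.float32)
    weight = np.full((h, w), 1e-8, dtype=np.float32)  # avoid div-by-zero
    for (px, py), (vx, vy) in zip(points, vectors):
        # Gaussian influence of this hint on every pixel
        g = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
        flow[..., 0] += g * vx
        flow[..., 1] += g * vy
        weight += g
    return flow / weight[..., None]

# One trajectory hint pushing pixels to the right near (16, 16)
dense = sparse_to_dense_flow([(16, 16)], [(5.0, 0.0)], (32, 32))
```

The resulting dense field can then warp image features frame by frame, which is the role the MOFA-Adapter plays inside the frozen SVD.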
git clone https://github.com/MyNiuuu/MOFA-Video.git
cd ./MOFA-Video
The demo has been tested with CUDA 11.7.
cd ./MOFA-Video-Hybrid
conda create -n mofa python==3.10
conda activate mofa
pip install -r requirements.txt
pip install opencv-python-headless
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
IMPORTANT: ⚠️⚠️⚠️ The Gradio version pinned in requirements.txt (4.5.0) must be strictly followed, since other versions may cause errors.
Download the checkpoint of CMP from here and put it into ./MOFA-Video-Hybrid/models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints.
Download the ckpts folder from the HuggingFace repo, which contains the necessary pretrained checkpoints, and put it under ./MOFA-Video-Hybrid. You may use git lfs to download the entire ckpts folder:
1) Download git lfs from https://git-lfs.github.com. It is commonly used for cloning repositories with large model checkpoints on HuggingFace.
2) Execute git clone https://huggingface.co/MyNiuuu/MOFA-Video-Hybrid to download the complete HuggingFace repository, which currently only includes the ckpts folder.
3) Copy or move the ckpts folder into the GitHub repository.
NOTE: If you encounter the error git: 'lfs' is not a git command on Linux, you can try this solution, which worked well in my case.
Finally, the checkpoints should be organized as shown in ./MOFA-Video-Hybrid/ckpt_tree.md.
Using audio to animate the facial part
cd ./MOFA-Video-Hybrid
python run_gradio_audio_driven.py
🪄🪄🪄 The Gradio interface is displayed below. Please follow the instructions on the interface during inference!
Using reference video to animate the facial part
cd ./MOFA-Video-Hybrid
python run_gradio_video_driven.py
🪄🪄🪄 The Gradio interface is displayed below. Please follow the instructions on the interface during inference!
Please refer to Here for instructions.
Please refer to Here for more instructions.
@article{niu2024mofa,
title={MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model},
author={Niu, Muyao and Cun, Xiaodong and Wang, Xintao and Zhang, Yong and Shan, Ying and Zheng, Yinqiang},
journal={arXiv preprint arXiv:2405.20222},
year={2024}
}
We sincerely appreciate the code release of the following projects: DragNUWA, SadTalker, AniPortrait, Diffusers, SVD_Xtend, Conditional-Motion-Propagation, and Unimatch.