# G-U-N / Motion-I2V

[SIGGRAPH 2024] Motion I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
https://xiaoyushi97.github.io/Motion-I2V/
## Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling

*Xiaoyu Shi¹\*, Zhaoyang Huang¹\*, Fu-Yun Wang¹\*, Weikang Bian¹\*, Dasong Li¹, Yi Zhang¹, Manyuan Zhang¹, Ka Chun Cheung², Simon See², Hongwei Qin³, Jifeng Dai⁴, Hongsheng Li¹*

*¹CUHK-MMLab ²NVIDIA ³SenseTime ⁴Tsinghua University*
    @article{shi2024motion,
      title={Motion-i2v: Consistent and controllable image-to-video generation with explicit motion modeling},
      author={Shi, Xiaoyu and Huang, Zhaoyang and Wang, Fu-Yun and Bian, Weikang and Li, Dasong and Zhang, Yi and Zhang, Manyuan and Cheung, Ka Chun and See, Simon and Qin, Hongwei and others},
      journal={SIGGRAPH 2024},
      year={2024}
    }
Overview of Motion-I2V. The first stage aims to deduce the motions that can plausibly animate the reference image: conditioned on the reference image and a text prompt, it predicts the motion field maps between the reference frame and each future frame. The second stage propagates the reference image's content to synthesize the frames. A novel motion-augmented temporal layer enhances 1-D temporal attention with warped features; this enlarges the temporal receptive field and alleviates the complexity of directly learning complicated spatial-temporal patterns.
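The motion-augmented temporal layer can be pictured roughly as below. This is a minimal sketch, not the authors' implementation: the tensor shapes, the module and function names (`warp_by_flow`, `MotionAugmentedTemporalAttention`), and the use of `F.grid_sample` for backward warping are assumptions for illustration only.

```python
# Sketch (assumptions, not the official code): reference features are warped by
# the stage-1 flow fields, then used as keys/values of a 1-D temporal attention.
import torch
import torch.nn as nn
import torch.nn.functional as F


def warp_by_flow(ref_feat, flow):
    """Backward-warp reference features (B, C, H, W) with a flow field (B, 2, H, W)."""
    b, _, h, w = ref_feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=ref_feat.device),
        torch.arange(w, device=ref_feat.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).float()            # (2, H, W) pixel coords
    coords = grid.unsqueeze(0) + flow                       # sampling locations
    # normalize to [-1, 1] as required by grid_sample
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid_norm = torch.stack((coords_x, coords_y), dim=-1)   # (B, H, W, 2)
    return F.grid_sample(ref_feat, grid_norm, align_corners=True)


class MotionAugmentedTemporalAttention(nn.Module):
    """1-D temporal attention whose key/value stream is built from reference
    features warped by the predicted motion fields (hypothetical layout)."""

    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, frame_feats, ref_feat, flows):
        # frame_feats: (B, T, C, H, W), ref_feat: (B, C, H, W), flows: (B, T, 2, H, W)
        b, t, c, h, w = frame_feats.shape
        warped = torch.stack(
            [warp_by_flow(ref_feat, flows[:, i]) for i in range(t)], dim=1
        )                                                    # (B, T, C, H, W)
        # one token per spatial location, attended along the temporal axis
        q = frame_feats.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        kv = warped.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        out, _ = self.attn(q, kv, kv)
        return out.reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)
```

The intuition follows the caption above: because the key/value stream is already aligned to each target frame by the stage-1 flow, a plain 1-D temporal attention effectively sees a larger spatio-temporal neighborhood without having to learn the alignment itself.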

## Usage

  1. Install the environment
     conda env create -f environment.yaml
  2. Download the models
     git clone https://huggingface.co/wangfuyun/Motion-I2V
  3. Run the code
     python -m scripts.app

## ComfyUI

ComfyUI-IG-Motion-I2V
