
[ECCV 2024] This repository is the official implementation of "MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model".

MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model

Wenxun Dai😎, Ling-Hao Chen😎, Jingbo Wang🥳, Jinpeng Liu😎, Bo Dai🥳, Yansong Tang😎

😎Tsinghua University, 🥳Shanghai AI Laboratory (Correspondence: Jingbo Wang and Bo Dai).

🤩 Abstract

This work introduces MotionLCM, extending controllable motion generation to a real-time level. Existing methods for spatial control in text-conditioned motion generation suffer from significant runtime inefficiency. To address this issue, we first propose the motion latent consistency model (MotionLCM) for motion generation, building upon the latent diffusion model (MLD). By employing one-step (or few-step) inference, we further improve the runtime efficiency of the motion latent diffusion model for motion generation. To ensure effective controllability, we incorporate a motion ControlNet within the latent space of MotionLCM and enable explicit control signals (e.g., pelvis trajectory) in the vanilla motion space to control the generation process directly, similar to controlling other latent-free diffusion models for motion generation. By employing these techniques, our approach can generate human motions with text and control signals in real-time. Experimental results demonstrate the remarkable generation and controlling capabilities of MotionLCM while maintaining real-time runtime efficiency.

πŸ“’ News

πŸ‘¨β€πŸ« Quick Start

This section provides a quick-start guide for setting up the environment and running the demo: installing the required dependencies, downloading the pretrained models, and preparing the datasets.

1. Conda environment

   ```
   conda create python=3.10 --name motionlcm
   conda activate motionlcm
   ```

   Install the packages in `requirements.txt` and install [PyTorch 1.13.1](https://pytorch.org/).

   ```
   pip install -r requirements.txt
   ```

   We test our code on Python 3.10.12 and PyTorch 1.13.1.
2. Dependencies

   If you have `sudo` permission, install `ffmpeg` for visualizing the stick figures (if it is not already installed):

   ```
   sudo apt update
   sudo apt install ffmpeg
   ffmpeg -version  # check!
   ```

   If you do not have `sudo` permission, install it via `conda` instead:

   ```
   conda install conda-forge::ffmpeg
   ffmpeg -version  # check!
   ```

   Run the following command to install [`git-lfs`](https://git-lfs.com/):

   ```
   conda install conda-forge::git-lfs
   ```

   Run the scripts to download the dependency materials:

   ```
   bash prepare/download_glove.sh
   bash prepare/download_t2m_evaluators.sh
   bash prepare/prepare_t5.sh
   bash prepare/download_smpl_models.sh
   ```
3. Pretrained models

   Run the script to download the pretrained models:

   ```
   bash prepare/download_pretrained_models.sh
   ```

   The folders `experiments_t2m` and `experiments_control` store the pretrained models for text-to-motion and motion control, respectively.
4. (Optional) Download manually

   Visit the [Google Drive](https://drive.google.com/drive/folders/1SIhb6srXWS0PNvZ2fs40QiE3Rk764u6z?usp=sharing) to download the above dependencies and models manually.
5. Prepare the datasets

   Please refer to [HumanML3D](https://github.com/EricGuo5513/HumanML3D) for the text-to-motion dataset setup. Copy the resulting dataset to this repository:

   ```
   cp -r ../HumanML3D/HumanML3D ./datasets/humanml3d
   ```

   An unofficial method of data preparation can be found in this [issue](https://github.com/Dai-Wenxun/MotionLCM/issues/6).
6. Folder Structure

   After the whole setup pipeline, the folder structure will look like:

   ```
   MotionLCM
   ├── configs
   ├── datasets
   │   ├── humanml3d
   │   │   ├── new_joint_vecs
   │   │   ├── new_joints
   │   │   ├── texts
   │   │   ├── Mean.npy
   │   │   ├── Std.npy
   │   │   ├── ...
   │   └── humanml_spatial_norm
   │       ├── Mean_raw.npy
   │       └── Std_raw.npy
   ├── deps
   │   ├── glove
   │   ├── sentence-t5-large
   │   ├── smpl_models
   │   └── t2m
   ├── experiments_control
   │   ├── mld_humanml
   │   │   └── mld_humanml.ckpt
   │   └── motionlcm_humanml
   │       └── motionlcm_humanml.ckpt
   ├── experiments_t2m
   │   ├── mld_humanml
   │   │   └── mld_humanml.ckpt
   │   └── motionlcm_humanml
   │       └── motionlcm_humanml.ckpt
   ├── ...
   ```
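Before moving on, you can optionally verify the setup. The following is a minimal sanity-check sketch (not part of the repository); the paths are taken from the tree above, and the `ffmpeg` check refers back to step 2.

```python
import shutil
from pathlib import Path

# Hypothetical sanity check for the setup above (not part of the official repo).
required = [
    "datasets/humanml3d/new_joint_vecs",
    "datasets/humanml3d/texts",
    "datasets/humanml_spatial_norm/Mean_raw.npy",
    "deps/glove",
    "deps/sentence-t5-large",
    "deps/smpl_models",
    "deps/t2m",
    "experiments_t2m/motionlcm_humanml/motionlcm_humanml.ckpt",
    "experiments_control/motionlcm_humanml/motionlcm_humanml.ckpt",
]

missing = [p for p in required if not Path(p).exists()]
print("ffmpeg on PATH:", shutil.which("ffmpeg") is not None)
print("All required paths found." if not missing else f"Missing: {missing}")
```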

🎬 Demo

MotionLCM provides two main functionalities: text-to-motion and motion control. The following commands demonstrate how to use the pretrained models to generate motions. The outputs will be stored in `${cfg.TEST_FOLDER}/${cfg.NAME}/demo_${timestamp}` (e.g., `experiments_t2m_test/motionlcm_humanml/demo_2024-04-06T23-05-07`).
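If you want to locate the most recent demo output programmatically, a small sketch like the one below works, assuming you keep the default `${cfg.TEST_FOLDER}` and `${cfg.NAME}` values shown above.

```python
from pathlib import Path

# Assumes the default TEST_FOLDER/NAME from configs/motionlcm_t2m.yaml.
out_root = Path("experiments_t2m_test/motionlcm_humanml")
demo_dirs = sorted(out_root.glob("demo_*"))  # timestamped names sort chronologically
if demo_dirs:
    print("Latest demo output:", demo_dirs[-1])
else:
    print("No demo outputs found yet.")
```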

1. Text-to-Motion (using the provided prompts and lengths in `demo/example.txt`)

   If you haven't completed `5. Prepare the datasets` in `👨‍🏫 Quick Start`, make sure to use the following command to download a tiny HumanML3D dataset, since the model instantiation requires the motion feature dimension of the dataset:

   ```
   bash prepare/prepare_tiny_humanml3d.sh
   ```

   Then, run the demo:

   ```
   python demo.py --cfg configs/motionlcm_t2m.yaml --example assets/example.txt
   ```
2. Text-to-Motion (using prompts from the HumanML3D test set)

   ```
   python demo.py --cfg configs/motionlcm_t2m.yaml
   ```
3. Motion Control (using prompts and trajectories from the HumanML3D test set)

   ```
   python demo.py --cfg configs/motionlcm_control.yaml
   ```
4. Render SMPL

   After running the demo, the output folder will contain the stick-figure animation for each generated motion (e.g., `assets/example.gif`).

   ![example](assets/example.gif)

   To record the necessary information about the generated motion, a pickle file with the following keys is saved alongside it (e.g., `assets/example.pkl`):

   - `joints (numpy.ndarray)`: The XYZ positions of the generated motion, with shape `(nframes, njoints, 3)`.
   - `text (str)`: The text prompt.
   - `length (int)`: The length of the generated motion.
   - `hint (numpy.ndarray)`: The trajectory for motion control (optional).
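The saved pickle can be inspected with a few lines of Python. This is a minimal sketch based on the keys listed above; note that `hint` may be absent or empty for pure text-to-motion runs.

```python
import pickle

# Inspect a demo output pickle (e.g., assets/example.pkl), whose keys are described above.
with open("assets/example.pkl", "rb") as f:
    data = pickle.load(f)

print("text:  ", data["text"])
print("length:", data["length"])
print("joints:", data["joints"].shape)  # (nframes, njoints, 3)
if data.get("hint") is not None:        # only present for motion control
    print("hint:  ", data["hint"].shape)
```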
4.1 Create SMPL meshes

To create SMPL meshes for a specific pickle file, let's use `assets/example.pkl` as an example:

```
python fit.py --pkl assets/example.pkl
```

The SMPL meshes (a numpy array) will be stored in `assets/example_mesh.pkl` with shape `(nframes, 6890, 3)`.

You can also fit all pickle files within a folder. The code will traverse all `.pkl` files in the directory and skip files that have already been fitted:

```
python fit.py --dir assets/
```
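As a quick check, the sketch below loads a fitted mesh file and lists pickles in a folder that do not yet have a `_mesh.pkl` counterpart; the suffix convention is taken from the example above, so treat it as an assumption if your filenames differ.

```python
import pickle
from pathlib import Path

# Verify one fitted mesh file; the array shape should be (nframes, 6890, 3).
with open("assets/example_mesh.pkl", "rb") as f:
    verts = pickle.load(f)
print("mesh vertices:", verts.shape)

# List motion pickles in a folder that have no *_mesh.pkl counterpart yet.
folder = Path("assets")
unfitted = [p.name for p in folder.glob("*.pkl")
            if not p.stem.endswith("_mesh")
            and not (folder / f"{p.stem}_mesh.pkl").exists()]
print("still to fit:", unfitted)
```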
4.2 Render SMPL meshes

Refer to [TEMOS-Rendering motions](https://github.com/Mathux/TEMOS) for the Blender setup (only the **Installation** section is needed). We support three rendering modes for SMPL meshes, namely `sequence` (default), `video`, and `frame`.
4.2.1 sequence

```
YOUR_BLENDER_PATH/blender --background --python render.py -- --pkl assets/example_mesh.pkl --mode sequence --num 8
```

You will get a rendered image of `num=8` keyframes, as shown in `assets/example_mesh.png`. The darker the color, the later the time.

![example](assets/example_mesh.png)
4.2.2 video

```
YOUR_BLENDER_PATH/blender --background --python render.py -- --pkl assets/example_mesh.pkl --mode video --fps 20
```

You will get a rendered video with `fps=20`, as shown in `assets/example_mesh.mp4`.

![example](assets/example_mesh_show.gif)
4.2.3 frame

```
YOUR_BLENDER_PATH/blender --background --python render.py -- --pkl assets/example_mesh.pkl --mode frame --exact_frame 0.5
```

You will get a rendered image of the keyframe at `exact_frame=0.5` (i.e., the middle frame), as shown in `assets/example_mesh_0.5.png`.

![example](assets/example_mesh_0.5.png)
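To render every fitted mesh in a folder, the Blender command above can be wrapped in a small loop. This is a sketch, assuming your Blender binary path and the flags shown above; adjust `--mode` and its options as needed.

```python
import subprocess
from pathlib import Path

BLENDER = "YOUR_BLENDER_PATH/blender"  # replace with your actual Blender binary

# Render each *_mesh.pkl in assets/ in 'sequence' mode, mirroring the command above.
for mesh_pkl in sorted(Path("assets").glob("*_mesh.pkl")):
    subprocess.run(
        [BLENDER, "--background", "--python", "render.py", "--",
         "--pkl", str(mesh_pkl), "--mode", "sequence", "--num", "8"],
        check=True,
    )
```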

πŸš€ Train your own models

We provide training guidance for both the text-to-motion and motion control tasks. The following steps walk you through the training process.

1. Important args in the config yaml

   The parameters required for model training and testing are recorded in the corresponding YAML file (e.g., `configs/motionlcm_t2m.yaml`). Below are some of the important parameters in the file (a sketch for inspecting these keys programmatically appears after step 3 below):

   - `${FOLDER}`: The folder for the specific training task (i.e., `experiments_t2m` and `experiments_control`).
   - `${TEST_FOLDER}`: The folder for the specific testing task (i.e., `experiments_t2m_test` and `experiments_control_test`).
   - `${NAME}`: The name of the model (e.g., `motionlcm_humanml`). `${FOLDER}`, `${NAME}`, and the current timestamp constitute the training output folder (for example, `experiments_t2m/motionlcm_humanml/2024-04-06T23-05-07`). The same applies to `${TEST_FOLDER}` for testing.
   - `${TRAIN.PRETRAINED}`: The path of the pretrained model.
   - `${TEST.CHECKPOINTS}`: The path of the testing model.
2. Train MotionLCM and motion ControlNet

#### 2.1. Train the motion VAE and MLD (optional)

Please update the parameters in `configs/vae.yaml` and `configs/mld_t2m.yaml`. Then, run the following commands:

```
python -m train_vae --cfg configs/vae.yaml
python -m train_mld --cfg configs/mld_t2m.yaml
```

This step is **OPTIONAL**, since we provide pretrained models for training MotionLCM and motion ControlNet.

#### 2.2. Train MotionLCM

Please first check the parameters in `configs/motionlcm_t2m.yaml`. Then, run the following command (**~13GB memory usage**):

```
python -m train_motionlcm --cfg configs/motionlcm_t2m.yaml
```

#### 2.3. Train motion ControlNet

Please update the parameters in `configs/motionlcm_control.yaml`. Then, run the following command (**~16GB memory usage**):

```
python -m train_motion_control --cfg configs/motionlcm_control.yaml
```
3. Evaluate the model

   Text-to-Motion:

   ```
   python -m test --cfg configs/motionlcm_t2m.yaml
   ```

   Motion Control:

   ```
   python -m test --cfg configs/motionlcm_control.yaml
   ```
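As referenced in step 1, you can inspect a config before launching a run. The sketch below simply loads the YAML with PyYAML and prints the keys described there; the exact nesting is an assumption based on the key names above and may differ slightly in the actual file.

```python
import yaml  # PyYAML

# Print the training/testing keys described in step 1 (nesting assumed from the key names).
with open("configs/motionlcm_t2m.yaml") as f:
    cfg = yaml.safe_load(f)

for key in ("FOLDER", "TEST_FOLDER", "NAME"):
    print(f"{key}: {cfg.get(key)}")
print("TRAIN.PRETRAINED:", cfg.get("TRAIN", {}).get("PRETRAINED"))
print("TEST.CHECKPOINTS:", cfg.get("TEST", {}).get("CHECKPOINTS"))
```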

πŸ“Š Results

We provide the quantitative and qualitative results reported in the paper:

- Text-to-Motion (quantitative)
- Text-to-Motion (qualitative)
- Motion Control (quantitative)

🌹 Acknowledgement

We would like to thank the authors of the following repositories for their excellent work: MLD, latent-consistency-model, OmniControl, TEMOS, HumanML3D, UniMoCap, joints2smpl.

πŸ“œ Citation

If you find this work useful, please consider citing our paper:

```bibtex
@article{motionlcm,
  title={MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model},
  author={Dai, Wenxun and Chen, Ling-Hao and Wang, Jingbo and Liu, Jinpeng and Dai, Bo and Tang, Yansong},
  journal={arXiv preprint arXiv:2404.19759},
  year={2024},
}
```

πŸ“š License

This code is distributed under the MotionLCM LICENSE, which does not allow commercial use. Note that our code depends on other libraries and datasets, each of which has its own license that must also be followed.

🌟 Star History

Star History Chart

If you have any questions, please contact Wenxun Dai and cc Ling-Hao Chen and Jingbo Wang.