The official PyTorch implementation of the paper "MMM: Generative Masked Motion Model".
Please visit our webpage for more details.
If our project is helpful for your research, please consider citing:
@inproceedings{pinyoanuntapong2024mmm,
title={MMM: Generative Masked Motion Model},
author={Ekkasit Pinyoanuntapong and Pu Wang and Minwoo Lee and Chen Chen},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024},
}
📢 July/2/24 - Our new paper (BAMM) is accepted at ECCV 2024
📢 June/10/24 - Updated pretrained model with FID 0.070 (using batch size 128)
📢 June/8/24 - Interactive demo is live on Hugging Face
📢 June/3/24 - Fixed generation bugs, added a download script, and updated the pretrained model with 2 local layers (better score than reported in the paper)
conda env create -f environment.yml
conda activate MMM
If you run into dependency conflicts, you can install the packages manually:
conda create --name MMM
conda activate MMM
conda install plotly tensorboard scipy matplotlib pytorch torchvision pytorch-cuda=11.8 -c pytorch -c nvidia
pip install git+https://github.com/openai/CLIP.git einops gdown
pip install --upgrade nbformat
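After installation, a quick sanity check can confirm that PyTorch sees the GPU and that CLIP loads. This is a minimal sketch, not part of the repo; ViT-B/32 is only an example CLIP checkpoint, so check the training scripts for the one actually used:

```python
# Environment sanity check (sketch): verify CUDA visibility and CLIP text encoding.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("Available CLIP checkpoints:", clip.available_models())

# ViT-B/32 is an example checkpoint (assumption), downloaded on first use.
model, _ = clip.load("ViT-B/32", device=device)
tokens = clip.tokenize(["a person walks forward"]).to(device)
with torch.no_grad():
    text_features = model.encode_text(tokens)
print("Text feature shape:", tuple(text_features.shape))  # (1, 512) for ViT-B/32
```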
bash dataset/prepare/download_glove.sh
We use the same extractors provided by t2m to evaluate our generated motions. Please download the extractors.
bash dataset/prepare/download_extractor.sh
bash dataset/prepare/download_model.sh
bash dataset/prepare/download_model_upperbody.sh
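The exact directories these scripts write to depend on the scripts themselves; as a rough check that the downloads succeeded, you can list every .pth file that was fetched:

```python
# List downloaded checkpoints (sketch): locations depend on the download scripts.
from pathlib import Path

for ckpt in sorted(Path(".").rglob("*.pth")):
    print(f"{ckpt}  ({ckpt.stat().st_size / 1e6:.1f} MB)")
```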
We use two 3D human motion-language datasets: HumanML3D and KIT-ML. For both datasets, you can find the details as well as the download link [here].
Taking HumanML3D as an example, the file directory should look like this:
./dataset/HumanML3D/
├── new_joint_vecs/
├── texts/
├── Mean.npy # same as in [HumanML3D](https://github.com/EricGuo5513/HumanML3D)
├── Std.npy # same as in [HumanML3D](https://github.com/EricGuo5513/HumanML3D)
├── train.txt
├── val.txt
├── test.txt
├── train_val.txt
└── all.txt
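A small script can confirm the layout and the normalization statistics before training. This is a sketch under the assumption that motions in new_joint_vecs/ are per-frame feature vectors matching Mean.npy/Std.npy (263-D for HumanML3D):

```python
# Dataset layout check (sketch): load normalization stats and one motion file.
from pathlib import Path
import numpy as np

root = Path("./dataset/HumanML3D")
mean = np.load(root / "Mean.npy")
std = np.load(root / "Std.npy")
print("Feature dim:", mean.shape)  # 263-D motion features for HumanML3D

sample = next((root / "new_joint_vecs").glob("*.npy"))
motion = np.load(sample)
print(sample.name, "->", motion.shape)  # (num_frames, feature_dim)
assert motion.shape[-1] == mean.shape[0], "feature dim should match Mean/Std"

# Motions are z-normalized with these statistics in the data pipeline.
normalized = (motion - mean) / std
```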
python train_vq.py --dataname t2m --exp-name vq_name
python train_t2m_trans.py --vq-name vq_name --out-dir output/t2m --exp-name trans_name --num-local-layer 2
The pretrained VQ-VAE from the previous step is loaded from output/vq/vq_name/net_last.pth.
--num-local-layer is the number of cross-attention layers.
For training with multiple GPUs, export CUDA_VISIBLE_DEVICES=0,1,2,3 and scale up the settings, e.g. --batch-size 512 --total-iter 75000.
The codebook is pre-computed and saved at output/vq/vq_name/codebook (it will take a couple of minutes).

python GPT_eval_multi.py --exp-name eval_name --resume-pth output/vq/2024-06-03-20-22-07_retrain/net_last.pth --resume-trans output/t2m/2024-06-04-09-29-20_trans_name_b128/net_last.pth --num-local-layer 2
The log and tensorboard data will be in ./output/eval/
--resume-pth is the path to the VQ-VAE checkpoint.
--resume-trans is the path to the transformer checkpoint.
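Since tensorboard is part of the environment, the logged evaluation scalars can also be read back programmatically. A minimal sketch (the run-directory layout and metric names under ./output/eval/ are assumptions; check what the eval script actually logs):

```python
# Inspect evaluation logs (sketch): list scalar tags recorded under ./output/eval/.
from pathlib import Path
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

for run_dir in sorted(Path("./output/eval").iterdir()):
    if not run_dir.is_dir():
        continue
    acc = EventAccumulator(str(run_dir))
    acc.Reload()
    print(run_dir.name, "->", acc.Tags().get("scalars", []))
```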
python generate.py --resume-pth output/vq/2024-06-03-20-22-07_retrain/net_last.pth --resume-trans output/t2m/2024-06-04-09-29-20_trans_name_b128/net_last.pth --text 'the person crouches and walks forward.' --length 156
The generated HTML is in the output folder.
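To generate motions for several prompts in one go, the same CLI can be called in a loop; a minimal sketch using only the flags shown above (the extra prompt and lengths are illustrative):

```python
# Batch generation (sketch): call generate.py once per prompt with the documented flags.
import subprocess

VQ_CKPT = "output/vq/2024-06-03-20-22-07_retrain/net_last.pth"
TRANS_CKPT = "output/t2m/2024-06-04-09-29-20_trans_name_b128/net_last.pth"

prompts = [
    ("the person crouches and walks forward.", 156),
    ("a person jumps and then waves both hands.", 120),  # illustrative
]
for text, length in prompts:
    subprocess.run(
        ["python", "generate.py",
         "--resume-pth", VQ_CKPT,
         "--resume-trans", TRANS_CKPT,
         "--text", text,
         "--length", str(length)],
        check=True,
    )
# Each run writes its HTML visualization to the output folder, as noted above.
```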
(Please load the pretrained model for upper body editing as described in section 2.4.)
For better visualization of motion editing, we provide examples of all editing tasks in Jupyter Notebook. Please see the details in:
./edit.ipynb
This code is distributed under a CC BY-NC-ND 4.0 license (see LICENSE-CC-BY-NC-ND-4.0).
Note that our code depends on other libraries, including CLIP, SMPL, SMPL-X, PyTorch3D, and T2M-GPT, and uses datasets that each have their own licenses, which must also be followed.