
(ICCV 2023) AttT2M

Code of ICCV 2023 paper: "AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism"

[Paper](https://arxiv.org/abs/2309.00796) [Bilibili Video]

The pre-trained models and the train/eval instructions have been updated. Please see below for more details.

teaser

If our paper or code is helpful to you, please cite our paper:

@InProceedings{Zhong_2023_ICCV,
    author    = {Zhong, Chongyang and Hu, Lei and Zhang, Zihao and Xia, Shihong},
    title     = {AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {509-519}
}

1. Results

1.1 Visual Results

Text-driven motion generation

gif

Comparison with SOTA

gif

Generation diversity

gif

Fine-grained generation

gif

1.2 Quantitative Results

img

For more results, please refer to our [Demo].

2. Installation

2.1. Environment

conda env create -f environment.yml
conda activate Att-T2M

The code was tested on Python 3.8 and PyTorch 1.8.1.
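
To confirm that your environment matches the tested setup before training, a minimal, repository-agnostic check like the following is enough:

```python
# Quick sanity check of the environment (tested setup: Python 3.8, PyTorch 1.8.1).
import sys
import torch

print("Python :", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```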

2.2. Datasets and others

We use two datasets: HumanML3D and KIT-ML. Details about both datasets can be found [here].
The motion & text feature extractors provided by t2m are also used to evaluate our generated motions.
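
As a quick orientation (these details come from the public dataset releases, not from this repository): each motion is stored as a .npy feature sequence, 263-dimensional for HumanML3D and 251-dimensional for KIT-ML in the t2m representation. A minimal inspection sketch with an illustrative path:

```python
# Minimal sketch for inspecting one preprocessed motion file; the path is
# illustrative and depends on where you place the HumanML3D data.
import numpy as np

motion = np.load("HumanML3D/new_joint_vecs/000000.npy")
# Expected shape: (num_frames, 263) for HumanML3D; KIT-ML uses 251-dim features.
print(motion.shape)
```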

3. Quick Start

1. First step: Download the pre-trained models from Google Drive

pretrain_models/
   ├── HumanML3D/
      ├── Trans/
         ├──net_best_fid.pth
         ├──run.log
      ├── VQVAE/
         ├──net_last.pth
   ├── KIT/
      ├── Trans/
         ├──net_last_290000.pth
         ├──run.log
      ├── VQVAE/
         ├──net_last.pth
2. Second step: Download the other required models from Google Drive

3. Third step: Run the visualization script:

python vis.py
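
If you want to verify the downloaded checkpoints before running the script, a small sanity check along these lines works (the path follows the pretrain_models/ layout above; adjust as needed):

```python
# Sketch: load a downloaded checkpoint on CPU and list its top-level keys.
import torch

ckpt = torch.load("pretrain_models/HumanML3D/VQVAE/net_last.pth", map_location="cpu")
print(type(ckpt))
print(list(ckpt.keys())[:10] if isinstance(ckpt, dict) else ckpt)
```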

4. Train

Preparation: you need to download the necessary materials from Google Drive: material1, material2

4.1. VQ-VAE

The VQ-VAE training parameters are almost the same as those of T2M-GPT.

VQ training:

```bash
python3 train_vq.py \
    --batch-size 256 \
    --lr 2e-4 \
    --total-iter 300000 \
    --lr-scheduler 200000 \
    --nb-code 512 \
    --down-t 2 \
    --depth 3 \
    --dilation-growth-rate 3 \
    --out-dir output \
    --dataname t2m \
    --vq-act relu \
    --quantizer ema_reset \
    --loss-vel 0.5 \
    --recons-loss l1_smooth \
    --exp-name VQVAE
```
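
For context, `--quantizer ema_reset` selects a codebook updated with exponential moving averages plus re-initialization of unused codes. The sketch below illustrates that idea in a simplified form; it is not the repository's implementation:

```python
# Illustrative EMA codebook update with dead-code reset ("ema_reset" idea);
# a simplified sketch, not the code used in this repository.
import torch
import torch.nn.functional as F

class EMAResetQuantizer(torch.nn.Module):
    def __init__(self, nb_code=512, code_dim=512, decay=0.99):
        super().__init__()
        self.decay = decay
        self.register_buffer("codebook", torch.randn(nb_code, code_dim))
        self.register_buffer("code_count", torch.ones(nb_code))

    @torch.no_grad()
    def ema_update(self, z, idx):
        one_hot = F.one_hot(idx, self.codebook.shape[0]).float()  # (N, nb_code)
        counts = one_hot.sum(0)                                   # selections per code
        self.code_count.mul_(self.decay).add_(counts, alpha=1 - self.decay)
        # Move each selected code toward the mean of the latents assigned to it.
        used = counts > 0
        batch_mean = (one_hot.t() @ z)[used] / counts[used].unsqueeze(1)
        self.codebook[used] = self.decay * self.codebook[used] + (1 - self.decay) * batch_mean
        # "Reset": re-initialize codes that are almost never used.
        dead = self.code_count < 1e-3
        if dead.any():
            rand = torch.randint(0, z.shape[0], (int(dead.sum()),), device=z.device)
            self.codebook[dead] = z[rand]

    def forward(self, z):                                         # z: (N, code_dim)
        idx = torch.cdist(z, self.codebook).argmin(dim=1)         # nearest code per latent
        z_q = self.codebook[idx]
        if self.training:
            self.ema_update(z, idx)
        # Straight-through estimator: gradients pass to the encoder unchanged.
        return z + (z_q - z).detach(), idx
```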

4.2. GPT

The results are saved in the folder output.

GPT training:

```bash
python3 train_t2m_trans.py \
    --num_layers_cross 2 \
    --exp-name GPT \
    --batch-size 128 \
    --num-layers 9 \
    --embed-dim-gpt 1024 \
    --nb-code 512 \
    --n-head-gpt 16 \
    --block-size 51 \
    --ff-rate 4 \
    --drop-out-rate 0.1 \
    --resume-pth output/VQVAE/net_last.pth \
    --vq-name VQVAE \
    --out-dir output \
    --total-iter 300000 \
    --lr-scheduler 150000 \
    --lr 0.0001 \
    --dataname t2m \
    --down-t 2 \
    --depth 3 \
    --quantizer ema_reset \
    --eval-iter 10000 \
    --pkeep 0.5 \
    --dilation-growth-rate 3 \
    --vq-act relu
```
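
For intuition about what the trained Transformer does at inference time: it autoregressively predicts VQ token indices conditioned on the text, and the VQ-VAE decoder turns those tokens into motion. A generic, hedged sketch of that sampling loop follows; the `model` interface is hypothetical and does not reflect the repository's API:

```python
# Generic autoregressive sampling over VQ motion tokens, conditioned on a text
# embedding. `model` is a hypothetical callable returning next-token logits of
# shape (1, vocab); it only illustrates the idea.
import torch

@torch.no_grad()
def sample_motion_tokens(model, text_emb, max_len=50, end_id=512, temperature=1.0):
    tokens = torch.zeros(1, 0, dtype=torch.long, device=text_emb.device)
    for _ in range(max_len):
        logits = model(text_emb, tokens)               # (1, vocab)
        probs = torch.softmax(logits / temperature, dim=-1)
        nxt = torch.multinomial(probs, num_samples=1)  # sample one token id, (1, 1)
        if nxt.item() == end_id:                       # assumed end-of-motion token
            break
        tokens = torch.cat([tokens, nxt], dim=1)
    return tokens  # token indices to be decoded into motion by the VQ-VAE decoder
```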

5. Evaluation

GPT eval:

```bash
python3 GPT_eval_multi.py \
    --exp-name TEST_GPT \
    --batch-size 128 \
    --num-layers 9 \
    --num_layers_cross 2 \
    --embed-dim-gpt 1024 \
    --nb-code 512 \
    --n-head-gpt 16 \
    --block-size 51 \
    --ff-rate 4 \
    --drop-out-rate 0.1 \
    --resume-pth output/VQVAE/net_last.pth \
    --vq-name VQVAE \
    --out-dir output \
    --total-iter 300000 \
    --lr-scheduler 150000 \
    --lr 0.0001 \
    --dataname t2m \
    --down-t 2 \
    --depth 3 \
    --quantizer ema_reset \
    --eval-iter 10000 \
    --pkeep 0.5 \
    --dilation-growth-rate 3 \
    --vq-act relu \
    --resume-trans output/GPT/net_best_fid.pth
```

Please replace "--resume-pth" and "--resume-trans" with the VQ-VAE and Transformer models you want to evaluate. The multimodality evaluation takes a long time, so for a quicker evaluation without multimodality you can comment out line 452 and line 453 in ./utils/eval_trans.py.
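
Since the checkpoints are selected by FID (net_best_fid.pth), here is a brief reminder of how FID is computed from motion-feature statistics. This is the standard formula as a generic sketch, not the exact evaluator in ./utils/eval_trans.py:

```python
# Generic FID between real and generated motion-feature matrices (one feature
# vector per sample, as produced by the pretrained feature extractor).
import numpy as np
from scipy import linalg

def fid(real_feats, gen_feats):
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_g = np.cov(gen_feats, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_r @ cov_g, disp=False)  # matrix square root
    if np.iscomplexobj(covmean):
        covmean = covmean.real                            # drop tiny imaginary parts
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```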

6. Acknowledgement