
Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding

This repository contains the implementation of our ACM MM 2023 paper UmURL.

Paper Link: arXiv, ACM DL

Requirements

Use the following commands to create the conda environment.

conda create -n umurl python=3.9 anaconda
conda activate umurl
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 -c pytorch
pip3 install tensorboard
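
After installation, a quick sanity check such as the one below (a convenience snippet, not part of the repository) confirms that the expected package versions are in place and a GPU is visible:

# Sanity check for the conda environment created above.
import torch
import torchvision

print(torch.__version__)          # expected: 1.12.1
print(torchvision.__version__)    # expected: 0.13.1
print(torch.cuda.is_available())  # True if a CUDA-enabled build and GPU are available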

Data Preparation

Unsupervised Pre-training

Pre-training is launched via the bash scripts provided in the repository, which take the following arguments:

$dataset is the dataset to use for unsupervised pre-training (ntu60 or ntu120).

$protocol is the training protocol (cross_subject/cross_view for ntu60, and cross_subject/cross_setup for ntu120).

Please refer to the bash scripts for other pre-training configurations.

Downstream Task Evaluation

Evaluation automatically uses the checkpoint from the final epoch of the aforementioned pre-training, with $dataset and $protocol taking the same values as in pre-training.
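
For reference, downstream action recognition in this setting is typically scored by linear evaluation: the pre-trained encoder is frozen and only a linear classifier is trained on its features. The snippet below is a minimal, self-contained sketch of that protocol using randomly generated stand-in features; the feature dimension and class count are placeholders, and this is not the repository's actual evaluation code.

import torch
import torch.nn as nn

# Placeholder dimensions: 256-d frozen features, 60 action classes (NTU 60).
feat_dim, num_classes = 256, 60

# Stand-in for features extracted by a frozen pre-trained encoder.
train_feats = torch.randn(1024, feat_dim)
train_labels = torch.randint(0, num_classes, (1024,))

# A single linear layer is the only trainable component.
classifier = nn.Linear(feat_dim, num_classes)
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):
    optimizer.zero_grad()
    loss = criterion(classifier(train_feats), train_labels)
    loss.backward()
    optimizer.step()

# Top-1 accuracy on (stand-in) test features.
test_feats = torch.randn(256, feat_dim)
test_labels = torch.randint(0, num_classes, (256,))
with torch.no_grad():
    pred = classifier(test_feats).argmax(dim=1)
print((pred == test_labels).float().mean().item())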

Pretrained Models

We release several pre-trained models: Google Drive
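
The exact structure of a checkpoint depends on how it was saved; a generic way to inspect a downloaded file before loading it into a model is shown below (the filename is hypothetical, use the actual file from the Google Drive link).

import torch

# "umurl_ntu60_xsub.pth" is a hypothetical filename for illustration only.
ckpt = torch.load("umurl_ntu60_xsub.pth", map_location="cpu")

# Inspect the top-level keys before loading the weights into a model.
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))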

Expected performance on skeleton-based action recognition and skeleton-based action retrieval:

| Task | NTU 60 x-sub (%) | NTU 60 x-view (%) | NTU 120 x-sub (%) | NTU 120 x-setup (%) |
|------|------------------|-------------------|-------------------|---------------------|
| action recognition | 84.2 | 90.9 | 75.2 | 76.3 |
| action retrieval | 72.0 | 88.9 | 59.5 | 62.2 |
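
Skeleton-based action retrieval is usually scored without any further training: each test sample retrieves its nearest neighbor in the gallery by cosine similarity of the frozen features, and the neighbor's label is taken as the prediction. Below is a minimal sketch of that protocol with stand-in features, not the repository's evaluation code.

import torch
import torch.nn.functional as F

feat_dim, num_classes = 256, 60

# Stand-in features/labels for the gallery (training split) and queries (test split).
gallery = F.normalize(torch.randn(1024, feat_dim), dim=1)
gallery_labels = torch.randint(0, num_classes, (1024,))
queries = F.normalize(torch.randn(256, feat_dim), dim=1)
query_labels = torch.randint(0, num_classes, (256,))

# Cosine similarity reduces to a dot product on L2-normalized features.
sim = queries @ gallery.t()      # (num_queries, num_gallery)
nearest = sim.argmax(dim=1)      # index of the 1-NN for each query
pred = gallery_labels[nearest]

print((pred == query_labels).float().mean().item())  # retrieval top-1 accuracy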

Visualization

t-SNE visualization of the learned multi-modal action representations obtained by (a) a simple baseline and (b) our proposed UmURL on NTU 60. Ten classes from the x-view testing set are randomly selected for visualization. Dots with the same color indicate actions belonging to the same class.
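
The figure itself is not reproduced here, but a comparable plot can be produced with scikit-learn's t-SNE on the learned features. The snippet below is a generic sketch with placeholder features and labels, not the script used for the paper's figure.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholder: 500 feature vectors drawn from 10 randomly chosen classes.
feats = np.random.randn(500, 256)
labels = np.random.randint(0, 10, size=500)

# Project the high-dimensional representations to 2-D with t-SNE.
emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(feats)

# Color each point by its class label.
plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab10", s=5)
plt.axis("off")
plt.savefig("tsne.png", dpi=300)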

Citation

If you find this repository useful, please consider citing our paper:

@inproceedings{sun2023unified,
  title={Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding},
  author={Sun, Shengkai and Liu, Daizong and Dong, Jianfeng and Qu, Xiaoye and Gao, Junyu and Yang, Xun and Wang, Xun and Wang, Meng},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
  pages={2973--2984},
  year={2023}
}

Acknowledgements

This work was supported by the "Pioneer" and "Leading Goose" R&D Program of Zhejiang (No.2023C01212), Public Welfare Technology Research Project of Zhejiang Province (No. LGF21F020010), National Natural Science Foundation of China (No. 61976188, 62272435, and U22A2094), Young Elite Scientists Sponsorship Program by CAST (No. 2022QNRC001), the open research fund of The State Key Laboratory of Multimodal Artificial Intelligence Systems, and the Fundamental Research Funds for the Provincial Universities of Zhejiang.