Source code for the ICCV 2021 paper *Spatio-Temporal Dynamic Inference Network for Group Activity Recognition*.
Note that this repo also incorporates the core implementation of our AAAI 2021 paper *Learning Visual Context for Group Activity Recognition*.
If you find our work or the codebase inspiring and useful for your research, please consider ⭐starring⭐ the repo and citing:
```bibtex
@inproceedings{yuan2021DIN,
  title={Spatio-Temporal Dynamic Inference Network for Group Activity Recognition},
  author={Yuan, Hangjie and Ni, Dong and Wang, Mang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={7476--7485},
  year={2021}
}

@inproceedings{yuan2021visualcontext,
  title={Learning Visual Context for Group Activity Recognition},
  author={Yuan, Hangjie and Ni, Dong},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={35},
  number={4},
  pages={3261--3269},
  year={2021}
}
```
Dependencies:

- Python 3.6
- PyTorch 1.2.0, Torchvision 0.4.0
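A quick way to sanity-check the environment against the versions above (a minimal sketch, not part of the repo; the `python_ok` helper is ours, and the PyTorch check is skipped gracefully if it is not installed):

```python
# Minimal environment check for the versions listed above (a sketch;
# adjust the expected versions if you use a different setup).
import sys

def python_ok(minimum=(3, 6)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum

if __name__ == "__main__":
    print("Python OK:", python_ok())
    try:
        import torch  # expected: 1.2.0 per the list above
        print("PyTorch:", torch.__version__)
    except ImportError:
        print("PyTorch not installed")
```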
Prepare the datasets:

- Download the Volleyball and/or Collective Activity dataset and place it under `data/volleyball` or `data/collective`, respectively.
- Download `tracks_normalized.pkl` from cvlab-epfl/social-scene-understanding and put it into `data/volleyball/videos`.
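Before training, it can help to verify that the files above ended up in the expected places. A small sketch (the paths follow the setup steps above; the helper name is ours, not part of the repo):

```python
import os

def missing_data_paths(root="data"):
    """Return the expected dataset paths (from the setup steps above)
    that do not yet exist under `root`."""
    expected = [
        os.path.join(root, "volleyball"),
        os.path.join(root, "collective"),
        os.path.join(root, "volleyball", "videos", "tracks_normalized.pkl"),
    ]
    return [p for p in expected if not os.path.exists(p)]

if __name__ == "__main__":
    for p in missing_data_paths():
        print("missing:", p)
```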
Check out the repository and `cd PROJECT_PATH`.

Build the Docker image:

```shell
docker build -t din_gar https://github.com/JacobYuan7/DIN_GAR.git#main
```

Run the Docker container:

```shell
docker run --shm-size=2G -v data/volleyball:/opt/DIN_GAR/data/volleyball -v result:/opt/DIN_GAR/result --rm -it din_gar
```

- `--shm-size=2G`: extends the container's shared memory to prevent `ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).` Alternatively, use `--ipc=host`.
- `-v data/volleyball:/opt/DIN_GAR/data/volleyball`: makes the host's folder `data/volleyball` available inside the container at `/opt/DIN_GAR/data/volleyball`.
- `-v result:/opt/DIN_GAR/result`: makes the host's folder `result` available inside the container at `/opt/DIN_GAR/result`.
- `-it` & `--rm`: starts the container with an interactive session (`PROJECT_PATH` is `/opt/DIN_GAR`) and removes the container after closing the session.
- `din_gar`: the name/tag of the image.
- `--gpus='"device=7"'`: restricts the GPU devices the container can access.

**Train the Base Model**: Fine-tune the base model for the dataset.
```shell
# Volleyball dataset
cd PROJECT_PATH
python scripts/train_volleyball_stage1.py

# Collective Activity dataset
cd PROJECT_PATH
python scripts/train_collective_stage1.py
```
**Train with the reasoning module**: Append the reasoning modules onto the base model to get a reasoning model.

- Volleyball dataset:

```shell
python scripts/train_volleyball_stage2_dynamic.py
```

- ST-factorized DIN: run it by setting `cfg.ST_kernel_size = [(1,3),(3,1)]` and `cfg.hierarchical_inference = True`. Note that if you set `cfg.hierarchical_inference = False`, `cfg.ST_kernel_size = [(1,3),(3,1)]` and `cfg.num_DIN = 2`, then multiple interaction fields run in parallel.

```shell
python scripts/train_volleyball_stage2_dynamic.py
```
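The kernel-size and inference settings mentioned above are plain attributes on the project's config object. A hypothetical sketch of how the two DIN variants differ (the `Config` stand-in and its defaults here are ours for illustration; the real config lives in the repo, and only the attribute names `ST_kernel_size`, `hierarchical_inference`, and `num_DIN` come from the text above):

```python
# Stand-in for the repo's config object (hypothetical; defaults below
# are placeholders, not the repo's actual defaults).
class Config:
    def __init__(self):
        self.ST_kernel_size = [(3, 3)]       # placeholder default
        self.hierarchical_inference = False
        self.num_DIN = 1

# ST-factorized DIN: factorized (1,3)/(3,1) kernels applied hierarchically.
cfg = Config()
cfg.ST_kernel_size = [(1, 3), (3, 1)]
cfg.hierarchical_inference = True

# Parallel variant: same factorized kernels, but two DIN branches run
# in parallel instead of hierarchically.
cfg_parallel = Config()
cfg_parallel.ST_kernel_size = [(1, 3), (3, 1)]
cfg_parallel.hierarchical_inference = False
cfg_parallel.num_DIN = 2
```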
Other models re-implemented by us according to their papers or publicly available code:

```shell
python scripts/train_volleyball_stage2_at.py
python scripts/train_volleyball_stage2_pctdm.py
python scripts/train_volleyball_stage2_sacrf_biute.py
python scripts/train_volleyball_stage2_arg.py
python scripts/train_volleyball_stage2_higcin.py
```
- Collective Activity dataset:

```shell
python scripts/train_collective_stage2_dynamic.py
```
Another work of ours, which solves GAR from the perspective of incorporating visual context, is also available:
```bibtex
@inproceedings{yuan2021visualcontext,
  title={Learning Visual Context for Group Activity Recognition},
  author={Yuan, Hangjie and Ni, Dong},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={35},
  number={4},
  pages={3261--3269},
  year={2021}
}
```