
YOWOv3: An Efficient and Generalized Framework for Human Action Detection and Recognition

This is an implementation of the paper: YOWOv3: An Efficient and Generalized Framework for Human Action Detection and Recognition.


Preface

Hello, and thank you for your interest in this study. If you find it valuable, please consider leaving a star, as it would greatly encourage me.

If you intend to use this repository for your own research, please consider citing:

@misc{dang2024yowov3efficientgeneralizedframework,
      title={YOWOv3: An Efficient and Generalized Framework for Human Action Detection and Recognition}, 
      author={Duc Manh Nguyen Dang and Viet Hang Duong and Jia Ching Wang and Nhan Bui Duc},
      year={2024},
      eprint={2408.02623},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.02623}, 
}

About asking question

I am very pleased that everyone has shown interest in this project. Many questions have been raised, and I am more than willing to answer them as soon as possible. However, if you have any doubts about the code or related matters, please provide me with context (your config file, some samples that could not be detected, a checkpoint, etc.). Also, please use English.


Structure of Instruction

This instruction is divided into smaller sections, each serving a specific purpose. A summary of the structure is provided, in order, right below. Please read it carefully to locate the information you are looking for.


Preparation

Environment setup

Clone this repository

git clone https://github.com/AakiraOtok/YOWOv3.git

Use Python 3.8 or Python 3.9, and then install the dependencies:

pip install -r requirements.txt
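
If you prefer an isolated environment, a typical setup looks like the following (the environment name is arbitrary, pick any you like):

python3.8 -m venv yowov3-env
source yowov3-env/bin/activate
pip install -r requirements.txt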

Note: On my system, I use Python 3.7 with slightly different dependencies, specifically for torch:

torch==1.13.1+cu117
torchaudio==0.13.1+cu117
torchvision==0.14.1+cu117

However, when testing on another system, it seems that these versions have been deprecated. I have updated the requirements.txt file and tested it again on systems using Python 3.8 and Python 3.9, and everything seems to be working fine. If you encounter any errors during the environment setup, please try asking in the "issues" section. Perhaps someone has faced a similar issue and has already found a solution.

Datasets

UCF101-24

AVAv2.2


Basic Usage

About config file

The project is designed in such a way that almost every configuration can be adjusted through the config file. In the repository, I have provided two sample config files: ucf_config.yaml and ava_config.yaml for the UCF101-24 and AVAv2.2 datasets, respectively. The Basic Usage section will not involve extensive modifications of the config file, while the customization of the config will be covered in the Customization section.

Warning!: Since all configurations are closely related to the config file, please carefully read the Modify Config file part of the Customization section so that you can use the config file correctly.

Simple command line

We have the following command template:

python main.py --mode [mode] --config [config_file_path]

Or the shorthand version:

python main.py -m [mode] -cf [config_file_path]

Here, [mode] is one of {train, eval, detect, live, onnx}, corresponding to training, evaluation, detection (visualization on the current dataset), live inference (using a camera), or exporting to ONNX and running inference, respectively. [config_file_path] is the path to the config file.

Example of training a model on UCF101-24:

python main.py --mode train --config config/ucf_config.yaml

Or try evaluating a model on AVAv2.2:

python main.py -m eval -cf config/ava_config.yaml

Customization

Modify Config file

There are some notes about the config file:

import yaml

def build_config(config_file='config/ucf_config.yaml'):
    # Load every option from the YAML config file into a plain dictionary.
    with open(config_file, "r") as file:
        config = yaml.load(file, Loader=yaml.SafeLoader)

    # When 'active_checker' is enabled, this is where the config would be validated
    # (no checks are performed in this snippet).
    if config['active_checker']:
        pass

    return config
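
For example, loading a config in your own script might look like this (a minimal sketch; the import path of build_config below is an assumption, adjust it to wherever the function is defined in this repository):

from utils.build_config import build_config  # assumed module path; adjust to this repo's layout

config = build_config('config/ucf_config.yaml')
print(config['dataset'])   # expected to be 'ucf' for the UCF101-24 sample config

# Values can also be overridden in code before building datasets/models,
# although editing the YAML file directly is the intended workflow.
config['dataset'] = 'ava'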

Custom Dataset

I know why you are here, my friend =)))))

You can build a custom dataset for yourself; however, make sure to carefully read the notes below to do it correctly.

Firstly, every time you want to use a dataset, simply call the build_dataset function as shown in the example code:

    dataset = build_dataset(config, phase='train')

The build_dataset function is defined in datasets/build_dataset.py as follows:

from datasets.ucf.load_data import build_ucf_dataset
from datasets.ava.load_data import build_ava_dataset

def build_dataset(config, phase):
    dataset = config['dataset']

    if dataset == 'ucf':
        return build_ucf_dataset(config, phase)
    elif dataset == 'ava':
        return build_ava_dataset(config, phase)

To accommodate your needs, you simply need to define the build_custom_dataset function for your specific purpose and modify the above build_dataset function accordingly.
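
For example, the extended function might look like this (a minimal sketch; build_custom_dataset and datasets/custom/load_data.py are hypothetical names that you would create yourself, and 'custom' is whatever value you put under dataset in your config file):

from datasets.ucf.load_data import build_ucf_dataset
from datasets.ava.load_data import build_ava_dataset
from datasets.custom.load_data import build_custom_dataset  # hypothetical module you create

def build_dataset(config, phase):
    dataset = config['dataset']

    if dataset == 'ucf':
        return build_ucf_dataset(config, phase)
    elif dataset == 'ava':
        return build_ava_dataset(config, phase)
    elif dataset == 'custom':  # matches the 'dataset' value in your config file
        return build_custom_dataset(config, phase)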

The model is generalized to train on multi-action datasets, meaning that each box may have multiple actions simultaneously. However, one box - one action metrics are more common than one box - multi-action metrics. Therefore, I will guide you through evaluating in the one box - one action setting, while training in the one box - multi-action setting for generality.

The build_dataset function returns a custom-defined dataset object. There are two important parameters to consider: config and phase. config is a dictionary containing the options from the config file (loaded beforehand), nothing particularly special. phase, on the other hand, takes two values: train or test. train is used for training, and test is used for the detection/evaluation/live stages.

Let:

You need to return:

Please note that class indices start at $0$.
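
As a rough illustration only, a custom dataset skeleton could look like the sketch below. The class name, config keys, and the multi-hot label layout are assumptions made for illustration; mirror the exact return format of the UCF/AVA dataset classes in datasets/ when you implement yours.

import torch
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    # Hypothetical skeleton; follow datasets/ucf/load_data.py for the exact return format.

    def __init__(self, config, phase):
        self.config      = config
        self.phase       = phase                    # 'train' or 'test'
        self.num_classes = config['num_classes']    # assumed config key
        self.samples     = []                       # fill with (clip_path, boxes, action_ids) tuples

    def __len__(self):
        return len(self.samples)

    def load_clip(self, clip_path):
        # Placeholder: decode clip_path into a (C, T, H, W) float tensor here.
        raise NotImplementedError

    def __getitem__(self, idx):
        clip_path, boxes, action_ids = self.samples[idx]
        clip = self.load_clip(clip_path)

        # One box may carry several actions: encode the labels of each box as a
        # multi-hot vector, with class indices starting at 0.
        labels = torch.zeros(len(boxes), self.num_classes)
        for box_idx, ids in enumerate(action_ids):
            labels[box_idx, ids] = 1.0

        return clip, torch.as_tensor(boxes, dtype=torch.float32), labels

def build_custom_dataset(config, phase):
    return CustomDataset(config, phase)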

To evaluate, use ucf_eval.py.


Pretrained Resources

All pre-trained backbone2D and backbone3D weights, as well as the model checkpoints, are publicly available on my Hugging Face repo.

Regarding the model checkpoints, I have consolidated them into an Excel file that looks like this:

[screenshot of the checkpoint spreadsheet]

Each cell represents a model checkpoint, displaying its mAP, GFLOPs, and number of parameters, in that order. The checkpoints are stored in folders named after the corresponding cells in the Excel file (e.g., O27, N23, ...). Each folder contains the respective config file used for training that model. Please note that both the regular checkpoint and the exponential moving average (EMA) version of the model are saved.
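
If you want to inspect a downloaded checkpoint before wiring it into a config, something like the sketch below works; the folder and file names here are hypothetical, use the actual ones from the Hugging Face repo.

import torch

# Hypothetical path; point this at the actual .pth file inside the downloaded folder (e.g., O27).
state = torch.load("O27/ema_checkpoint.pth", map_location="cpu")

# Depending on how the checkpoint was saved, this is either a raw state_dict or a
# dictionary wrapping one; printing the top-level keys shows which case you have.
if isinstance(state, dict):
    print(list(state.keys())[:10])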

Warning!: Since all configurations are closely related to the config file, please carefully read the Modify Config file part of the Customization section so that you can use the config file correctly.

Limitations and Future Development


Some notes:


References

I would like to express my sincere gratitude to the following amazing repositories/codes, which were the primary sources I heavily relied on and borrowed code from during the development of this project: