felixchenfy / Realtime-Action-Recognition

Apply ML to the skeletons from OpenPose; 9 actions; multiple people. (WARNING: I'm sorry that this is only good for course demo, not for real world applications !!! Those ary very difficult !!!)
MIT License
857 stars 248 forks source link
action-recognition machine-learning openpose time-series

Multi-person Real-time Action Recognition Based-on Human Skeleton

Highlights: 9 actions; multiple people (<=5); Real-time and multi-frame based recognition algorithm.

Updates: On 2019-10-26, I refactored the code; added more comments; and put all settings into the config/config.yaml file, including: classes of actions, input and output of each file, OpenPose settings, etc.

Project: This is my final project for EECS-433 Pattern Recognition in Northwestern Univeristy on March 2019. A simpler version where two teammates and I worked on is here.

Warning: Since I used the 10 fps video and 0.5s-window for training, you must also limit your video fps to be about 10 fps (7~12 fps) if you want to test my pretrained model on your own video or web camera.

Contents:

1. Algorithm

I collected videos of 9 Types of actions: ['stand', 'walk', 'run', 'jump', 'sit', 'squat', 'kick', 'punch', 'wave']. The total video lengths are about 20 mins, containing about 10000 video frames recorded at 10 frames per second.

The workflow of the algorithm is:

For more details about how the features are extracted, please see my report.

2. Install Dependency (OpenPose)

We need Python >= 3.6.

2.1. Download tf-pose-estimation

This project uses a OpenPose program developped by ildoonet. The source project has been deleted. I've managed to fork it to here: tf-pose-estimation.

Please download it:

export MyRoot=$PWD
cd src/githubs  
git clone https://github.com/felixchenfy/ildoonet-tf-pose-estimation
mv ildoonet-tf-pose-estimation tf-pose-estimation

2.2. Download pretrained models

The mobilenet_thin models are already included in the project. No need to download. See folder:

src/githubs/tf-pose-estimation/models/graph✗ ls
cmu  mobilenet_thin  mobilenet_v2_large  mobilenet_v2_small

If you want to use the original OpenPose model which is named "cmu" here, you need to download it:

cd $MyRoot/src/githubs/tf-pose-estimation/models/graph/cmu  
bash download.sh  

2.3. Insteall libraries

Basically you have to follow the tutorial of tf-pose-estimation project. If you've setup the env for that project, then it's almost the same env to run my project.

Please follow its tutorial here. I've copied what I ran to below:

conda create -n tf tensorflow-gpu
conda activate tf

cd $MyRoot/src/githubs/tf-pose-estimation
pip3 install -r requirements.txt
pip3 install jupyter tqdm

# Install tensorflow.
# You may need to take a few tries and select the version that is compatible with your cuDNN. If the version mismatches, you might get this error: "Error : Failed to get convolution algorithm."
pip3 install tensorflow-gpu==1.13.1

# Compile c++ library as described [here](https://github.com/felixchenfy/ildoonet-tf-pose-estimation#install-1):
sudo apt install swig
pip3 install "git+https://github.com/philferriere/cocoapi.git#egg=pycocotools&subdirectory=PythonAPI"
cd $MyRoot/src/githubs/tf-pose-estimation/tf_pose/pafprocess
swig -python -c++ pafprocess.i && python3 setup.py build_ext --inplace

Then install some small libraries used by me:

cd $MyRoot
pip3 install -r requirements.txt

2.4. Verify installation

Make sure you can successfully run its demo examples:

cd $MyRoot/src/githubs/tf-pose-estimation
python run.py --model=mobilenet_thin --resize=432x368 --image=./images/p1.jpg

If you encounter error, you may try to search in google or tf-pose-estimation's issue. The problem is probably due to the dependency of that project.

3. Program structure

Diagram

Trouble shooting:

Main scripts

The 5 main scripts are under src/. They are named under the order of excecution:

src/s1_get_skeletons_from_training_imgs.py    
src/s2_put_skeleton_txts_to_a_single_txt.py
src/s3_preprocess_features.py
src/s4_train.py 
src/s5_test.py

The input and output of these files as well as some parameters are defined in the configuration file config/config.yaml. I paste part of it below just to provide an intuition:

classes: ['stand', 'walk', 'run', 'jump', 'sit', 'squat', 'kick', 'punch', 'wave']

image_filename_format: "{:05d}.jpg"
skeleton_filename_format: "{:05d}.txt"

features:
  window_size: 5 # Number of adjacent frames for extracting features. 

s1_get_skeletons_from_training_imgs.py:
  openpose:
    model: cmu # cmu or mobilenet_thin. "cmu" is more accurate but slower.
    img_size: 656x368 #  656x368, or 432x368, 336x288. Bigger is more accurate.
  input:
    images_description_txt: data/source_images3/valid_images.txt
    images_folder: data/source_images3/
  output:
    images_info_txt: data_proc/raw_skeletons/images_info.txt # This file is not used.
    detected_skeletons_folder: &skels_folder data_proc/raw_skeletons/skeleton_res/
    viz_imgs_folders: data_proc/raw_skeletons/image_viz/

s2_put_skeleton_txts_to_a_single_txt.py:
  input:
    # A folder of skeleton txts. Each txt corresponds to one image.
    detected_skeletons_folder: *skels_folder
  output:
    # One txt containing all valid skeletons.
    all_skeletons_txt: &skels_txt data_proc/raw_skeletons/skeletons_info.txt

s3_preprocess_features.py:
  input: 
    all_skeletons_txt: *skels_txt
  output:
    processed_features: &features_x data_proc/features_X.csv
    processed_features_labels: &features_y data_proc/features_Y.csv

s4_train.py:
  input:
    processed_features: *features_x
    processed_features_labels: *features_y
  output:
    model_path: model/trained_classifier.pickle

For how to run the main scripts, please see the Section 4. How to run: Inference and 6. How to run: Training.

4. How to run: Inference

Introduction

The script src/s5_test.py is for doing real-time action recognition.

The classes are set in config/config.yaml by the key classes.

The supported input includes video file, a folder of images, and web camera, which is set by the command line arguments --data_type and --data_path.

The trained model is set by --model_path, e.g.:model/trained_classifier.pickle.

The output is set by --output_folder, e.g.: output/.

The test data (a video, and a folder of images) are already included under the data_test/ folder.

An example result of the input video "exercise.avi" is:

output/exercise/
├── skeletons
│   ├── 00000.txt
│   ├── 00001.txt
│   └── ...
└── video.avi

Also, the result will be displayed by cv2.imshow().

Example commands are given below:

Test on video file

python src/s5_test.py \
    --model_path model/trained_classifier.pickle \
    --data_type video \
    --data_path data_test/exercise.avi \
    --output_folder output

Test on a folder of images

python src/s5_test.py \
    --model_path model/trained_classifier.pickle \
    --data_type folder \
    --data_path data_test/apple/ \
    --output_folder output

Test on web camera

python src/s5_test.py \
    --model_path model/trained_classifier.pickle \
    --data_type webcam \
    --data_path 0 \
    --output_folder output

5. Training data

Download my data

Follow the instructions in data/download_link.md to download the data. Or, you can create your own. The data and labelling format are described below.

Data format

Each data subfolder (e.g. data/source_images3/jump_03-02-12-34-01-795/) contains images named as 00001.jpg, 00002.jpg, etc.
The naming format of each image is defined in config/config.yaml by the sentence: image_filename_format: "{:05d}.jpg".

The images to be used as training data and their label are configured by this txt file: data/source_images3/valid_images.txt.
A snapshot of this txt file is shown below:

jump_03-02-12-34-01-795
52 59
72 79

kick_03-02-12-36-05-185
54 62

In each paragraph,
the 1st line is the data folder name, which should start with "${class_name}_". The 2nd and following lines specify the staring index and ending index of the video that corresponds to that class.

Let's take the 1st paragraph of the above snapshot as an example: jump is the class, and the frames 52~59 & 72~79 of the video are used for training.

Classes

The classes are set in config/config.yaml under the key word classes. No matter how many classes you put in the training data (set by the folder name), only the ones that match with the classes in config.yaml are used for training and inference.

6. How to run: Training

First, you may read

to know the training data format and the input and output of each script.

Then, follow the following steps to do the training:

By default, the intermediate data are saved to data_proc/, and the model is saved to model/trained_classifier.pickle.
After training is done, you can run the inference script src/s5_test.py as described in Section 4. How to run: Inference.

7. Result and Performance

Unfortunately this project only works well on myself, because I only used the video of myself.

The performance is bad for (1) people who have different body shape, (2) people are far from the camera. How to improve? I guess the first thing to do is to collect larger training set from different people. Then, improve the data augmentation and featuer selection.

Besides, my simple tracking algorithm only works for a few number of people (maybe 5).

Due to the not-so-good performance of action recognition, I guess you can only use this project for course demo, but not for any commercial applications ... T.T