I have quick questions for training

czhaneva / MST-GCN

This is the official implementation for "Multi-scale spatial temporal graph convolutional network for skeleton-based action recognition" AAAI-2021

MIT License


I have quick questions for training #2

Closed Wooho-Moon closed 2 years ago

Wooho-Moon commented 2 years ago

Thanks for your awesome work. I would like to train the model on Kinetics-Skeleton data. I downloaded Kinetics-Skeleton, preprocessed the data, and generated the joint and bone data with the tools, which gave me train_data_joint.npy, train_data_Joint_motion.npy, train_data_bone.npy and train_data_bone_motion.npy. Before running the run.sh script for training, I need to edit the file named train.yaml.

However, it contains a data_path entry, and I don't know which file to choose among train_data_joint.npy, train_data_Joint_motion.npy, train_data_bone.npy and train_data_bone_motion.npy.
What is the difference between them?
There are also some files in the data_gen folder: merge_joint_bone_data.py and merge_joint_motion_data.py. I would like to know the purpose and usage of those files.

Could you give me some advice?

czhaneva commented 2 years ago

For a single stream, you can choose train_data_joint.npy; it performs best.
You can find their exact meaning in "Skeleton-Based Action Recognition with Multi-Stream Adaptive Graph Convolutional Networks".
The merge scripts concatenate two different kinds of data (joint and bone, or joint and motion) along the channel dimension, so input_channel should be changed to 6. If you also double the channels of the network, it should perform better than score-level fusion. If you have any other questions, feel free to post them here.
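As an illustration of merging two streams along the channel dimension, here is a minimal sketch. It assumes the arrays use the (N, C, T, V, M) layout (samples, channels, frames, joints, persons) that is common in this family of codebases; the function name `merge_streams` and the toy shapes are hypothetical and are not taken from merge_joint_bone_data.py.

```python
import numpy as np

def merge_streams(a, b):
    """Concatenate two skeleton streams along the channel axis (axis 1)."""
    # Assumed layout: (N, C, T, V, M). Only the channel axis may differ.
    if a.shape[0] != b.shape[0] or a.shape[2:] != b.shape[2:]:
        raise ValueError("streams must match in every axis except channels")
    return np.concatenate([a, b], axis=1)

# Toy stand-ins for the joint and bone arrays (3 channels each).
joint = np.zeros((4, 3, 30, 18, 2), dtype=np.float32)
bone = np.ones((4, 3, 30, 18, 2), dtype=np.float32)

merged = merge_streams(joint, bone)
print(merged.shape)  # (4, 6, 30, 18, 2)
```

The merged array has 6 channels, which is why the network's input channel count has to change from 3 to 6 when training on merged data.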

Wooho-Moon commented 2 years ago

Thanks for your reply :) I read the paper in your answer. I have another question.

If I would like to recognize the actions of multiple people, what should I do? The input data is merged by the mean function along the M dimension before passing through the FC layer. M is the number of persons kept (those with the highest energy), but they are merged...

Could you give me some advice?
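For reference, the pooling step described in this question can be sketched as follows. This is a simplified NumPy illustration, assuming the features arrive as one row per (sample, person) pair with shape (N*M, C, T, V), as in ST-GCN-style models; the function name `pool_features` is hypothetical and this is not the repository's actual code.

```python
import numpy as np

def pool_features(feat, n, m):
    """Average features over frames and joints, then over persons.

    feat: array of shape (n * m, c, t, v), one row per (sample, person) pair.
    Returns an array of shape (n, c): a single vector per sample.
    """
    c = feat.shape[1]
    feat = feat.reshape(n, m, c, -1)  # (n, m, c, t * v)
    feat = feat.mean(axis=3)          # average over frames and joints
    return feat.mean(axis=1)          # average over persons

feat = np.random.default_rng(0).normal(size=(2 * 2, 8, 5, 18))
pooled = pool_features(feat, n=2, m=2)
print(pooled.shape)  # (2, 8)
```

Because the last mean collapses the person axis, the classifier sees one vector per clip, so it can only output one label per clip rather than one per person.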

czhaneva commented 2 years ago

Sorry, so far, skeleton-based action recognition only supports single-person action recognition or multi-person interaction recognition. For your question (multi-person action recognition), I have two suggestions.

1. Detect and track the persons, then use pose estimation (e.g. OpenPose) to extract the skeleton sequence of each person, then recognize the actions with skeleton-based action recognition.
2. Maybe you should read papers on spatial-temporal action detection, for example MOC.

Hope this helps you~
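One way to picture the first suggestion: once each tracked person has their own skeleton sequence, every person can be passed to the recognizer as a separate single-person sample. The sketch below only shows that reshaping step, assuming the (N, C, T, V, M) layout; `split_persons` is a hypothetical helper, and detection, tracking, and pose estimation are assumed to have already been done.

```python
import numpy as np

def split_persons(data):
    """Turn (N, C, T, V, M) into (N * M, C, T, V, 1): one sample per person."""
    n, c, t, v, m = data.shape
    per_person = np.transpose(data, (0, 4, 1, 2, 3))  # (N, M, C, T, V)
    return per_person.reshape(n * m, c, t, v, 1)

# Toy clip: 1 sample, 3 channels, 4 frames, 18 joints, 2 persons.
clip = np.arange(1 * 3 * 4 * 18 * 2, dtype=np.float32).reshape(1, 3, 4, 18, 2)
singles = split_persons(clip)
print(singles.shape)  # (2, 3, 4, 18, 1)
```

Each row of `singles` can then receive its own prediction. A model trained on two-person inputs may need retraining or zero-padding of the person axis to accept such samples; that is an assumption to check against the actual model.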

Wooho-Moon commented 2 years ago

I appreciate your reply. I have already read the MOC paper! Anyway, thanks again for your awesome work and answers! I will try your first suggestion.