Closed · leviethung2103 closed this 1 year ago
Yes, and you will need annotations/labels to fine-tune on your own dataset.
Hello.
Thank you for your quick reply.
Please correct me if I am wrong. Here are the steps to train the model:
For RGB videos, extract 2D poses (see inference.md) and convert the keypoint format.
The model uses 17 body keypoints in the H36M format. Since I use the COCO format, I need to convert the COCO keypoints to the H36M format.
Feed the training data into MotionBERT.
Inference. (*But currently I don't see an inference tutorial on this page.)
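For step 2, the conversion is just a re-indexing of the 17 joints, with a few H36M joints (pelvis, spine, thorax, head) taken as midpoints of COCO joints. A sketch of the commonly used mapping (MotionBERT ships its own converter; treat the exact joint choices here as assumptions and check them against the repo):

```python
import numpy as np

# COCO order: 0 nose, 1-2 eyes, 3-4 ears, 5-6 shoulders, 7-8 elbows,
# 9-10 wrists, 11-12 hips, 13-14 knees, 15-16 ankles.
def coco2h36m(x):
    """Map (..., 17, C) COCO keypoints to the 17-joint H36M order."""
    y = np.zeros_like(x, dtype=float)
    y[..., 0, :] = (x[..., 11, :] + x[..., 12, :]) * 0.5   # pelvis = hip midpoint
    y[..., 1, :] = x[..., 12, :]                           # right hip
    y[..., 2, :] = x[..., 14, :]                           # right knee
    y[..., 3, :] = x[..., 16, :]                           # right ankle
    y[..., 4, :] = x[..., 11, :]                           # left hip
    y[..., 5, :] = x[..., 13, :]                           # left knee
    y[..., 6, :] = x[..., 15, :]                           # left ankle
    y[..., 8, :] = (x[..., 5, :] + x[..., 6, :]) * 0.5     # thorax = shoulder midpoint
    y[..., 7, :] = (y[..., 0, :] + y[..., 8, :]) * 0.5     # spine = pelvis/thorax midpoint
    y[..., 9, :] = x[..., 0, :]                            # nose
    y[..., 10, :] = (x[..., 1, :] + x[..., 2, :]) * 0.5    # head = eye midpoint
    y[..., 11, :] = x[..., 5, :]                           # left shoulder
    y[..., 12, :] = x[..., 7, :]                           # left elbow
    y[..., 13, :] = x[..., 9, :]                           # left wrist
    y[..., 14, :] = x[..., 6, :]                           # right shoulder
    y[..., 15, :] = x[..., 8, :]                           # right elbow
    y[..., 16, :] = x[..., 10, :]                          # right wrist
    return y
```

The same function works on a single frame `(17, C)` or a whole clip `(T, 17, C)` thanks to the `...` indexing.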
By the way, how fast is MotionBERT inference on RGB videos? Can it achieve real-time processing (30 FPS)?
Sorry for asking too much.
From my understanding, you want to fine-tune MB on your own action dataset for skeleton-based action recognition. In that case, prepare your action dataset following dataset_action.py and train with train_action.py.
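Loaders in the style of dataset_action.py feed the model fixed-length clips, so variable-length sequences (e.g. 1-2 s trimmed clips) are typically resampled to a fixed frame count first. A minimal sketch, assuming a `(T, 17, C)` keypoint array per clip (the `clip_len` value is a placeholder; use whatever your config specifies):

```python
import numpy as np

def resample_clip(keypoints, clip_len=100):
    """Uniformly resample a (T, 17, C) keypoint sequence to clip_len frames.

    Short clips are stretched (frames repeated), long clips subsampled,
    so every training sample has the same temporal length.
    """
    t = keypoints.shape[0]
    # Pick clip_len frame indices evenly spread over [0, t-1].
    idx = np.linspace(0, t - 1, clip_len).round().astype(int)
    return keypoints[idx]
```

Pair this with a per-clip action label to get `(clip, label)` samples for a standard PyTorch `Dataset`.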
As for speed, it mostly depends on your 2D pose estimator; the MB part itself is quite fast.
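Whether the full pipeline (pose estimator + MB) reaches 30 FPS is easy to check empirically. A minimal timing harness, where `infer_fn` is a stand-in for one frame (or one clip) of your pipeline, not a MotionBERT API:

```python
import time

def measure_fps(infer_fn, n_iters=100):
    """Call infer_fn n_iters times and return calls per second."""
    start = time.perf_counter()
    for _ in range(n_iters):
        infer_fn()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed
```

If the measured rate for the whole pipeline stays above 30, real-time processing is feasible on your hardware.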
Thank you for the reply. I will check it later.
Hi author,
I was impressed by your work. I would like to ask: can the model be used for action recognition on 2D skeleton data?
Suppose I have many short trimmed video clips of around 1-2 s, and I've used HRNet to extract the 2D keypoints. Can I use your model to recognize the action based only on the 2D keypoints?
Thank you