Closed · leviethung2103 closed this 1 year ago
Yes, and you will need annotations/labels to fine-tune on your own dataset.
Hello.
Thank you for your quick reply.
Please correct me if I am wrong. Here are the steps to train the model:
For RGB videos, extract 2D poses (see inference.md) and convert the keypoint format.
The model uses 17 body keypoints in the H36M format. Since I use the COCO format, I need to convert the COCO keypoints to the H36M format.
Feed the training data into MotionBERT.
Inference. (*But currently I don't see an inference tutorial on this page.)
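For step 2, the conversion is just a re-indexing of the 17 joints, with a few H36M joints (pelvis, spine, thorax, head) taken as midpoints of COCO joints. A sketch of the commonly used mapping (MotionBERT ships its own converter; treat the exact joint choices here as assumptions and check them against the repo):

```python
import numpy as np

# COCO order: 0 nose, 1-2 eyes, 3-4 ears, 5-6 shoulders, 7-8 elbows,
# 9-10 wrists, 11-12 hips, 13-14 knees, 15-16 ankles.
def coco2h36m(x):
    """Map (..., 17, C) COCO keypoints to the 17-joint H36M order."""
    y = np.zeros_like(x, dtype=float)
    y[..., 0, :] = (x[..., 11, :] + x[..., 12, :]) * 0.5   # pelvis = hip midpoint
    y[..., 1, :] = x[..., 12, :]                           # right hip
    y[..., 2, :] = x[..., 14, :]                           # right knee
    y[..., 3, :] = x[..., 16, :]                           # right ankle
    y[..., 4, :] = x[..., 11, :]                           # left hip
    y[..., 5, :] = x[..., 13, :]                           # left knee
    y[..., 6, :] = x[..., 15, :]                           # left ankle
    y[..., 8, :] = (x[..., 5, :] + x[..., 6, :]) * 0.5     # thorax = shoulder midpoint
    y[..., 7, :] = (y[..., 0, :] + y[..., 8, :]) * 0.5     # spine = pelvis/thorax midpoint
    y[..., 9, :] = x[..., 0, :]                            # nose
    y[..., 10, :] = (x[..., 1, :] + x[..., 2, :]) * 0.5    # head = eye midpoint
    y[..., 11, :] = x[..., 5, :]                           # left shoulder
    y[..., 12, :] = x[..., 7, :]                           # left elbow
    y[..., 13, :] = x[..., 9, :]                           # left wrist
    y[..., 14, :] = x[..., 6, :]                           # right shoulder
    y[..., 15, :] = x[..., 8, :]                           # right elbow
    y[..., 16, :] = x[..., 10, :]                          # right wrist
    return y
```

The same function works on a single frame `(17, C)` or a whole clip `(T, 17, C)` thanks to the `...` indexing.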
By the way, how fast is MotionBERT inference on RGB videos? Can it achieve real-time processing (30 FPS)?
Sorry for asking too much.
From my understanding, you want to fine-tune MB on your own action dataset for skeleton-based action recognition. In that case, prepare your action dataset following dataset_action.py and train with train_action.py.
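Loaders in the style of dataset_action.py feed the model fixed-length clips, so variable-length sequences (e.g. 1-2 s trimmed clips) are typically resampled to a fixed frame count first. A minimal sketch, assuming a `(T, 17, C)` keypoint array per clip (the `clip_len` value is a placeholder; use whatever your config specifies):

```python
import numpy as np

def resample_clip(keypoints, clip_len=100):
    """Uniformly resample a (T, 17, C) keypoint sequence to clip_len frames.

    Short clips are stretched (frames repeated), long clips subsampled,
    so every training sample has the same temporal length.
    """
    t = keypoints.shape[0]
    # Pick clip_len frame indices evenly spread over [0, t-1].
    idx = np.linspace(0, t - 1, clip_len).round().astype(int)
    return keypoints[idx]
```

Pair this with a per-clip action label to get `(clip, label)` samples for a standard PyTorch `Dataset`.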
As for speed, it mostly depends on your 2D pose estimator; the MB part itself is quite fast.
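Whether the full pipeline (pose estimator + MB) reaches 30 FPS is easy to check empirically. A minimal timing harness, where `infer_fn` is a stand-in for one frame (or one clip) of your pipeline, not a MotionBERT API:

```python
import time

def measure_fps(infer_fn, n_iters=100):
    """Call infer_fn n_iters times and return calls per second."""
    start = time.perf_counter()
    for _ in range(n_iters):
        infer_fn()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed
```

If the measured rate for the whole pipeline stays above 30, real-time processing is feasible on your hardware.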
Thank you for the reply. I will check it later.
Hi author,
I was impressed by your work. I would like to ask: can the model be used for action recognition on 2D skeleton data?
Suppose I have many short trimmed video clips of around 1-2 s, and I've used HRNet to extract the 2D keypoints. Can I use your model to recognize the action based only on the 2D keypoints?
Thank you