Boeun-Kim / GL-Transformer

This is the official implementation of Global-local Motion Transformer for Unsupervised Skeleton-based Action Learning (ECCV 2022).

Train on custom video dataset #4

Closed CSLR-research closed 2 months ago

CSLR-research commented 4 months ago

Thank you for your amazing work. I want to train the model on another video-based dataset, using an off-the-shelf pose estimator such as mediapipe or mmpose to extract pose keypoints. I'm wondering whether I can use the extracted keypoints (xyz coordinates) directly, or whether some preprocessing is necessary. I'm also not sure what neighbor_1base and parallel_skeleton are used for in the NTUMotionProcessor.

Thanks again!

Boeun-Kim commented 3 months ago

Thanks for the question. If you use a different skeleton format, you will need to implement your own preprocessing and data-feeding code.
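For context, here is a minimal sketch of what that data-feeding step can look like, assuming the (C, T, V, M) channels/frames/joints/persons layout common in skeleton-based action pipelines. The function `pack_sequence`, the default joint count, and the single-person assumption are all illustrative, not the repo's actual API; check the NTU feeder code for the exact format it expects.

```python
import numpy as np

def pack_sequence(frames, num_joints=25, num_persons=1):
    """frames: list of (num_joints, 3) arrays of xyz keypoints, one per frame.

    Returns a (C, T, V, M) array: channels, frames, joints, persons.
    """
    T = len(frames)
    data = np.zeros((3, T, num_joints, num_persons), dtype=np.float32)
    for t, kp in enumerate(frames):
        data[:, t, :, 0] = kp.T  # transpose (V, 3) -> (3, V) for person 0
    return data
```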

neighbor_1base: Information about the skeleton tree. Each bone connects a parent joint to a child joint, and neighbor_1base lists these parent-child pairs using 1-based joint indices (hence the name).
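As a rough illustration of how such a (child, parent) list can be consumed, assuming 1-based indices and a made-up three-bone chain (not the real NTU skeleton tree):

```python
import numpy as np

# (child, parent) pairs with 1-based joint indices -- an illustrative
# three-bone chain, not the actual NTU skeleton definition
neighbor_1base = [(2, 1), (3, 2), (4, 3)]

def bone_vectors(joints, pairs):
    """joints: (V, 3) xyz positions; returns a (len(pairs), 3) array of bones."""
    # subtract 1 to convert the 1-based joint indices to array indices
    return np.stack([joints[c - 1] - joints[p - 1] for c, p in pairs])
```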

parallel_skeleton: This is used during preprocessing. We rotate each skeleton sequence so that all bodies are aligned to face the same direction.
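A hedged sketch of what such an alignment can look like: rotate every frame about the vertical (z) axis so that a reference bone between two hip joints points along +x. The joint indices `L_HIP`/`R_HIP` and the choice of reference bone and axis are assumptions for illustration; the actual NTUMotionProcessor may use different joints and conventions.

```python
import numpy as np

L_HIP, R_HIP = 12, 16  # illustrative joint indices, not the real ones

def align_sequence(data):
    """data: (T, V, 3) xyz sequence. Returns it rotated to a canonical facing."""
    ref = data[0, R_HIP] - data[0, L_HIP]   # reference bone in the first frame
    theta = np.arctan2(ref[1], ref[0])      # its angle from +x in the xy-plane
    c, s = np.cos(-theta), np.sin(-theta)   # rotate by -theta about the z-axis
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    return data @ rot.T                     # apply the rotation to every joint
```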