Hey, thanks for this wonderful work; the performance of the 2D-to-3D reconstruction is just eye-opening. I am wondering whether the action recognition inference code for custom videos has been released yet. I can only find the evaluation code for action recognition, which is meant for the NTU-RGB+D dataset.
Hi, thanks for your interest in our work! We did not release that because the action categories are limited by the definition of NTU-RGB+D, so it would not be generally applicable. Nonetheless, it should be easy to implement given the WildDetDataset class: you can combine the data part of infer_wild.py with the inference part of train_action.py.
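For anyone landing here later, a minimal sketch of what that combination might look like. The helper names, constructor arguments, config path, and checkpoint layout below are assumptions modeled on infer_wild.py and train_action.py, not the repo's authoritative API; check those two scripts before relying on any of it:

```python
# Hedged sketch, not a released script: custom-video action inference by
# combining the data part of infer_wild.py with the inference part of
# train_action.py. Paths, arguments, and checkpoint keys are placeholders.
import torch
from torch.utils.data import DataLoader
from lib.utils.tools import get_config            # assumed config helper
from lib.utils.learning import load_backbone      # assumed backbone loader
from lib.data.dataset_wild import WildDetDataset  # dataset class mentioned above
from lib.model.model_action import ActionNet      # action head used in train_action.py

# --- data part (as in infer_wild.py): 2D detections of the custom video ---
wild_dataset = WildDetDataset('alphapose-results.json',  # placeholder detection file
                              clip_len=243)              # placeholder clip length
test_loader = DataLoader(wild_dataset, batch_size=1, shuffle=False)

# --- inference part (as in train_action.py): backbone + action head ---
args = get_config('configs/action/MB_ft_NTU60_xsub.yaml')  # placeholder config
model = ActionNet(backbone=load_backbone(args),
                  dim_rep=args.dim_rep,
                  num_classes=args.action_classes,
                  num_joints=args.num_joints)
ckpt = torch.load('checkpoint/action/best_epoch.bin', map_location='cpu')  # placeholder
model.load_state_dict(ckpt['model'], strict=True)  # checkpoint key is an assumption
model.eval()

with torch.no_grad():
    for batch_input in test_loader:        # 2D keypoints, roughly [1, T, 17, 3]
        logits = model(batch_input)        # [1, num_classes] over NTU-RGB+D actions
        print(logits.argmax(dim=-1).item())
```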
Thanks for your prompt reply! I have another question: the representation is obtained in this way:
'''
x: 2D skeletons
    type = <class 'torch.Tensor'>
    shape = [batch size * frames * joints(17) * channels(3)]

MotionBERT: pretrained human motion encoder
    type = <class 'lib.model.DSTformer.DSTformer'>

E: encoded motion representation
    type = <class 'torch.Tensor'>
    shape = [batch size * frames * joints(17) * channels(512)]
'''
As I see from this code, only 2D coordinate information is used to extract the representation. But as I understand it, the action recognition is based on the reconstructed 3D skeleton, which is obtained from the 2D skeleton and the video. I am confused about why the video information is not needed to extract the representation.
The motion representation is based on the 2D skeletons; there is no need to estimate the 3D skeletons explicitly. The video information is only used when extracting the 2D skeletons.
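To make that concrete, here is a minimal sketch of the flow quoted above, assuming DSTformer exposes a get_representation-style hook and that the constructor argument names below are roughly right (both are assumptions, not the repo's confirmed API):

```python
import torch
from lib.model.DSTformer import DSTformer  # the pretrained motion encoder

# Constructor argument names/values are assumptions for illustration.
encoder = DSTformer(dim_in=3, dim_out=3, dim_feat=512, dim_rep=512,
                    num_joints=17, maxlen=243)
encoder.eval()

# Only 2D keypoints (x, y) plus detection confidence go in -- no RGB frames.
x = torch.randn(1, 243, 17, 3)  # [batch size, frames, joints(17), channels(3)]

with torch.no_grad():
    E = encoder.get_representation(x)  # assumed hook; per-joint 512-d features
print(E.shape)  # expected [1, 243, 17, 512], matching the shapes quoted above
```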