My dataset stores frame-by-frame images in folders. The images show only upper-body movements, and each folder represents one action sequence with a corresponding text description.
How should I use your code with such a dataset?
Also, which part of your code handles text-motion alignment? I couldn't find it; could you please point it out? Thank you.
If I have only extracted 2D skeletal keypoints from the images, how do I use your code?