Closed: UtsaChattopadhyay closed this issue 1 year ago.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.
Closing as stale. Please reopen if you'd like to work on this further.
Hi @UtsaChattopadhyay, did you figure this out?
Yes we did
Do you mind sharing it regarding the pose_landmark model? We get a [1, 195] output from "Identity". I've come to understand that this represents 39 arrays of [x, y, z, visibility, presence] (33 points on the body + 6 extra points for next-frame tracking).
This works very well when the input scene is easy: x and y track perfectly to the image. But if the person is partially off screen, the tracking fails completely, which it doesn't do in the MediaPipe examples.
Do you have any insights?
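For reference, here is a minimal sketch of how that flat [1, 195] tensor could be unpacked, assuming the layout described above (39 landmarks × 5 values, x/y/z in pixels of the 256×256 model input, visibility/presence as logits). The constants and the sigmoid step follow what I believe MediaPipe's TensorsToLandmarksCalculator does, so verify them against your graph config:

```python
import numpy as np

INPUT_SIZE = 256  # pose_landmark input resolution (assumption; check your model)

def unpack_landmarks(identity_tensor):
    """Unpack the flat [1, 195] 'Identity' output into 39 landmarks of 5 values."""
    lm = np.asarray(identity_tensor).reshape(39, 5)
    landmarks = []
    for x, y, z, visibility, presence in lm:
        landmarks.append({
            "x": x / INPUT_SIZE,  # normalize to [0, 1] of the model input crop
            "y": y / INPUT_SIZE,
            "z": z / INPUT_SIZE,
            "visibility": 1.0 / (1.0 + np.exp(-visibility)),  # logit -> probability
            "presence": 1.0 / (1.0 + np.exp(-presence)),
        })
    # The first 33 entries are the body landmarks; the trailing 6 are the
    # auxiliary keypoints used to derive the ROI for the next frame.
    return landmarks[:33], landmarks[33:]
```

Note that these coordinates are relative to the cropped and rotated ROI fed to the model, so they still have to be mapped back into the original image, which is likely why tracking looks fine when the person is centred and breaks down otherwise.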
@JesperStenberg or @UtsaChattopadhyay , did one of you figure it out for the pose_landmark model? Where did you find the information about 33 + 6 landmarks? Are those 6 at the end or the beginning of the array?
It was a while ago and I don't have it available, but I'm pretty sure that those 6 are at the end of the array. The thing that threw me off was that the person needs to be centred in the image for the model to work.
If you haven't checked this link, it might have some good info.
Thanks @JesperStenberg, that was an important hint. Unfortunately I can't find any documentation that explains the output in more detail, which makes implementing it harder and less clean.
If we have a high enough frame rate, could we find the acceleration and velocity of not only the center but the limbs as well? Random 3 am idea; hoping to get some feedback and a minimum frame rate for usable results.
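Not an official answer, but as a rough sketch: once you have per-frame landmark positions (ideally the world-space ones, smoothed), per-landmark velocity and acceleration can be estimated with finite differences. The function below is illustrative and assumes a constant frame rate:

```python
import numpy as np

def motion_from_landmarks(frames, fps):
    """frames: array of shape (T, 33, 3) holding x, y, z per landmark per frame."""
    pts = np.asarray(frames, dtype=float)
    dt = 1.0 / fps
    velocity = np.gradient(pts, dt, axis=0)           # (T, 33, 3), units per second
    acceleration = np.gradient(velocity, dt, axis=0)  # second derivative
    return velocity, acceleration
```

Differentiation amplifies per-frame jitter, so some smoothing (a moving average or a one-euro filter) is pretty much required; anecdotally, around 30 fps is workable for velocity, while acceleration tends to need higher frame rates and heavier filtering.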
Hello @UtsaChattopadhyay, we are upgrading the MediaPipe Legacy Solutions to the new MediaPipe Solutions. However, the libraries, documentation, and source code for all the MediaPipe Legacy Solutions will continue to be available in our GitHub repository and through library distribution services, such as Maven and NPM.
You can continue to use those legacy solutions in your applications if you choose. However, we would encourage you to check out the new MediaPipe Solutions, which can help you more easily build and customize ML solutions for your applications. These new solutions will provide a superset of the capabilities available in the legacy solutions. Thank you.
This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.
This issue was closed due to lack of activity after being marked stale for the past 7 days.
I am working on building a standalone cpp project for detecting pose. In the project, I am using two of the MediaPipe tflite models (pose_detection and pose_landmark); the output dimensions of the models are attached below.

For pose detection, we have two outputs, (2254, 12) and (2254, 1). What do these values correspond to, and how do we do the post-processing on them? The MediaPipe webpage says that the output of the pose detector is similar to the face detector plus the human body center, radius, and rotation.

Similarly, for pose landmarks, we have 5 outputs: (1,195), (1,1), (1,256,256,1), (1,64,54,39), and (1,117). We have understood that (1,1) is a classifier score and (1,256,256,1) is a segmentation mask, but the other 3 outputs are not clear. Here, it says that the pose landmark model detects 33 landmarks in pixel and world space, where each landmark has 4 values (x, y, z, visibility). I am assuming that the shape should correspond to a total of 4 (x, y, z, visibility) × 33 (landmarks) × 2 (pixel and world space). Can you please let me know how to make sense of these two model outputs, and also the post-processing related to them?
Pose Detection Output
Pose Landmark Output
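In case it helps while waiting for an answer: the two pose_detection tensors look like a standard SSD-style head, i.e. (2254, 1) holds per-anchor score logits and (2254, 12) holds per-anchor regressions (4 box values plus 4 keypoints × 2 coordinates). Below is a hedged sketch of the usual decode; the anchor layout, the value order, and the 224 scale are assumptions based on MediaPipe's TensorsToDetectionsCalculator defaults and should be checked against the actual graph config:

```python
import numpy as np

INPUT_SIZE = 224.0      # pose_detection input resolution (assumption)
SCORE_THRESHOLD = 0.5

def decode_detections(raw_boxes, raw_scores, anchor_centers):
    """raw_boxes: (2254, 12), raw_scores: (2254, 1),
    anchor_centers: (2254, 2) normalized anchor centers (x, y),
    assuming a fixed anchor size of 1.0."""
    scores = 1.0 / (1.0 + np.exp(-raw_scores[:, 0]))  # logits -> probabilities
    keep = scores > SCORE_THRESHOLD

    reg = raw_boxes[keep] / INPUT_SIZE                 # to normalized image units
    centers = anchor_centers[keep]
    scores = scores[keep]

    # Box center/size; the center is an offset relative to the anchor center.
    cx = reg[:, 0] + centers[:, 0]
    cy = reg[:, 1] + centers[:, 1]
    w, h = reg[:, 2], reg[:, 3]

    # 4 keypoints (used to derive the ROI center, size and rotation), each (x, y).
    keypoints = reg[:, 4:].reshape(-1, 4, 2) + centers[:, None, :]

    boxes = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    return boxes, keypoints, scores  # then apply (weighted) non-max suppression
```

The 2254 anchors themselves come from an SSD-style anchor generator over the 224×224 input (strides 8, 16 and 32, if I read the config correctly), and the detection surviving NMS is expanded and rotated into the ROI that gets cropped and passed to pose_landmark; that crop-and-rotate step is also what maps the landmark coordinates back to the full image.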