google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://mediapipe.dev
Apache License 2.0

Relative position for mediapipe Pose model #2558

Closed SutirthaChakraborty closed 1 year ago

SutirthaChakraborty commented 2 years ago

Describe the feature and the current behavior/state: An argument that creates a 3D plane and generates the 3D keypoints relative to that plane, regardless of how the person's position changes, such as moving left or right, or moving closer to or further from the camera.

Will this change the current api? How? It would be an additional feature.

Who will benefit from this feature? It would help developers who only want to work with the person's pose when further training on or processing the 3D or 2D keypoint information.

Please specify the use cases for this feature: A person playing a drum in the air while moving around.

sgowroji commented 2 years ago

Hi @SutirthaChakraborty, thank you for reaching out to us about this issue. Could you please share more details relevant to your query? Also, did you get a chance to check our pose landmarks in world coordinates and their annotations, which may help you understand the implementation?

SutirthaChakraborty commented 2 years ago

For example, if I run pose estimation on these 4 images, it will generate scaled-down points (x, y, z) that depend on where the person is located in the image. [image]

Can a feature be added that gives us reference distances regardless of the person's position in the frame and distance from the camera, so that we can use those points to create a virtual space around the person?

Ideally, in this case, the result will be the same for all the images as they have the same pose. Thanks, Suti
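As a rough illustration of what the request describes, the keypoints can be post-processed so that the same pose yields the same numbers wherever the person stands. This is a minimal sketch, not a MediaPipe feature: `normalize_pose` is a hypothetical helper, and it assumes the 33-landmark MediaPipe Pose ordering (shoulders at 11 and 12, hips at 23 and 24). It removes translation and uniform scale only; it does not undo perspective distortion or body rotation.

```python
import math

# Landmark indices in the 33-point MediaPipe Pose topology.
LEFT_SHOULDER, RIGHT_SHOULDER = 11, 12
LEFT_HIP, RIGHT_HIP = 23, 24


def midpoint(a, b):
    """Return the point halfway between two equal-length coordinate tuples."""
    return tuple((p + q) / 2.0 for p, q in zip(a, b))


def normalize_pose(points):
    """Hypothetical helper: express keypoints relative to the hip midpoint,
    scaled so the torso (hip midpoint to shoulder midpoint) has length 1.

    `points` is a list of (x, y) or (x, y, z) tuples in a consistent unit,
    for example pixels. Normalized image coordinates should be multiplied by
    the image width and height first, otherwise x and y are scaled unequally.
    """
    hip_center = midpoint(points[LEFT_HIP], points[RIGHT_HIP])
    shoulder_center = midpoint(points[LEFT_SHOULDER], points[RIGHT_SHOULDER])
    torso = math.dist(hip_center, shoulder_center)
    if torso == 0:
        raise ValueError("degenerate pose: torso length is zero")
    return [
        tuple((c - h) / torso for c, h in zip(p, hip_center))
        for p in points
    ]
```

With this, a pose that is shifted in the frame or uniformly enlarged (person closer to the camera) maps to approximately the same normalized coordinates.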

SutirthaChakraborty commented 2 years ago

Hi, is it feasible?

sgowroji commented 2 years ago

May need to apply some optimizations for calculating the relative distance as mentioned here https://github.com/google/mediapipe/issues/1611

google-ml-butler[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] commented 2 years ago

Closing as stale. Please reopen if you'd like to work on this further.

lucasjinreal commented 1 year ago

@sgowroji Is it available now, in 2023?

SutirthaChakraborty commented 1 year ago

@jinfagang use pose_world_landmarks.

ayushgdev commented 1 year ago

@jinfagang We do offer the pose_world_landmarks attribute in the Pose solution. This gives 3D landmark output with the origin at the center of the hips. It provides a z-keypoint, which is an "estimate" from the GHUM model using 2D projections. The z-keypoint represents the distance "relative" to the plane of the subject's hips, which is the origin of the Z axis. So you can imagine a vertical plane at the hip joint of the subject, and any value < 0 means the keypoint lies between the camera and that plane. So you can check the sign of the z-keypoints to know whether a keypoint is between the camera and the hips or beyond them. However, as the distance is relative, it cannot be used to reliably estimate metric distance in the real world. For details, please refer to the model card here
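To make the sign convention described above concrete, here is a minimal sketch that splits landmarks by the sign of z. `split_by_depth` is a hypothetical helper, not part of MediaPipe, and it takes plain (x, y, z) tuples so that it runs without the library. In the legacy Python solution the tuples would come from `results.pose_world_landmarks.landmark`, as noted in the comment.

```python
def split_by_depth(world_landmarks):
    """Hypothetical helper: group landmark indices by which side of the
    hip plane they fall on.

    `world_landmarks` is a list of (x, y, z) tuples with the origin at the
    hip center. With the legacy Python solution they could be built as:
        [(lm.x, lm.y, lm.z) for lm in results.pose_world_landmarks.landmark]

    Following the convention in the comment above, z < 0 means the keypoint
    is between the camera and the hip plane.
    """
    nearer, farther = [], []
    for index, (_x, _y, z) in enumerate(world_landmarks):
        # z == 0 lies on the hip plane; it is grouped with "farther" here.
        (nearer if z < 0 else farther).append(index)
    return nearer, farther
```

Because z is measured from the hips, this only says which side of the hip plane a keypoint is on. It says nothing about how far the person is from the camera.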

lucasjinreal commented 1 year ago

@SutirthaChakraborty I don't know whether you understood the question, but the OP was asking about body translation.

The 3D world landmarks use the hips as the fixed origin!

@ayushgdev I got the 3D landmarks, but they don't give me the body's translation in space.

I don't know what the z-keypoint is. I really don't understand. I want translation, not x, y, z: the x, y, z values are all relative to the hips, so any movement is relative to the hips, which means it is not a translation in real space.

How can I get the translation?

After reading your explanation 100 times: what I need is not only the distance between the body and the camera, but also x and y.

Which means I need an x, y, z translation, and this cannot be calculated from the current 3D landmarks!

ayushgdev commented 1 year ago

@jinfagang Let us reassign to @hadon to provide a better perspective and solution to your query.

lucasjinreal commented 1 year ago

@andrechen Stop giving a thumbs down to issues that are unrelated to you. I am asking my own questions; if you don't know the answer, just ignore them, and stop downvoting for personal reasons.

kuaashish commented 1 year ago

Hello @SutirthaChakraborty, we are upgrading the MediaPipe Legacy Solutions to the new MediaPipe solutions. However, the libraries, documentation, and source code for all the MediaPipe Legacy Solutions will continue to be available in our GitHub repository and through library distribution services, such as Maven and NPM.

You can continue to use those legacy solutions in your applications if you choose. However, we would ask you to check out the new MediaPipe solutions, which can help you more easily build and customize ML solutions for your applications. These new solutions will provide a superset of the capabilities available in the legacy solutions.

github-actions[bot] commented 1 year ago

This issue has been marked stale because it has had no activity for 7 days. It will be closed if no further activity occurs. Thank you.

github-actions[bot] commented 1 year ago

This issue was closed due to lack of activity after being marked stale for the past 7 days.

ralarcong commented 1 year ago

> @SutirthaChakraborty I don't know whether you understood the question, but the OP was asking about body translation.
>
> The 3D world landmarks use the hips as the fixed origin!
>
> @ayushgdev I got the 3D landmarks, but they don't give me the body's translation in space.
>
> I don't know what the z-keypoint is. I really don't understand. I want translation, not x, y, z: the x, y, z values are all relative to the hips, so any movement is relative to the hips, which means it is not a translation in real space.
>
> How can I get the translation?
>
> After reading your explanation 100 times: what I need is not only the distance between the body and the camera, but also x and y.
>
> Which means I need an x, y, z translation, and this cannot be calculated from the current 3D landmarks!

Hello, I'm having a similar issue. When I draw the world coordinates, the body stays centered in the middle, but I want it to move the way I move in front of the camera (x, y, and z). Did you manage to find a solution?
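No solution was posted in this thread. One possible workaround, offered only as a sketch under stated assumptions, is to combine the two outputs: the image landmarks say where the hips appear in the frame, and the world landmarks give a metric torso length, so a pinhole camera model can give a rough camera-space position for the hip center. `estimate_hip_translation` is a hypothetical helper, the focal length and principal point are assumed to be known from camera calibration, and the depth estimate treats the torso as roughly parallel to the image plane.

```python
import math

# Landmark indices in the 33-point MediaPipe Pose topology.
LEFT_SHOULDER, RIGHT_SHOULDER = 11, 12
LEFT_HIP, RIGHT_HIP = 23, 24


def _mid(a, b):
    """Return the point halfway between two equal-length coordinate tuples."""
    return tuple((p + q) / 2.0 for p, q in zip(a, b))


def estimate_hip_translation(image_px, world_m, focal_px, cx, cy):
    """Hypothetical helper: rough (X, Y, Z) of the hip center in metres,
    in the camera frame, under a pinhole model.

    image_px: list of (u, v) landmark positions in pixels (normalized image
              landmarks multiplied by image width and height).
    world_m:  list of (x, y, z) hip-centered world landmarks in metres.
    focal_px, cx, cy: assumed camera intrinsics in pixels.
    """
    hip_px = _mid(image_px[LEFT_HIP], image_px[RIGHT_HIP])
    shoulder_px = _mid(image_px[LEFT_SHOULDER], image_px[RIGHT_SHOULDER])
    hip_m = _mid(world_m[LEFT_HIP], world_m[RIGHT_HIP])
    shoulder_m = _mid(world_m[LEFT_SHOULDER], world_m[RIGHT_SHOULDER])

    torso_px = math.dist(hip_px, shoulder_px)
    # Only the torso component parallel to the image plane is visible in 2D.
    torso_m = math.dist(hip_m[:2], shoulder_m[:2])
    if torso_px == 0:
        raise ValueError("degenerate pose: torso has zero length in the image")

    # Pinhole model: size_in_pixels = focal_px * size_in_metres / depth.
    depth = focal_px * torso_m / torso_px
    x = (hip_px[0] - cx) * depth / focal_px
    y = (hip_px[1] - cy) * depth / focal_px
    return (x, y, depth)
```

Adding the returned offset to every world landmark would place the skeleton in camera space instead of keeping it centered on the hips. The result inherits any error in the estimated world landmarks and in the assumed intrinsics, so it should be treated as approximate.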