google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://mediapipe.dev
Apache License 2.0
26.77k stars 5.09k forks

How to convert the MediaPipe FaceLandmark result to 3D space coordinates #5560

Open sachinksachu opened 1 month ago

sachinksachu commented 1 month ago

Hi,

I would like to use the MediaPipe FaceLandmarker result to move a 3D object. The FaceLandmarker result contains normalized values, while the 3D model uses positions in world coordinates. How can I map between those values? Is this mapping possible?

kuaashish commented 1 month ago

Hi @sachinksachu,

There is no need to convert the normalized face values to 3D space. Our FaceLandmarker results directly provide 3D coordinates, named face_blendshapes, which can be used to map your model in your rendering engine. You can refer to the mp.tasks.vision.FaceLandmarkerResult API https://ai.google.dev/edge/api/mediapipe/js/tasks-vision.facelandmarkerresult for more details. If there is any confusion in my understanding, please let me know.
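
For illustration, here is a minimal sketch of reading the result fields using the Python Tasks API (the model path face_landmarker.task and the input image are assumptions; the JS FaceLandmarkerResult exposes the same fields):

```python
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# Assumes a locally downloaded face_landmarker.task model and a test image.
options = vision.FaceLandmarkerOptions(
    base_options=python.BaseOptions(model_asset_path='face_landmarker.task'),
    output_face_blendshapes=True,
    num_faces=1)
detector = vision.FaceLandmarker.create_from_options(options)
result = detector.detect(mp.Image.create_from_file('face.jpg'))

# face_landmarks: normalized (x, y, z) values per landmark for each detected face.
for lm in result.face_landmarks[0][:5]:
    print(lm.x, lm.y, lm.z)

# face_blendshapes: Category objects, each with a category_name and a score.
for category in result.face_blendshapes[0][:5]:
    print(category.category_name, category.score)
```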

Thank you!!

sachinksachu commented 1 month ago

Hi @kuaashish Does face_blendshapes provide categories like 'eyeBlinkLeft' and 'eyeBlinkRight' along with their scores? I have used this result before. Correct me if I am mistaken. Thanks & Regards

kuaashish commented 1 month ago

Hi @sachinksachu,

The model returns categories such as eyeBlinkLeft and eyeBlinkRight. You can review the implementation here: Mediapipe Face Blendshapes Graph. However, some features are not yet production-ready; you can track the ongoing issues here: Mediapipe Issue #4210.
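
For example, assuming a result obtained with output_face_blendshapes=True (as in the earlier sketch), the per-category scores can be read like this; the 0.5 threshold is only an illustrative value:

```python
# Assumes `result` is a FaceLandmarkerResult produced with output_face_blendshapes=True.
blendshapes = {c.category_name: c.score for c in result.face_blendshapes[0]}

left_blink = blendshapes.get('eyeBlinkLeft', 0.0)
right_blink = blendshapes.get('eyeBlinkRight', 0.0)
print(f'eyeBlinkLeft={left_blink:.2f}, eyeBlinkRight={right_blink:.2f}')

# Scores are in [0, 1]; 0.5 is an arbitrary example threshold, not an official one.
if left_blink > 0.5 and right_blink > 0.5:
    print('Both eyes likely closed')
```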

Thank you!!

sachinksachu commented 1 month ago

Hi @kuaashish My purpose is to position a 3D model based on the (x, y, z) coordinates returned by MediaPipe. So the face_blendshapes cannot be used, am I correct?

kuaashish commented 1 month ago

Hi @sachinksachu,

As mentioned on the overview page here https://ai.google.dev/edge/mediapipe/solutions/vision/face_landmarker, the task outputs 3-dimensional face landmarks, blendshape scores (coefficients representing facial expressions) to infer detailed facial surfaces in real time, and transformation matrices for effects rendering. This means the output will be in the form of (x, y, z) coordinates. Additionally, some landmarks are not production-ready and will therefore be unavailable for use. You can track this issue https://github.com/google-ai-edge/mediapipe/issues/4210 as previously mentioned.
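
A sketch of what those three outputs look like in the Python Tasks API, with the transformation-matrix output enabled as well (the model path and input image are assumptions):

```python
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# Enable all three outputs described above.
options = vision.FaceLandmarkerOptions(
    base_options=python.BaseOptions(model_asset_path='face_landmarker.task'),
    output_face_blendshapes=True,
    output_facial_transformation_matrixes=True,
    num_faces=1)
detector = vision.FaceLandmarker.create_from_options(options)
result = detector.detect(mp.Image.create_from_file('face.jpg'))

print(len(result.face_landmarks[0]))             # 478 normalized (x, y, z) landmarks
print(len(result.face_blendshapes[0]))           # 52 blendshape scores
print(result.facial_transformation_matrixes[0])  # 4x4 pose matrix for effects rendering
```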

sachinksachu commented 1 month ago

Hi @kuaashish

Out of the three types of results, I tried to use the 3-dimensional face landmarks, which are normalized values. I need to use these 3D coordinates to move my 3D model. Is it possible to convert the normalized 3D coordinates to world-space coordinates to position the 3D object?

I would like to do what is shown in this demo: https://mediapipe-studio.webapps.google.com/demo/face_landmarker

kuaashish commented 1 week ago

Hi @sachinksachu,

Apologies for the delayed response. Based on feedback from our team, it is currently not possible to achieve precise measurements using image tracking and/or a gyroscope. Without a known reference point (such as the human iris), or additional data or hardware, accurate scaling is not feasible.

You can position a 3D object in a realistic manner by simulating a world coordinate system, but determining the actual size of the object in meters would require more detailed information.
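
As an illustration of that simulated world coordinate system idea: the normalized landmarks can be scaled by the image size to get pixel-space coordinates, and the facial transformation matrix gives the head pose relative to a canonical face model, which is enough to orient and move a 3D object convincingly, even though absolute metric scale is still not recoverable. A minimal sketch (the frame size and the choice of landmark index 1, often used as the nose tip, are assumptions):

```python
import numpy as np

# Assumes `result` from the earlier sketches, produced with
# output_facial_transformation_matrixes=True, and the source frame dimensions.
IMG_W, IMG_H = 1280, 720  # example values; use your actual frame size

lm = result.face_landmarks[0][1]  # index 1 is commonly treated as the nose tip

# Normalized -> pixel-space coordinates. Per the docs, the magnitude of z uses
# roughly the same scale as x, so scaling by the image width keeps depth consistent
# (still unitless, not meters).
x_px = lm.x * IMG_W
y_px = lm.y * IMG_H
z_px = lm.z * IMG_W

# The facial transformation matrix encodes head rotation/translation relative to a
# canonical face model; it helps orient the 3D object, not recover absolute size.
pose = np.array(result.facial_transformation_matrixes[0])
rotation, translation = pose[:3, :3], pose[:3, 3]
print((x_px, y_px, z_px))
print(rotation)
print(translation)
```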

Thank you!!

github-actions[bot] commented 22 hours ago

This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.