mvazquezgts commented 8 months ago

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

No

OS Platform and Distribution

Ubuntu

MediaPipe Tasks SDK version

Holistic

Task name (e.g. Image classification, Gesture recognition etc.)

Holistic

Programming Language and version (e.g. C++, Python, Java)

Python

Describe the actual behavior

In the current version, information about the visibility/confidence of the hand keypoints is not available.

Describe the expected behaviour

Provide information about the confidence of the extracted hand keypoints.

Standalone code/steps you may have used to try to get what you need

In the current Holistic solution, the visibility and presence fields for the hands are always 0.
Unlike the Hand Landmarker solution, which has a Handedness field indicating the confidence score for each hand, the Holistic solution has no output indicating the quality or confidence of the keypoints obtained.
Is this a bug, or am I missing something?
Thank you very much.

Other info / Complete Logs

HolisticLandmarkerResult(face_landmarks=[
 NormalizedLandmark(x=0.4745168089866638, y=0.36261075735092163, z=-0.0224269051104784, visibility=0.0, presence=0.0), 
 ....
 NormalizedLandmark(x=0.5119104385375977, y=0.2810891270637512, z=0.005499151535332203, visibility=0.0, presence=0.0)], 

 pose_landmarks=[
 NormalizedLandmark(x=0.47517403960227966, y=0.3143022358417511, z=-0.9151485562324524, visibility=0.9999208450317383, presence=0.9995543360710144), 
 ....
 NormalizedLandmark(x=0.41832780838012695, y=1.8102238178253174, z=0.12485508620738983, visibility=0.005907772108912468, presence=0.001108874916099012)], 

 pose_world_landmarks=[

 Landmark(x=-0.046613965183496475, y=-0.5604096055030823, z=-0.3200050890445709, visibility=0.9999208450317383, presence=0.9995543360710144), 
 ...
 Landmark(x=-0.12133946269750595, y=0.5424543023109436, z=0.04660561680793762, visibility=0.005907772108912468, presence=0.001108874916099012)], 

 left_hand_landmarks=[
 NormalizedLandmark(x=0.5576450228691101, y=0.7599831819534302, z=4.721105995031394e-07, visibility=0.0, presence=0.0), 
 ....
 NormalizedLandmark(x=0.6063085794448853, y=0.5707101821899414, z=-0.08902209997177124, visibility=0.0, presence=0.0)], 

 left_hand_world_landmarks=[
 Landmark(x=0.019411759451031685, y=-0.2692203223705292, z=-0.36530426144599915, visibility=0.0, presence=0.0), 
 ..... 
 Landmark(x=0.017838725820183754, y=-0.3276180922985077, z=-0.42397379875183105, visibility=0.0, presence=0.0)], 

 right_hand_landmarks=[
 NormalizedLandmark(x=0.3994499146938324, y=0.7287973761558533, z=3.09612943283355e-07, visibility=0.0, presence=0.0), 
 .....
 NormalizedLandmark(x=0.3777098059654236, y=0.6123549938201904, z=-0.02483273483812809, visibility=0.0, presence=0.0)], 

 right_hand_world_landmarks=[
 Landmark(x=-0.1434965282678604, y=-0.22600455582141876, z=-0.3554910123348236, visibility=0.0, presence=0.0), 
 .....
 Landmark(x=-0.14976395666599274, y=-0.30362075567245483, z=-0.39388307929039, visibility=0.0, presence=0.0)], 

 face_blendshapes=None, segmentation_mask=None)
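A minimal sketch for checking the behaviour described above on a result object. It only relies on the field names printed in the log (`face_landmarks`, `pose_landmarks`, `left_hand_landmarks`, and so on) and on each landmark exposing `visibility` and `presence`. The `Landmark` dataclass and the toy `example` are stand-ins for illustration, not MediaPipe's own classes, and the labels "missing" / "all-zero" / "populated" are hypothetical names chosen here.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Dict, Optional


@dataclass
class Landmark:
    """Stand-in with the fields printed in the log; not MediaPipe's own class."""
    x: float
    y: float
    z: float
    visibility: Optional[float] = None
    presence: Optional[float] = None


# Landmark groups as named in the HolisticLandmarkerResult printed above.
GROUPS = (
    "face_landmarks",
    "pose_landmarks",
    "pose_world_landmarks",
    "left_hand_landmarks",
    "left_hand_world_landmarks",
    "right_hand_landmarks",
    "right_hand_world_landmarks",
)


def confidence_report(result) -> Dict[str, str]:
    """Label each landmark group of a holistic-style result.

    'missing'  : the group is absent or empty,
    'all-zero' : every landmark has visibility and presence of 0 (or None),
    'populated': at least one landmark has a non-zero visibility or presence.
    """
    report = {}
    for name in GROUPS:
        landmarks = getattr(result, name, None) or []
        if not landmarks:
            report[name] = "missing"
        elif all(not lm.visibility and not lm.presence for lm in landmarks):
            report[name] = "all-zero"
        else:
            report[name] = "populated"
    return report


# Toy result mirroring the log: face/hands at 0.0, pose with real values.
# The right hand is left empty here only to show the 'missing' case.
hand = [Landmark(0.5576, 0.7600, 0.0, visibility=0.0, presence=0.0)]
pose = [Landmark(0.4752, 0.3143, -0.9151, visibility=0.9999, presence=0.9996)]
example = SimpleNamespace(
    face_landmarks=hand,
    pose_landmarks=pose,
    pose_world_landmarks=pose,
    left_hand_landmarks=hand,
    left_hand_world_landmarks=hand,
    right_hand_landmarks=[],
    right_hand_world_landmarks=[],
)
example_report = confidence_report(example)
```

Because `not lm.visibility` is true for both `0.0` and `None`, the check treats an unset field the same as a zero one, which matches how the hand and face fields appear in the log.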

kuaashish commented 8 months ago

Hi @mvazquezgts,

Could you please provide additional information about the problem. Include the following details:

Outline the steps you are following, based on the documentation.
Specify the Ubuntu version you are using.
Provide the MediaPipe version being used, along with the Python version.

Providing this information will help us better understand and address the issue.

Thank you!!

mvazquezgts commented 8 months ago

OS: Ubuntu
Programming language: Python
MediaPipe version: 0.10.11
Solution: Holistic

Given an input image/frame, the output of the model is the same HolisticLandmarkerResult shown in the log above.

The available data/fields are:

face_landmarks [x, y, z, visibility, presence]
pose_landmarks & pose_world_landmarks [x, y, z, visibility, presence]
left_hand_landmarks & left_hand_world_landmarks [x, y, z, visibility, presence]
right_hand_landmarks & right_hand_world_landmarks [x, y, z, visibility, presence]
face_blendshapes
segmentation_mask

For both the hands and the face, visibility and presence are always 0, regardless of the configuration used to set up/initialise the model. Only pose_landmarks and pose_world_landmarks carry meaningful values.

So, my question is whether this is a bug, or whether it is possible to get the confidence/visibility of the hand points in another way. In the Hand Landmarker (documentation: https://developers.google.com/mediapipe/solutions/vision/hand_landmarker/python), there is a 'Handedness' field that contains this information:

HandLandmarkerResult:
  Handedness:
    Categories #0:
      index        : 0
      score        : 0.98396
      categoryName : Left
  Landmarks:
    Landmark #0:
      x : 0.638852
      y : 0.671197
      z : -3.41E-7
    Landmark #1:
      x : 0.634599
      y : 0.536441
      z : -0.06984
    ... (21 landmarks for a hand)
  WorldLandmarks:
    Landmark #0:
      x : 0.067485
      y : 0.031084
      z : 0.055223
    Landmark #1:
      x : 0.063209
      y : -0.00382
      z : 0.020920
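One possible workaround, sketched under assumptions: run the separate Hand Landmarker task on the same frame and read the per-hand score from its handedness output. `hand_scores` relies only on the structure shown in the HandLandmarkerResult above (one list of categories per detected hand, each with a score and a category name; in Python the attribute is `category_name`). `detect_hand_scores` is a hypothetical helper, and the model path `hand_landmarker.task` is an assumption (the model file must be downloaded separately). This roughly doubles the inference work per frame and does not match the Hand Landmarker hands to the Holistic hands.

```python
from types import SimpleNamespace
from typing import List, Tuple


def hand_scores(result) -> List[Tuple[str, float]]:
    """Return (category_name, score) for each hand in a HandLandmarkerResult-like object.

    `result.handedness` holds one list of categories per detected hand;
    the highest-scoring category of each hand is reported.
    """
    scores = []
    for categories in result.handedness:
        if categories:
            top = max(categories, key=lambda c: c.score)
            scores.append((top.category_name, float(top.score)))
    return scores


def detect_hand_scores(image_path: str, model_path: str = "hand_landmarker.task"):
    """Hypothetical helper: run the Hand Landmarker on one image and return hand scores."""
    # Imported lazily so hand_scores() can be used without MediaPipe installed.
    import mediapipe as mp
    from mediapipe.tasks import python as mp_python
    from mediapipe.tasks.python import vision

    options = vision.HandLandmarkerOptions(
        base_options=mp_python.BaseOptions(model_asset_path=model_path),
        num_hands=2,
    )
    with vision.HandLandmarker.create_from_options(options) as landmarker:
        result = landmarker.detect(mp.Image.create_from_file(image_path))
    return hand_scores(result)


# Toy result using the Handedness values printed above.
example = SimpleNamespace(
    handedness=[[SimpleNamespace(index=0, score=0.98396, category_name="Left")]]
)
example_scores = hand_scores(example)  # [("Left", 0.98396)]
```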

But it seems that in this version of Holistic there is no way to get a score for hand points.
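Another heuristic that stays within the Holistic output: the pose landmarks do carry visibility (see the log above), and the 33-point pose topology includes wrist, pinky, index and thumb points for each side (indices 15/17/19/21 for the left side, 16/18/20/22 for the right). Averaging their visibility gives a rough proxy for whether a hand is in view. This is an assumption, not a calibrated hand confidence; `hand_proxy_confidence` is a hypothetical name, and it is worth verifying that Holistic's `left_hand_landmarks` follow the same left/right convention as the pose.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence

# 33-point pose topology: wrist, pinky, index and thumb landmarks per side.
HAND_POINTS = {
    "left": (15, 17, 19, 21),
    "right": (16, 18, 20, 22),
}


@dataclass
class Landmark:
    """Stand-in for a pose landmark; only visibility is used below."""
    x: float
    y: float
    z: float
    visibility: Optional[float] = None


def hand_proxy_confidence(pose_landmarks: Sequence[Landmark], side: str) -> float:
    """Heuristic: mean pose visibility of one side's hand points.

    Returns 0.0 when the pose is missing or has fewer than 23 landmarks.
    """
    indices = HAND_POINTS[side]
    if len(pose_landmarks) <= max(indices):
        return 0.0
    return mean((pose_landmarks[i].visibility or 0.0) for i in indices)


# Toy pose: every point at visibility 0.1 except the left-hand points at 0.9.
example_pose = [Landmark(0.5, 0.5, 0.0, visibility=0.1) for _ in range(33)]
for i in HAND_POINTS["left"]:
    example_pose[i] = Landmark(0.5, 0.5, 0.0, visibility=0.9)

left_conf = hand_proxy_confidence(example_pose, "left")    # about 0.9
right_conf = hand_proxy_confidence(example_pose, "right")  # about 0.1
```

The proxy only says whether the arm end is visible to the pose model; it says nothing about how well the 21 hand keypoints themselves were fitted.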

schmidt-sebastian commented 8 months ago

Thank you for raising this. As this is our newest Task, we likely need to invest a bit more time here.

endink commented 6 months ago

It's an old issue:

#3505

amokto commented 2 months ago

I am experiencing the same issue (Holistic task on web). Any updates on this?

yiyiyaya1122 commented 3 weeks ago

I am experiencing the same issue (Holistic task on web). Any updates on this?

gokturkDev commented 2 weeks ago

Wow, this is really unprofessional of you guys... There are multiple issues about this. Can someone give a proper explanation of the state of visibility/confidence for pose, face and hand landmark detection? @schmidt-sebastian @kuaashish

gokturkDev commented 2 weeks ago

Can you at least confirm that every landmark will be predicted even if it is not present in the image? If so, can a landmark ever fall outside the bounds of the base image? @schmidt-sebastian @kuaashish https://github.com/google-ai-edge/mediapipe/issues/3159
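On the out-of-bounds part of the question: the log in the original report already contains a pose landmark with y = 1.8102, outside the normalized [0, 1] range, so normalized coordinates can at least for pose fall outside the image. A minimal sketch for flagging such landmarks follows; `in_frame` and `fraction_in_frame` are hypothetical helpers, and using them on hand landmarks as a crude quality signal is an assumption, not documented behaviour.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class NormalizedLandmark:
    """Stand-in with normalized image coordinates (x, y in [0, 1] when in frame)."""
    x: float
    y: float
    z: float = 0.0


def in_frame(lm: NormalizedLandmark, margin: float = 0.0) -> bool:
    """True if (x, y) lies inside [0, 1] expanded by `margin` on every side."""
    lo, hi = -margin, 1.0 + margin
    return lo <= lm.x <= hi and lo <= lm.y <= hi


def fraction_in_frame(landmarks: Sequence[NormalizedLandmark], margin: float = 0.0) -> float:
    """Share of landmarks inside the image; 0.0 for an empty list."""
    if not landmarks:
        return 0.0
    return sum(in_frame(lm, margin) for lm in landmarks) / len(landmarks)


# Two pose landmarks copied from the log: the second one has y of about 1.81.
example = [
    NormalizedLandmark(0.47517403960227966, 0.3143022358417511, -0.9151485562324524),
    NormalizedLandmark(0.41832780838012695, 1.8102238178253174, 0.12485508620738983),
]
example_fraction = fraction_in_frame(example)  # 0.5
```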

google-ai-edge / mediapipe

[HOLISTIC SOLUTION] Info about the visibility/confidence of keypoints from the hands is not available. #5212
