google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://mediapipe.dev
Apache License 2.0

MediaPipe Holistic: what are possible reasons for not getting any landmarks? #3767

Closed m-decoster closed 1 year ago

m-decoster commented 1 year ago

TL;DR: What are the reasons that results.left_hand_landmarks (or similar fields) can be None?

I am extracting hand landmarks using code similar to this:

import numpy as np

def extract_keypoints(results):
    # 21 landmarks per hand, each with (x, y, z); use zeros when a hand was not detected.
    lh = np.array([[res.x, res.y, res.z] for res in results.left_hand_landmarks.landmark]).flatten() if results.left_hand_landmarks else np.zeros(21*3)
    rh = np.array([[res.x, res.y, res.z] for res in results.right_hand_landmarks.landmark]).flatten() if results.right_hand_landmarks else np.zeros(21*3)
    return np.concatenate([lh, rh])

For some videos, I get hand landmarks in one frame, but none in the next, even though the frames are (visually) virtually identical. Where can I find information in the documentation or source code that tells me more about why results.left_hand_landmarks can be None? I have tried with several thresholds but there is no setting that always results in predictions as far as I can tell.

In some cases the wrist position is correct, so the hand ROI detector should be able to get a decent crop and thus I would expect keypoints to be detected. But even in such cases, sometimes I get no results from MediaPipe.
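To pin down exactly which frames drop out, the per-frame results can be scanned for runs where a field is None. The following is a minimal sketch, not part of MediaPipe: `missing_runs` is a hypothetical helper, and the `SimpleNamespace` objects only stand in for the results returned by the Holistic solution.

```python
from types import SimpleNamespace


def missing_runs(results_per_frame, field="left_hand_landmarks"):
    """Return inclusive (start, end) frame index pairs where `field` is None."""
    results_per_frame = list(results_per_frame)
    runs = []
    start = None
    for i, results in enumerate(results_per_frame):
        missing = getattr(results, field, None) is None
        if missing and start is None:
            start = i  # a new run of missing frames begins here
        elif not missing and start is not None:
            runs.append((start, i - 1))  # the run ended on the previous frame
            start = None
    if start is not None:
        # the sequence ended while landmarks were still missing
        runs.append((start, len(results_per_frame) - 1))
    return runs


# Stand-in data: any non-None value plays the role of a landmark list.
demo = [SimpleNamespace(left_hand_landmarks=v) for v in ["lm", None, None, "lm", None]]
demo_runs = missing_runs(demo)
```

Listing the runs this way makes it easier to tell isolated single-frame dropouts from longer stretches where the hand is lost.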

kuaashish commented 1 year ago

Hi @m-decoster, it looks like the model is not recognizing the left hand or its landmarks. Could you test with several good-quality images or videos and report back if the issue persists? Please refer to the closed issue #3339 for more context. Thank you!

google-ml-butler[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

m-decoster commented 1 year ago

@kuaashish Here is an example from a video where two consecutive frames are very similar, but there are predictions for one frame and none for the other.

The other failure cases are very similar, in the sense that the two frames are practically identical to the human eye but MediaPipe fails on one and not on the other.

The blue box was added after MediaPipe extraction for privacy reasons, so it does not influence the MediaPipe extraction.

The settings used are static_image_mode=False, model_complexity=2, smooth_landmarks=True
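For reference, the settings above could be passed to the legacy Python Holistic solution roughly as follows. This is a sketch under assumptions: the two confidence thresholds are not stated in the comment and are filled in with what are, as far as I know, the library defaults (0.5), and `make_holistic` is a hypothetical helper name.

```python
# Settings quoted in the comment above. The two thresholds are assumed
# defaults, listed only to make them explicit and easy to vary.
HOLISTIC_SETTINGS = {
    "static_image_mode": False,
    "model_complexity": 2,
    "smooth_landmarks": True,
    "min_detection_confidence": 0.5,  # assumption: not given in the comment
    "min_tracking_confidence": 0.5,   # assumption: not given in the comment
}


def make_holistic(settings=None):
    """Create a legacy Holistic instance from a settings dict."""
    # Imported lazily so the settings can be inspected without MediaPipe installed.
    import mediapipe as mp

    return mp.solutions.holistic.Holistic(**(settings or HOLISTIC_SETTINGS))
```

Keeping the settings in one dict makes it straightforward to re-run the same failing frames with, for example, `static_image_mode=True` or different thresholds and compare the outcomes.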

jdambre commented 1 year ago

Still waiting for an answer to this question (https://github.com/google/mediapipe/issues/3339) ...

lucasjinreal commented 1 year ago

image

Same issue here, the pose is not working.

jdambre commented 1 year ago

@kuaashish @ivan-grishchenko @bazarevsky Could someone please respond to this? Issues like these, and the fact that we get NO information about where they originate, are causing a lot of frustration. Current hand keypoint detection is simply too unstable and unreliable for many applications. This is no problem for things like gesture control in games, where an undetected gesture can simply be repeated (although users may get frustrated). It is a problem in applications where you really need continuous reliability, e.g. when monitoring someone's hand movements during procedures (medical, operational) for learner feedback or for tasks like sign language translation from video.

On the one hand we are very grateful for this tool, but on the other hand many researchers are constrained by the unpredictability and lack of transparency, up to the point where, at the end of the day, we still have no other option than developing our own keypoint extraction models for the hands!

Given the expertise that exists in the research field, it would be more useful to let us collaborate on and contribute to improvements that benefit all of us than to continue like this, with a lack of transparency and continuous parallel efforts to solve or bypass the same basic problems in the tools we need.

lucasjinreal commented 1 year ago

I solved my problem, thank u everyone.

jdambre commented 1 year ago

"I solved my problem, thank u everyone."

@jinfagang could you share the solution? Or was it simply a bug ...

kuaashish commented 1 year ago

Hello @m-decoster, we are upgrading the MediaPipe Legacy Solutions to the new MediaPipe solutions. However, the libraries, documentation, and source code for all the MediaPipe Legacy Solutions will continue to be available in our GitHub repository and through library distribution services, such as Maven and NPM.

You can continue to use those legacy solutions in your applications if you choose. However, we would ask you to check the new MediaPipe solutions, which can help you more easily build and customize ML solutions for your applications. These new solutions will provide a superset of the capabilities available in the legacy solutions. Thank you.

jdambre commented 1 year ago

Hi @kuaashish ,

This reads like a generic answer that has very little to do with our questions, on which we haven't received ANY replies yet!

You point to new versions, but running MP again on all data takes a very long time, so this is only useful if the detection has improved. Can you tell us whether anything has changed in the (quality of the) core keypoint extraction algorithms?

m-decoster commented 1 year ago

Hello @kuaashish

I have noticed the new solutions, but pose estimation is not yet included, so this is not useful for us right now. Are there any improvements to the underlying model that would improve its robustness in these new solutions?

Are there any indications as to why the robustness is lacking in the situation that I detailed in this comment?

kuaashish commented 1 year ago

@m-decoster, the Pose solution is now available (check here). All upgraded solutions are a superset of the old legacy solutions, so you should be able to see better results and improved performance. We suggest you try our upgraded solutions. Further, please feel free to raise a new issue if you see the same behaviour in the upgraded solutions. Thank you!

github-actions[bot] commented 1 year ago

This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.

github-actions[bot] commented 1 year ago

This issue was closed due to lack of activity after being marked stale for the past 7 days.
