Open AmitMY opened 6 months ago
Alright, here is a solution
Optimizing_Hand_Area_Detection_in_MediaPipe_Holistic.pdf
https://github.com/sign-language-processing/mediapipe-hand-crop-fix
I would have preferred to write a mathematical solution, since that would be easy to contribute in a PR, but I ended up with a model that is extremely lightweight.
I would be happy to know what you think, and whether you could optimize this (and train it on a larger dataset).
Solution
Holistic
Describe the actual behavior
When using the holistic solution, the calculator first estimates the body pose and three points for each hand (wrist, index mcp, and pinky mcp). Then it estimates a rectangle for the hand area of interest, which should cover the full hand, to be sent for hand keypoint estimation. https://github.com/google/mediapipe/blob/master/mediapipe/modules/holistic_landmark/calculators/hand_detections_from_pose_to_rects_calculator.cc#L110-L121
There are some edge cases that, in my view, do not produce correct hand rects, and so the hands fail to be estimated. This happens when the hand estimation is off, or when the plane of the hand (the triangle formed by the three points of interest) lies directly perpendicular to the camera, so that it is seen edge-on. When this happens, the "area" of interest is tiny, and so the crop will be wrong.
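To make the failure mode concrete, here is a minimal sketch of a three-point ROI estimate in plain Python. It assumes the box is centered on an estimated middle-finger base (a weighted mix of index mcp and pinky mcp) and sized as twice the distance from that point to the wrist. The function name and the exact weights are my assumptions for illustration; check the linked calculator for the real constants.

```python
import math

def hand_roi_from_pose(wrist, index_mcp, pinky_mcp):
    """Return (center_x, center_y, size) of a square hand ROI, in pixels.

    Hypothetical helper: the weights and the 2x scale are assumptions,
    not copied from the MediaPipe calculator.
    """
    # Estimate the middle-finger base from the index and pinky knuckles.
    x_middle = (2.0 * index_mcp[0] + pinky_mcp[0]) / 3.0
    y_middle = (2.0 * index_mcp[1] + pinky_mcp[1]) / 3.0
    # Box size: twice the projected wrist-to-middle distance.
    size = 2.0 * math.hypot(x_middle - wrist[0], y_middle - wrist[1])
    return x_middle, y_middle, size

# Hand facing the camera: the three points are well spread out in 2D.
frontal = hand_roi_from_pose((100, 200), (90, 140), (120, 140))

# Hand seen edge-on: the projected points nearly collapse onto the wrist.
edge_on = hand_roi_from_pose((100, 200), (98, 195), (103, 195))

print(frontal[2], edge_on[2])
```

With these made-up coordinates the frontal box is 120 px wide, while the edge-on box is only about 10 px wide, far too small to contain a full hand even though the wrist itself is located correctly.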
Hands model
If we look at the hands model that uses hand detection, it works well enough:
https://github.com/google/mediapipe/assets/5757359/7183a017-1256-48fa-a20d-083b0807cf47
Pose model
The pose model also works well enough, correctly predicting the general hand position:
https://github.com/google/mediapipe/assets/5757359/a06b31a6-8a11-43d5-9bc3-e77ca73e80fc
Holistic model (pose + area + hands)
The holistic model works really well when the hand is parallel to the camera, but not when it is parallel to the floor.
https://github.com/google/mediapipe/assets/5757359/8ac50e97-d50b-4e23-b6d5-41b52fbdf3cf
If I recreate the holistic ROI cropping behavior (without rotation, and with rotation), I get the following. Note how the crop goes crazy.
Describe the expected behavior
I expect that the area of interest will always be correct if the hand estimation is correct, and if not, the hand model will be activated using the hand detection model as a backup.
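As a rough sketch of that fallback, assuming the hand landmark step reports some confidence score for the pose-derived crop: all names here (select_hand_rect, hand_score, detect_hand, min_score) are hypothetical and are not MediaPipe APIs.

```python
from typing import Callable, Optional, Tuple

Rect = Tuple[float, float, float]  # (center_x, center_y, size)

def select_hand_rect(
    pose_rect: Optional[Rect],
    hand_score: float,
    detect_hand: Callable[[], Optional[Rect]],
    min_score: float = 0.5,  # assumed threshold, would need tuning
) -> Optional[Rect]:
    """Prefer the pose-derived rect; fall back to hand detection.

    hand_score is assumed to be the landmark model's confidence that
    the pose-derived crop actually contains a hand.
    """
    if pose_rect is not None and hand_score >= min_score:
        return pose_rect
    # The pose-derived crop looks wrong: run the (more expensive) detector.
    return detect_hand()

# Stand-in detector for the example; a real one would run a detection model.
def fake_detector() -> Optional[Rect]:
    return (105.0, 150.0, 130.0)

good = select_hand_rect((100.0, 140.0, 120.0), 0.9, fake_detector)
bad = select_hand_rect((99.7, 195.0, 10.0), 0.1, fake_detector)
```

Here the first call keeps the pose-derived rect, and the second call, with a low score on the tiny crop, returns the detector's rect instead.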
Possible solution
Before the recropping model