google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://ai.google.dev/edge/mediapipe
Apache License 2.0

Separation of Pose Detector and Pose Landmarker? #4939

Open Benjabby opened 12 months ago

Benjabby commented 12 months ago

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

Yes

OS Platform and Distribution

Linux Ubuntu 20.04.6

MediaPipe Tasks SDK version

No response

Task name (e.g. Image classification, Gesture recognition etc.)

Pose Landmarker

Programming Language and version (e.g. C++, Python, Java)

Python

Describe the actual behavior

Pose detection and pose landmark detection are linked: the .task file contains two tflite models that are executed, one for detecting people and some landmarks, and one for detecting the full set of landmarks. However, it doesn't seem possible to instantiate either of these separately through the Python API.

Describe the expected behaviour

I want to be able to detect the presence and general location of more than one person (PoseDetectorGraph) but, for efficiency, run full landmark detection on only a single person.
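To make the "single person" step concrete, here is a minimal sketch of choosing one detection out of several before running the landmark model. `PersonDetection`, `select_single_person`, and the largest-box selection rule are all hypothetical names and choices made for illustration; they are not part of the MediaPipe API, and the actual PoseDetectorGraph output format may differ.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PersonDetection:
    """Hypothetical detector output: a normalized box plus a confidence score."""
    x_min: float
    y_min: float
    width: float
    height: float
    score: float

    @property
    def area(self) -> float:
        return self.width * self.height


def select_single_person(
    detections: Sequence[PersonDetection], min_score: float = 0.5
) -> Optional[PersonDetection]:
    """Return the largest sufficiently confident detection, or None if there is none."""
    candidates = [d for d in detections if d.score >= min_score]
    if not candidates:
        return None
    # Assumption: the person of interest is the one with the largest box.
    return max(candidates, key=lambda d: d.area)
```

Only the box returned here would be passed on as the ROI for landmark detection; other rules (highest score, closest to the image centre) could be swapped in the same place.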

Standalone code/steps you may have used to try to get what you need

N/A

Other info / Complete Logs

It's semantics, but to me this seems like it should be two separate tasks: Task 1 detects people, and Task 2 detects landmarks for a given detected ROI. They seem to be treated as two separate graphs at a lower level than the Python API: there is PoseDetectorGraph in pose_detector_graph.cc for Task 1, and MultiplePoseLandmarksDetectorGraph in pose_landmarker_graph.cc, which takes results from PoseDetectorGraph to build the full landmarker graph (PoseLandmarkerGraph). I assume it is PoseDetectorGraph that uses the 'pose_detector.tflite' model and MultiplePoseLandmarksDetectorGraph that uses 'pose_landmarks_detector.tflite', both of which are found in the poselandmarker*.task files. But I cannot figure out how to run PoseDetector and then provide only one detected ROI to the landmark detector. I also notice references to SinglePoseLandmarksDetectorGraph in pose_landmarks_detector_graph.cc, but I cannot make heads or tails of how that fits in.

How can I modify the source code to create a task that outputs the results of PoseDetectorGraph and executes landmark detection only for a single detected person? Alternatively, how can I achieve this using just the two .tflite models?
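As a starting point for the "just the two .tflite models" route, the sketch below pulls the model files out of a .task bundle. It assumes the bundle is an ordinary ZIP archive; `extract_tflite_models` is a hypothetical helper written for this example, not a MediaPipe function.

```python
import zipfile
from pathlib import Path
from typing import Dict


def extract_tflite_models(task_path: str, out_dir: str) -> Dict[str, Path]:
    """Copy every .tflite member of a .task bundle into out_dir.

    Assumes the .task file is a ZIP archive. Returns a mapping from model
    file name to the extracted path.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted: Dict[str, Path] = {}
    with zipfile.ZipFile(task_path) as bundle:
        for name in bundle.namelist():
            if name.endswith(".tflite"):
                # Flatten any internal directory structure to the bare file name.
                target = out / Path(name).name
                target.write_bytes(bundle.read(name))
                extracted[target.name] = target
    return extracted
```

If the assumption holds, the result should include 'pose_detector.tflite' and 'pose_landmarks_detector.tflite'. Running them outside MediaPipe would still require reimplementing the pre- and post-processing that the graphs perform, such as decoding the detector output and cropping the ROI.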

schmidt-sebastian commented 11 months ago

Unfortunately, we currently do not offer the ability to customize our Tasks. This is however a big priority for us coming up and hopefully we will have a solution soon (but please don't wait for it). For now, the only suggestion I have is to modify the code or to use our legacy Pose Detection API and to modify its graph.