Dillxn opened this issue 1 year ago (status: Open)
Hi @Dillxn,
Can you share how you got the hand_world_landmarks in the holistic solution? I would like to try it too, but I don't know where to start; I'm using the MediaPipeUnityPlugin to get the solution in C#.
You can use solvePnP in OpenCV to get the coordinates in world space; see my comment here for it working in Python: https://github.com/google/mediapipe/issues/2199#issuecomment-1443493391
It basically takes the "landmarks", which are in image coordinates, and the "world landmarks", which are 3D but not in world space, and finds the camera position. You can then invert the camera pose it finds to place the world landmarks into a 3D space that matches up with the camera.
I don't know how to get solvePnP into Unity, but maybe this C# port of OpenCV would help: https://www.emgu.com/wiki/index.php/Main_Page
@Dillxn, as per the description, we currently do not have any C#-related solution available in MediaPipe. Could you please provide complete details about the issue and the platform you are using, such as Android, C++, Python, or another available solution? Thank you!
Hi @kuaashish! I am using a plugin that wraps MediaPipe in a C# layer so it can be used in Unity. The plugin is found here. I'm really just curious whether there is any calculator (or combination thereof) I can use in my config file to get all the world landmarks in the same coordinate system. (In other words, they all share the same origin (0, 0, 0); they all share the same world space.)
Here is my current graph config:
# The following file is modified by ASL XR Team, WKU
# Copyright 2019 The MediaPipe Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Copied from mediapipe/graphs/holistic_tracking/holistic_tracking_gpu.pbtxt
#
# CHANGES:
# - Add ImageTransformationCalculator and rotate the input
# - Remove AnnotationOverlayCalculator
# Tracks and renders pose + hands + face landmarks.
# CPU image. (ImageFrame)
input_stream: "input_video"
output_stream: "pose_landmarks"
output_stream: "pose_world_landmarks"
output_stream: "segmentation_mask"
output_stream: "pose_roi"
output_stream: "pose_detection"
output_stream: "face_landmarks"
output_stream: "left_hand_landmarks"
output_stream: "right_hand_landmarks"
output_stream: "left_hand_world_landmarks"
output_stream: "right_hand_world_landmarks"
# Throttles the images flowing downstream for flow control. It passes through
# the very first incoming image unaltered, and waits for downstream nodes
# (calculators and subgraphs) in the graph to finish their tasks before it
# passes through another image. All images that come in while waiting are
# dropped, limiting the number of in-flight images in most part of the graph to
# 1. This prevents the downstream nodes from queuing up incoming images and data
# excessively, which leads to increased latency and memory usage, unwanted in
# real-time mobile applications. It also eliminates unnecessary computation,
# e.g., the output produced by a node may get dropped downstream if the
# subsequent nodes are still busy processing previous inputs.
node {
  calculator: "FlowLimiterCalculator"
  input_stream: "input_video"
  input_stream: "FINISHED:face_landmarks"
  input_stream_info: {
    tag_index: "FINISHED"
    back_edge: true
  }
  output_stream: "throttled_input_video"
  node_options: {
    [type.googleapis.com/mediapipe.FlowLimiterCalculatorOptions] {
      max_in_flight: 1
      max_in_queue: 1
      # Timeout is disabled (set to 0) as first frame processing can take more
      # than 1 second.
      in_flight_timeout: 0
    }
  }
}

node: {
  calculator: "ImageTransformationCalculator"
  input_stream: "IMAGE:throttled_input_video"
  input_side_packet: "ROTATION_DEGREES:input_rotation"
  input_side_packet: "FLIP_HORIZONTALLY:input_horizontally_flipped"
  input_side_packet: "FLIP_VERTICALLY:input_vertically_flipped"
  output_stream: "IMAGE:transformed_input_video"
}

node {
  calculator: "HolisticLandmarkCpu"
  input_stream: "IMAGE:transformed_input_video"
  input_side_packet: "MODEL_COMPLEXITY:model_complexity"
  input_side_packet: "SMOOTH_LANDMARKS:smooth_landmarks"
  input_side_packet: "REFINE_FACE_LANDMARKS:refine_face_landmarks"
  input_side_packet: "ENABLE_SEGMENTATION:enable_segmentation"
  input_side_packet: "SMOOTH_SEGMENTATION:smooth_segmentation"
  output_stream: "POSE_LANDMARKS:pose_landmarks"
  output_stream: "WORLD_LANDMARKS:pose_world_landmarks"
  output_stream: "SEGMENTATION_MASK:segmentation_mask_rotated"
  output_stream: "POSE_ROI:pose_roi"
  output_stream: "POSE_DETECTION:pose_detection"
  output_stream: "FACE_LANDMARKS:face_landmarks"
}

# Predicts left and right hand landmarks based on the initial pose landmarks.
node {
  calculator: "HandLandmarksLeftAndRightCpu"
  input_stream: "IMAGE:transformed_input_video"
  input_stream: "POSE_LANDMARKS:pose_landmarks"
  output_stream: "LEFT_HAND_LANDMARKS:left_hand_landmarks"
  output_stream: "RIGHT_HAND_LANDMARKS:right_hand_landmarks"
  output_stream: "LEFT_HAND_ROI_FROM_POSE:left_hand_roi_from_pose"
  output_stream: "RIGHT_HAND_ROI_FROM_POSE:right_hand_roi_from_pose"
}

# left hand world landmarks
node {
  calculator: "HandLandmarkCpu"
  input_stream: "IMAGE:transformed_input_video"
  input_stream: "ROI:left_hand_roi_from_pose"
  input_side_packet: "MODEL_COMPLEXITY:model_complexity"
  output_stream: "WORLD_LANDMARKS:left_hand_world_landmarks"
}

# right hand world landmarks
node {
  calculator: "HandLandmarkCpu"
  input_stream: "IMAGE:transformed_input_video"
  input_stream: "ROI:right_hand_roi_from_pose"
  input_side_packet: "MODEL_COMPLEXITY:model_complexity"
  output_stream: "WORLD_LANDMARKS:right_hand_world_landmarks"
}

node: {
  calculator: "ImageTransformationCalculator"
  input_stream: "IMAGE:segmentation_mask_rotated"
  input_side_packet: "ROTATION_DEGREES:output_rotation"
  input_side_packet: "FLIP_HORIZONTALLY:output_horizontally_flipped"
  input_side_packet: "FLIP_VERTICALLY:output_vertically_flipped"
  output_stream: "IMAGE:segmentation_mask"
}
Hi @ivan-grishchenko, could you please look into this issue? Thank you!
Have I written custom code (as opposed to using a stock example script provided in MediaPipe)
Yes
OS Platform and Distribution
Windows 10, Unity Engine
MediaPipe version
0.9.1
Bazel version
No response
Solution
Holistic
Programming Language and version
C#
Describe the actual behavior
Describe the expected behaviour
Standalone code/steps you may have used to try to get what you need
No response
Other info / Complete Logs
No response