google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://ai.google.dev/edge/mediapipe
Apache License 2.0

Implementation of 3D hand pose #554

Closed: momo1986 closed this issue 4 years ago

momo1986 commented 4 years ago

Hi MediaPipe team,

After pausing the face detection project for now, I am focusing on hand detection and tracking.

I am again building a demo based on a customized AAR package.

It is similar to this project, except that ours uses a USB camera: https://github.com/jiuqiant/mediapipe_multi_hands_tracking_aar_example

My AAR build rule is:

load("//mediapipe/java/com/google/mediapipe:mediapipe_aar.bzl", "mediapipe_aar")

mediapipe_aar(
    name = "mp_multi_hand_3d_tracking_aar",
    calculators = ["//mediapipe/graphs/hand_tracking:multi_hand_mobile_calculators"],
)

My build command is:

bazel build -c opt --fat_apk_cpu=arm64-v8a,armeabi-v7a --define 3D=true //mediapipe/examples/android/src/java/com/google/mediapipe/apps/aar_examples:mp_multi_hand_3d_tracking_aar

The hand pose project runs.

I used the callback that @jiuqiant illustrated to get the x, y, and z coordinates of each landmark. However, the z value is always 0.
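
A minimal sketch of my callback, following the pattern in the linked AAR example (the stream name "multi_hand_landmarks" and the FrameProcessor come from that example):

import android.util.Log;
import com.google.mediapipe.components.FrameProcessor;
import com.google.mediapipe.formats.proto.LandmarkProto;
import com.google.mediapipe.framework.PacketGetter;
import java.util.List;

public final class HandLandmarkLogger {
  private static final String TAG = "HandLandmarks";

  // Attach a callback to the "multi_hand_landmarks" output stream of the
  // multi-hand tracking graph and log every landmark's coordinates.
  public static void attach(FrameProcessor processor) {
    processor.addPacketCallback(
        "multi_hand_landmarks",
        packet -> {
          List<LandmarkProto.NormalizedLandmarkList> hands =
              PacketGetter.getProtoVector(
                  packet, LandmarkProto.NormalizedLandmarkList.parser());
          for (LandmarkProto.NormalizedLandmarkList hand : hands) {
            for (LandmarkProto.NormalizedLandmark lm : hand.getLandmarkList()) {
              // x and y are normalized image coordinates; z only becomes
              // non-zero when the 3D landmark model is packaged.
              Log.d(TAG, "x=" + lm.getX() + " y=" + lm.getY() + " z=" + lm.getZ());
            }
          }
        });
  }
}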

The camera is a plain RGB camera.

My question is: how can I visualize the depth, and how can I actually measure it?
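
One idea for visualization (a sketch only; the z working range below is my own guess, not a MediaPipe constant): scale each landmark's circle radius by its relative z when drawing on an Android Canvas, so nearer points render larger.

import android.graphics.Canvas;
import android.graphics.Color;
import android.graphics.Paint;
import com.google.mediapipe.formats.proto.LandmarkProto;

public final class DepthOverlay {
  // Assumed working range for relative z; tune empirically.
  private static final float MIN_Z = -0.3f;  // nearest to the camera
  private static final float MAX_Z = 0.3f;   // farthest from the camera
  private final Paint paint = new Paint(Paint.ANTI_ALIAS_FLAG);

  // Draw one hand, mapping relative depth to circle size:
  // more negative z (nearer) -> larger radius.
  public void draw(Canvas canvas, LandmarkProto.NormalizedLandmarkList hand) {
    paint.setColor(Color.GREEN);
    for (LandmarkProto.NormalizedLandmark lm : hand.getLandmarkList()) {
      float z = Math.max(MIN_Z, Math.min(MAX_Z, lm.getZ()));
      float t = (z - MIN_Z) / (MAX_Z - MIN_Z);  // 0 = near, 1 = far
      float radius = 24f * (1f - t) + 6f * t;   // pixels
      canvas.drawCircle(
          lm.getX() * canvas.getWidth(),
          lm.getY() * canvas.getHeight(),
          radius, paint);
    }
  }
}

Of course, with the 2D model z is always 0, so every circle comes out the same size.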

Thanks & Regards! Momo

momo1986 commented 4 years ago

Hi @jiuqiant and @fanzhanggoogle,

How can I build the 3D hand pose AAR package and use it with an RGB camera?

Currently, in my 2D RGB camera project, the retrieved z coordinate is always 0.0.

It looks like the demo is expected to visualize depth, but I cannot obtain any depth value.

Thanks & Regards!

momo1986 commented 4 years ago

My hardware is a UVC camera. It is parallel to the object being captured.

momo1986 commented 4 years ago

I am not sure whether an RGB-depth camera, another kind of 3D camera, or multiple 2D cameras are essential hardware for 3D hand pose estimation in MediaPipe.

momo1986 commented 4 years ago

The hands I capture are playing the piano, which I suspect is outside the training dataset. Will MediaPipe report a depth of 0.0 if such scenes were not represented as point clouds during training?

momo1986 commented 4 years ago

My Java code is adapted from: https://github.com/jiuqiant/mediapipe_multi_hands_tracking_aar_example

I noticed the following in the APK build script, https://github.com/google/mediapipe/blob/ae6be10afe59a6a99d8a68007784706ac98720dd/mediapipe/examples/android/src/java/com/google/mediapipe/apps/multihandtrackinggpu/BUILD:

genrule(
    name = "model",
    srcs = select({
        "//conditions:default": ["//mediapipe/models:hand_landmark.tflite"],
        ":use_3d_model": ["//mediapipe/models:hand_landmark_3d.tflite"],
    }),
    outs = ["hand_landmark.tflite"],
    cmd = "cp $< $@",
)

How can I make multihandtrackinggpu.binarypb load hand_landmark_3d.tflite instead of hand_landmark.tflite at inference time?
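
If I read the genrule above correctly, whichever model the select() picks is simply copied to an output named hand_landmark.tflite, so the graph (and therefore multihandtrackinggpu.binarypb) always references the same file name; passing --define 3D=true at build time should be what flips the selection to the 3D model. My Java side only names the compiled graph, never the model file. A sketch of that setup, following the linked AAR example (EglManager and the stream names come from that example):

import android.app.Activity;
import com.google.mediapipe.components.FrameProcessor;
import com.google.mediapipe.framework.AndroidAssetUtil;
import com.google.mediapipe.glutil.EglManager;

final class GraphSetup {
  // Builds the FrameProcessor; called from the activity's onCreate() after
  // the native libraries have been loaded.
  static FrameProcessor createProcessor(Activity activity) {
    AndroidAssetUtil.initializeNativeAssetManager(activity);
    EglManager eglManager = new EglManager(null);
    // Only the compiled graph is named here; the graph resolves the model
    // by the fixed file name that the genrule produces.
    return new FrameProcessor(
        activity,
        eglManager.getNativeContext(),
        "multihandtrackinggpu.binarypb",
        "input_video",
        "output_video");
  }
}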

momo1986 commented 4 years ago

Hello @eknight7, @jiuqiant, @fanzhanggoogle,

I forcibly set the default hand pose model to hand_landmark_3d.tflite.

It works now.

Thanks for your patience.

Maybe this issue can be closed.

I wish you all well during the epidemic.

Regards! Momo