google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://ai.google.dev/edge/mediapipe
Apache License 2.0

(face_mesh) How can I get the value of the Z coordinate the right way? #2043

Closed aoxipo closed 3 years ago

aoxipo commented 3 years ago

I am trying to use the face landmarks from the face_mesh solution. Luckily, I get the correct x and y for each landmark with this code:

// One NormalizedLandmarkList per detected face.
const auto &output_mult_face = detection_packet.Get<std::vector<::mediapipe::NormalizedLandmarkList>>();
LOG(INFO) << "start translate point";
for (size_t i = 0; i < output_mult_face.size(); i++) {
    std::vector<std::vector<float>> landmark_vector;
    for (int j = 0; j < output_mult_face[i].landmark_size(); ++j) {
        const mediapipe::NormalizedLandmark &landmark = output_mult_face[i].landmark(j);
        std::vector<float> axis;
        axis.push_back(landmark.x() * width);   // normalized x -> pixels
        axis.push_back(landmark.y() * height);  // normalized y -> pixels
        axis.push_back(landmark.z() * width);   // z scaled by width, like x
        landmark_vector.push_back(axis);
    }
    out_landmark.push_back(landmark_vector);
}

But when I render the landmarks as a mesh on the face, the z coordinate looks wrong, as the picture below shows.

I found that the z value at the forehead is -55 and the z value at the nose is -145, so the two differ significantly. And when I move closer to the camera, the nose value becomes larger rather than gradually decreasing to 0. Why are some landmarks positive and some negative?

Am I reading or transforming the Z coordinate the wrong way?

[Image: WeCom screenshot 16213050406931]

sgowroji commented 3 years ago

Hi @aoxipo, have a look at this comment: https://github.com/google/mediapipe/issues/1895#issuecomment-822121488. It explains the face geometry and all of the coordinates clearly.

wwdok commented 2 years ago

@aoxipo Hi, how do you understand the z coordinate value now?

sysucv-juan commented 1 year ago

@aoxipo Hi, how do you understand the z coordinate value now?

According to the MediaPipe documentation: "The Face Landmark Model performs a single-camera face landmark detection in the screen coordinate space: the X- and Y- coordinates are normalized screen coordinates, while the Z coordinate is relative and is scaled as the X coordinate under the weak perspective projection camera model." So in my opinion the z coordinate is a pixel-scale value, and we must multiply it by the image width to get the value in pixels. If you want the real-world value, you can use the camera intrinsic parameters to get the z value in meters.
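
A minimal sketch of this interpretation, offered as an illustration rather than as MediaPipe's own code. `Landmark3`, `ToPixelSpace`, `DepthRelativeTo` and `DepthOffsetMetres` are hypothetical names, and `Landmark3` is a plain stand-in for `mediapipe::NormalizedLandmark` so the snippet compiles without MediaPipe. It assumes, as the Face Mesh documentation describes, that z is measured relative to an origin near the center of the head and that smaller values are closer to the camera, which would explain why z can be negative and why it does not approach 0 as the face moves toward the camera. The metric conversion assumes a weak perspective camera with a known focal length in pixels and a known reference depth of the face, neither of which the landmark output provides.

```cpp
#include <cassert>

// Hypothetical stand-in for mediapipe::NormalizedLandmark; only x, y and z
// are modelled so the sketch is self-contained.
struct Landmark3 {
    float x;
    float y;
    float z;
};

// Scale a normalized landmark to pixel units. Z is multiplied by the image
// width because the quoted documentation says it is "scaled as the X
// coordinate".
Landmark3 ToPixelSpace(const Landmark3 &n, int width, int height) {
    assert(width > 0 && height > 0);
    return {n.x * width, n.y * height, n.z * width};
}

// Depth of landmark `a` relative to landmark `b`, in the same units as the
// inputs. Assumption: smaller z means closer to the camera, so a negative
// result means `a` is in front of `b`.
float DepthRelativeTo(const Landmark3 &a, const Landmark3 &b) {
    return a.z - b.z;
}

// Convert a pixel-scale depth offset to metres under a weak perspective
// model: dZ = dz_px * Z0 / f. Both focal_px (focal length in pixels) and
// ref_depth_m (distance from camera to face in metres) are assumed inputs
// that must come from calibration or a separate estimate.
float DepthOffsetMetres(float dz_px, float focal_px, float ref_depth_m) {
    assert(focal_px > 0.0f);
    return dz_px * ref_depth_m / focal_px;
}
```

With the values reported in the issue (forehead at -55, nose at -145 after scaling by the width), `DepthRelativeTo(nose, forehead)` gives -90, meaning the nose sits about 90 pixel-scale units in front of the forehead. Only differences between landmarks are meaningful here; the absolute values depend on where the model places its origin.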