google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://ai.google.dev/edge/mediapipe
Apache License 2.0

Face Landmarker: facialTransformationMatrixes seems to be off when face is not centered in image [Web] #4759

Open Bersaelor opened 1 year ago

Bersaelor commented 1 year ago

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

Yes

OS Platform and Distribution

Web/Chrome/JS

Programming Language and version

JS/TS

MediaPipe version

0.10.4

Bazel version

No response

Solution

n.A.

Android Studio, NDK, SDK versions (if issue is related to building in Android environment)

No response

Xcode & Tulsi version (if issue is related to building for iOS)

No response

Describe the actual behavior

The 2D landmarks in results.faceLandmarks look correct: the 2D overlay drawn with drawingUtils.drawConnectors appears in the right spot.

But when placing 3D objects on the face using facialTransformationMatrixes in a Face Landmarker application, the 3D content appears skewed when the face is not centered. For example, in the center of the image: the forward vector (z) points straight out of the screen.

Screenshot 2023-09-05 at 22 15 58

Left of image: The forward (z) vector points left, even though the face is still parallel to the image.

Screenshot 2023-09-05 at 22 15 50

Right of image: The forward (z) vector points right, even though the face is still parallel to the image.

Screenshot 2023-09-05 at 22 16 04

Describe the expected behaviour

Regardless of where the face is in the frame, the facialTransformationMatrixes should be a transformation from the canonical face to the predicted metric face in the 3D space.

Standalone code/steps you may have used to try to get what you need

My scene is created in THREE.js; the camera is set up as:

camera = new PerspectiveCamera(
  60, canvas.clientWidth / canvas.clientHeight, 0.01, 100
)

(with the position at 0,0,0 looking into the negative z direction, as described in the docs)

The face position is represented by faceGroup (which has a small AxesHelper in it for visualization):

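  // Copy MediaPipe's 4x4 face transform into the group; auto-update is disabled so Three.js does not overwrite it
  const faceMatrix = new Matrix4().fromArray(results.facialTransformationMatrixes[0].data);
  faceGroup.matrixAutoUpdate = false
  faceGroup.matrix.copy(faceMatrix)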
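
To put a number on "the forward (z) vector points left/right", the yaw of the matrix's z axis can be logged per frame. This is a minimal sketch, not part of the original report: it assumes the 16 values in `data` are column-major (the layout `Matrix4.fromArray` expects), and `forwardVector`, `forwardYawDeg` and `identityPose` are hypothetical names introduced for illustration.

```typescript
// Column-major 4x4 matrix as a flat list of 16 numbers (assumed layout).
type Mat4 = ArrayLike<number>;

/** Returns the normalized local z axis (third column) of the matrix. */
function forwardVector(m: Mat4): [number, number, number] {
  const x = m[8], y = m[9], z = m[10];
  const len = Math.hypot(x, y, z);
  if (len === 0) throw new Error("degenerate matrix: zero-length z axis");
  return [x / len, y / len, z / len];
}

/**
 * Yaw of the forward vector in degrees.
 * 0 means the z axis points straight at the camera (+z),
 * negative means it leans toward -x, positive toward +x.
 */
function forwardYawDeg(m: Mat4): number {
  const [x, , z] = forwardVector(m);
  return (Math.atan2(x, z) * 180) / Math.PI;
}

// Hypothetical sample pose: no rotation, face 50 units in front of the camera.
const identityPose = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, -50, 1];
console.log(forwardYawDeg(identityPose)); // 0
```

In the application this would be called as `forwardYawDeg(results.facialTransformationMatrixes[0].data)`. For a face held parallel to the image plane, the expectation described in this issue is a value near 0 regardless of where the face sits in the frame.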

Bersaelor commented 1 year ago

This problem can also be reproduced using the default demo from the MediaPipe homepage.

Basic 2D mesh of the face (my head is in the first 25% of the webcam image; the screenshot is cropped since the rest is black):

Screenshot 2023-09-05 at 22 35 58

Now, choosing the avatar while keeping my head perfectly still:

Screenshot 2023-09-05 at 22 36 02

You can see the avatar pointing to the left, even though my face was parallel to the image plane.

Bersaelor commented 1 year ago

To clarify: I'm really impressed with the performance of MediaPipe, inferring those 478 landmarks at 60 fps in the browser with no problem.

Many models and setups I tried before were much less performant, so I was really happy when I discovered MediaPipe's latest advancements.

What's curious to me is that pose estimation based on only a few key points is something other frameworks achieved with rather low effort and good results. For example, I have seen pose estimation using just 6 points (left_eye, right_eye, nose_tip, subnose, l_ear, r_ear) and solvePnP from the OpenCV calib3d module, and the result predicted the face pose really well. That's why I'm surprised that the pose estimation of mediapipe/face_landmarks has the issues described above.

soroushmadani commented 1 year ago

Hi, I'm facing the same issue, and it seems like MediaPipe calculates face landmarks in relation to the camera, while in Three.js, it's aligned with the camera's side rather than the camera itself. Here's a visual to help you understand:

IMG_6728

Let me know if you found anything.
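
One way to check this hypothesis is to compare two angles per frame: the horizontal bearing of the face position as seen from the camera, and the yaw of the face's forward axis. The sketch below is only a diagnostic under stated assumptions: the matrix is column-major, the camera sits at the origin looking down the negative z axis (as in the setup above), and `positionBearingDeg`, `forwardYawDeg`, `comparePose` and `offCenterPose` are hypothetical names, not MediaPipe or Three.js APIs.

```typescript
// Column-major 4x4 matrix as a flat list of 16 numbers (assumed layout).
type Mat4 = ArrayLike<number>;

const RAD2DEG = 180 / Math.PI;

/** Horizontal bearing of the translation column; 0 means the face is centered. */
function positionBearingDeg(m: Mat4): number {
  const tx = m[12], tz = m[14];
  return Math.atan2(tx, -tz) * RAD2DEG; // camera looks down -z
}

/** Yaw of the local z axis; 0 means it points straight at the camera (+z). */
function forwardYawDeg(m: Mat4): number {
  return Math.atan2(m[8], m[10]) * RAD2DEG;
}

/** Collects both angles so they can be logged side by side per frame. */
function comparePose(m: Mat4): { bearingDeg: number; yawDeg: number; differenceDeg: number } {
  const bearingDeg = positionBearingDeg(m);
  const yawDeg = forwardYawDeg(m);
  return { bearingDeg, yawDeg, differenceDeg: yawDeg - bearingDeg };
}

// Hypothetical sample: unrotated face, 20 units left of the optical axis, 50 units away.
const offCenterPose = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, -20, 0, -50, 1];
console.log(comparePose(offCenterPose));
```

If, with the face kept parallel to the image plane, `yawDeg` follows `bearingDeg` as the head moves sideways, the reported rotation would appear to be measured relative to the viewing ray rather than the image plane. If the two are unrelated, the cause lies elsewhere. A mirrored webcam image flips the sign of the x components, so that needs to be accounted for when reading the values.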

Bersaelor commented 1 year ago

Here's a visual to help you understand:

Yes, that is a good illustration. But I would like to point out that the issue also happens with the MediaPipe demo project, which uses a vanilla WebGL 3D renderer, if I understand correctly. So this issue isn't tied to Three.js directly.

Bersaelor commented 1 year ago

@schmidt-sebastian @kuaashish @yichunk any update or progress on this issue now that it's been a month?

I fear that while this error exists, the MediaPipe solution for face detection can only be used for toy examples, as even the demo on the website doesn't work when the user's face isn't centered.

schmidt-sebastian commented 1 year ago

Will get back to you when this is fixed, but unfortunately cannot promise a timeline yet.

DESEOUMAIGA commented 10 months ago

@Bersaelor Hello, I encountered the exact same issue when using facial_transformation_matrixes to transform a model to the camera coordinate system. I noticed that the rendering result is misaligned (even though the 2D landmark results seem correct). Do you have any solutions for this problem?

yanhn commented 4 months ago

Dear experts, sorry to bother you. I am working on a problem that might be similar to yours, so would you please help? I currently have NormalizedLandmark_A (which can be drawn on the image once multiplied by image_A's width and height) and facial_transformation_matrixes_A (a 4x4 ndarray). I need to put this expression onto another reference image_B using its facial_transformation_matrixes_B. Are there any suggestions for how to do this? Thank you.
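
If each matrix maps the canonical face to the metric face of its own image, as described earlier in this thread, then a point in A's metric space can be carried into B's by undoing A's transform and applying B's, i.e. with M_B · M_A⁻¹. The following is a rough sketch of that composition only. It assumes column-major matrices whose last row is (0, 0, 0, 1) and points that are already expressed in A's metric 3D space; converting NormalizedLandmark screen coordinates into that space is not covered here. All function names are hypothetical.

```typescript
type Mat4 = number[]; // 16 numbers, column-major (assumed layout)
type Vec3 = [number, number, number];

/** Matrix product a * b for column-major 4x4 matrices. */
function multiply(a: Mat4, b: Mat4): Mat4 {
  const out: number[] = new Array(16).fill(0);
  for (let c = 0; c < 4; c++)
    for (let r = 0; r < 4; r++)
      for (let k = 0; k < 4; k++) out[c * 4 + r] += a[k * 4 + r] * b[c * 4 + k];
  return out;
}

/** Inverse of an affine matrix (rotation/scale block plus translation). */
function invertAffine(m: Mat4): Mat4 {
  const [a00, a10, a20, , a01, a11, a21, , a02, a12, a22, , tx, ty, tz] = m;
  const det =
    a00 * (a11 * a22 - a12 * a21) -
    a01 * (a10 * a22 - a12 * a20) +
    a02 * (a10 * a21 - a11 * a20);
  if (Math.abs(det) < 1e-12) throw new Error("matrix is not invertible");
  // Inverse of the 3x3 block via the adjugate.
  const i00 = (a11 * a22 - a12 * a21) / det, i01 = (a02 * a21 - a01 * a22) / det, i02 = (a01 * a12 - a02 * a11) / det;
  const i10 = (a12 * a20 - a10 * a22) / det, i11 = (a00 * a22 - a02 * a20) / det, i12 = (a02 * a10 - a00 * a12) / det;
  const i20 = (a10 * a21 - a11 * a20) / det, i21 = (a01 * a20 - a00 * a21) / det, i22 = (a00 * a11 - a01 * a10) / det;
  // Inverse translation is -(inverse block) * t.
  const ix = -(i00 * tx + i01 * ty + i02 * tz);
  const iy = -(i10 * tx + i11 * ty + i12 * tz);
  const iz = -(i20 * tx + i21 * ty + i22 * tz);
  return [i00, i10, i20, 0, i01, i11, i21, 0, i02, i12, i22, 0, ix, iy, iz, 1];
}

/** Applies an affine matrix to a 3D point. */
function transformPoint(m: Mat4, [x, y, z]: Vec3): Vec3 {
  return [
    m[0] * x + m[4] * y + m[8] * z + m[12],
    m[1] * x + m[5] * y + m[9] * z + m[13],
    m[2] * x + m[6] * y + m[10] * z + m[14],
  ];
}

/** Matrix that carries points from face A's metric space into face B's. */
function relativeTransform(matrixA: Mat4, matrixB: Mat4): Mat4 {
  return multiply(matrixB, invertAffine(matrixA));
}
```

With this, `transformPoint(relativeTransform(matrixA, matrixB), pointInA)` would give the corresponding point in B's metric space. Projecting that point back into image_B still requires the camera model that MediaPipe assumes, which this sketch does not attempt to reproduce.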