google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://ai.google.dev/edge/mediapipe
Apache License 2.0

Using FaceTransformationMatrix to revert rotation/location/scale of landmarks (Unreal Engine) #5144

Open Shrimperator opened 9 months ago

Shrimperator commented 9 months ago

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

None

OS Platform and Distribution

Windows

MediaPipe Tasks SDK version

No response

Task name (e.g. Image classification, Gesture recognition etc.)

FaceLandmarker

Programming Language and version (e.g. C++, Python, Java)

Javascript (/Unreal Engine 5.3 C++)

Describe the actual behavior

I fail to apply the transformation matrix to the landmarks

Describe the expected behaviour

I succeed in applying the transformation matrix to the landmarks

Standalone code/steps you may have used to try to get what you need

Thank you for the publicly accessible model first of all; the landmark detection works great! I just need a little bit of help processing the data further :)

The problem:

Here's what I'm trying to achieve:

I'd like to compare sets of landmarks during different expressions, in order to mathematically infer several blendshapes that currently don't work with FaceLandmarker. In order to smoothly achieve this, I need to compare landmarks in a format that accounts for scale (due to distance to the camera), head rotation and head location relative to the camera. Basically, I want the landmarks aligned as if the user were directly in front of the center of the camera at all times, head completely straight.

The landmarks that FaceLandmarker currently outputs (using the JS API) still include scale, rotation and location relative to the camera, which doesn't work for what I'm trying to do.


  1. If I understood the purpose of the FaceTransformationMatrix correctly, it represents the transformation of the landmarks from where they were basically overlaid over the canonical face mesh, to the position they are in when FaceLandmarker outputs them. So the inverse of this matrix should solve my problem. Am I correct in this assumption first of all?

  2. Unfortunately, I've been having very little luck with this approach. Mostly, I'm totally unsure about what kind of values I'm dealing with in the FaceTransformationMatrix.
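To make the "inverse of this matrix" idea concrete, here is a minimal standalone C++ sketch (no Unreal types). It assumes the matrix is an affine transform whose last row is (0, 0, 0, 1) and that points are treated as column vectors; `Mat4`, `Vec3`, `transformPoint` and `invertAffine` are hypothetical names, not MediaPipe or Unreal API. Whether the inverse really maps the output landmarks back onto the canonical mesh depends on the landmarks being in the same space as the matrix, which nothing in this thread confirms (the normalized landmarks from the JS API may need converting first).

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Mat4 = std::array<std::array<double, 4>, 4>;  // m[row][col], column-vector convention
struct Vec3 { double x, y, z; };

// Applies an affine transform to a point: p' = A * p + t.
Vec3 transformPoint(const Mat4& m, const Vec3& p) {
    return {
        m[0][0] * p.x + m[0][1] * p.y + m[0][2] * p.z + m[0][3],
        m[1][0] * p.x + m[1][1] * p.y + m[1][2] * p.z + m[1][3],
        m[2][0] * p.x + m[2][1] * p.y + m[2][2] * p.z + m[2][3],
    };
}

// Inverts an affine transform whose last row is (0, 0, 0, 1).
Mat4 invertAffine(const Mat4& m) {
    const double a = m[0][0], b = m[0][1], c = m[0][2];
    const double d = m[1][0], e = m[1][1], f = m[1][2];
    const double g = m[2][0], h = m[2][1], i = m[2][2];
    const double det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g);
    assert(std::abs(det) > 1e-12 && "matrix is not invertible");

    Mat4 inv{};  // zero-initialized
    // Inverse of the upper-left 3x3 block via the adjugate.
    inv[0][0] = (e * i - f * h) / det;
    inv[0][1] = (c * h - b * i) / det;
    inv[0][2] = (b * f - c * e) / det;
    inv[1][0] = (f * g - d * i) / det;
    inv[1][1] = (a * i - c * g) / det;
    inv[1][2] = (c * d - a * f) / det;
    inv[2][0] = (d * h - e * g) / det;
    inv[2][1] = (b * g - a * h) / det;
    inv[2][2] = (a * e - b * d) / det;
    // Inverse translation: -A^-1 * t.
    for (std::size_t r = 0; r < 3; ++r) {
        inv[r][3] = -(inv[r][0] * m[0][3] + inv[r][1] * m[1][3] + inv[r][2] * m[2][3]);
    }
    inv[3][3] = 1.0;
    return inv;
}
```

Under those assumptions, `transformPoint(invertAffine(m), q)` returns the point `p` for which `transformPoint(m, p)` equals `q`.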

I'll post some examples written in Unreal Engine C++, though I'll absolutely take solutions in JavaScript as well.

With every fourth value of the matrix data being constant (0, 0, 0, 1), I'm assuming this is a 4x4 matrix in row major order.

// matrixData holds the 16 values of the FaceTransformationMatrix.
// Every fourth value is the constant (0, 0, 0, 1), so it is hard-coded here.
FMatrix faceMat;

faceMat.M[0][0] = matrixData[0];
faceMat.M[0][1] = matrixData[1];
faceMat.M[0][2] = matrixData[2];
faceMat.M[0][3] = 0.0;

faceMat.M[1][0] = matrixData[4];
faceMat.M[1][1] = matrixData[5];
faceMat.M[1][2] = matrixData[6];
faceMat.M[1][3] = 0.0;

faceMat.M[2][0] = matrixData[8];
faceMat.M[2][1] = matrixData[9];
faceMat.M[2][2] = matrixData[10];
faceMat.M[2][3] = 0.0;

faceMat.M[3][0] = matrixData[12];
faceMat.M[3][1] = matrixData[13];
faceMat.M[3][2] = matrixData[14];
faceMat.M[3][3] = 1.0;
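To make the layout question concrete, here is a minimal standalone C++ sketch (no Unreal types; `Mat4`, `fromRowMajor` and `fromColumnMajor` are hypothetical names) that reads the same 16 values both ways. The two readings are transposes of each other. Which one MediaPipe intends is not confirmed anywhere in this thread, so treat this as a way to test both hypotheses rather than as the answer.

```cpp
#include <array>
#include <cstddef>

using Mat4 = std::array<std::array<double, 4>, 4>;  // indexed as m[row][col]

// Reads 16 values as row-major: element (row, col) is data[4 * row + col].
Mat4 fromRowMajor(const std::array<double, 16>& data) {
    Mat4 m{};
    for (std::size_t r = 0; r < 4; ++r) {
        for (std::size_t c = 0; c < 4; ++c) {
            m[r][c] = data[4 * r + c];
        }
    }
    return m;
}

// Reads 16 values as column-major: element (row, col) is data[4 * col + row].
Mat4 fromColumnMajor(const std::array<double, 16>& data) {
    Mat4 m{};
    for (std::size_t r = 0; r < 4; ++r) {
        for (std::size_t c = 0; c < 4; ++c) {
            m[r][c] = data[4 * c + r];
        }
    }
    return m;
}
```

With the constant (0, 0, 0, 1) sitting at indices 3, 7, 11 and 15, the row-major reading puts values 12 to 14 in the last row (the usual place for a translation when points are row vectors), while the column-major reading puts them in the last column (the usual place when points are column vectors).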

Naively applying this matrix to the landmarks using Unreal's FMatrix gives erratic results, though. I assume this is due to the differences between MediaPipe's and Unreal's coordinate systems, which would make total sense.

Converting vectors from one system to the other is pretty straightforward:

// Unreal's FVector init order: x, y, z
FVector(mediaPipeVector.X, mediaPipeVector.Z * -1.0f, mediaPipeVector.Y * -1.0f);

But since I don't suppose there is a way to apply this conversion directly to a transformation matrix, I've been trying to perform the matrix multiplication myself, with little success.
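For what it's worth, an axis conversion like the one above can be written as a matrix C, and a transform M expressed in the original axes can then be re-expressed in the converted axes as C * M * C^-1 (a standard change of basis). Below is a minimal standalone C++ sketch under the assumption that points are column vectors and that the conversion is exactly (x, y, z) -> (x, -z, -y); `Mat4`, `multiply`, `axisConversion` and `convertTransform` are hypothetical names. If the target is an FMatrix, which as far as I understand keeps translation in its fourth row, the result would likely also need transposing; that part is an assumption to verify.

```cpp
#include <array>
#include <cstddef>

using Mat4 = std::array<std::array<double, 4>, 4>;  // m[row][col], column-vector convention

// Plain 4x4 matrix product a * b.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 out{};  // zero-initialized accumulator
    for (std::size_t r = 0; r < 4; ++r) {
        for (std::size_t c = 0; c < 4; ++c) {
            for (std::size_t k = 0; k < 4; ++k) {
                out[r][c] += a[r][k] * b[k][c];
            }
        }
    }
    return out;
}

// Axis conversion from the post: (x, y, z) -> (x, -z, -y).
// Applying it twice gives back the original vector, so C^-1 == C.
Mat4 axisConversion() {
    return {{{1.0, 0.0, 0.0, 0.0},
             {0.0, 0.0, -1.0, 0.0},
             {0.0, -1.0, 0.0, 0.0},
             {0.0, 0.0, 0.0, 1.0}}};
}

// Re-expresses a transform given in the source axes in the converted axes:
// C * M * C^-1, which is C * M * C here.
Mat4 convertTransform(const Mat4& m) {
    const Mat4 c = axisConversion();
    return multiply(multiply(c, m), c);
}
```

As a sanity check, a pure translation by (1, 2, 3) in the source axes comes out as a translation by (1, -3, -2), which is what the vector conversion gives for that offset.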

I've tried so many different combinations of multiplying the matrix with the landmark vectors that I'm no longer sure what to post as examples.

I would be eternally grateful if somebody could shed some light on how the transformation matrix is actually meant to be used. An example of how to apply its inverse to get back to the canonical face data would be great, ideally in JS or C++. I'm kind of stuck on this right now, and I'm not sure whether it's my grasp of the maths behind matrices that's lacking, or whether I just don't understand what this matrix in particular is supposed to do or how it's constructed.

Thank you very much, and have a great day! :)

Other info / Complete Logs

No response

clemenshimmer commented 8 months ago

IMO even an explanation would do wonders here. As per the Face Landmarker configuration options, an option called output_facial_transformation_matrixes exists that will output the aforementioned matrix.

The option toggles whether a matrix is output, and is described as follows:

FaceLandmarker uses the matrix to transform the face landmarks from a canonical face model to the detected face, so users can apply effects on the detected landmarks.

That makes me think the transformation is invertible using this matrix, which would solve the above problem as far as I can tell.

However, it's not documented anywhere, and I'm stumped on how to extract any useful information from it.

Maybe a short documentation/explanation added to the matrix points could be provided? The other outputs (blendshapes, face landmarks) have pretty good documentation readily available from various sources, but what this mysterious 4x4 matrix does exactly is beyond me right now.
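One way to at least inspect such a matrix is to split it into translation, per-axis scale and rotation. The minimal standalone C++ sketch below assumes the matrix is a combination of rotation, scale and translation with points as column vectors, which nothing in this thread confirms; `Mat4`, `Decomposed` and `decompose` are hypothetical names, not MediaPipe API.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using Mat4 = std::array<std::array<double, 4>, 4>;  // m[row][col], column-vector convention

struct Decomposed {
    std::array<double, 3> translation;               // last column of the matrix
    std::array<double, 3> scale;                     // length of each basis column
    std::array<std::array<double, 3>, 3> rotation;   // basis columns normalized to unit length
};

// Splits an affine matrix into translation, per-axis scale and rotation,
// assuming it contains no shear.
Decomposed decompose(const Mat4& m) {
    Decomposed d{};
    for (std::size_t c = 0; c < 3; ++c) {
        d.translation[c] = m[c][3];
        d.scale[c] = std::sqrt(m[0][c] * m[0][c] + m[1][c] * m[1][c] + m[2][c] * m[2][c]);
    }
    for (std::size_t r = 0; r < 3; ++r) {
        for (std::size_t c = 0; c < 3; ++c) {
            // Guard against a degenerate (zero-length) column.
            d.rotation[r][c] = d.scale[c] > 0.0 ? m[r][c] / d.scale[c] : 0.0;
        }
    }
    return d;
}
```

If the three scale values come out nearly equal and the rotation part is close to orthonormal, that would support reading the matrix as a rigid pose plus uniform scale.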

kuaashish commented 7 months ago

Hi @clemenshimmer,

Yes, we acknowledge the need to enhance our documentation. Adding a brief explanation or documentation for the 4x4 matrix alongside the existing outputs such as blendshapes and face landmarks would be beneficial.

Thank you!!

kuaashish commented 7 months ago

Hi @markmcd,

Could you please look into this issue?

Thank you!!