Open Shrimperator opened 9 months ago
IMO even an explanation would do wonders here. According to the Face Landmarker configuration options, there is an option called output_facial_transformation_matrixes that will output the aforementioned matrix.
The option toggles whether a matrix is output, which is described with:
FaceLandmarker uses the matrix to transform the face landmarks from a canonical face model to the detected face, so users can apply effects on the detected landmarks.
That makes me think the transformation is invertible using this matrix, which would solve the above problem as far as I can tell.
However, it's not documented anywhere, and I'm stumped on how to extract any useful information from it.
Maybe a short documentation/explanation of the matrix could be provided? The other outputs (blendshapes, face landmarks) have pretty good documentation readily available from various sources, but what this mysterious 4x4 matrix does exactly is beyond me right now.
Hi @clemenshimmer,
Yes, we acknowledge the need to enhance our documentation. Adding a brief explanation or documentation for the 4x4 matrix alongside the existing outputs such as blendshapes and face landmarks would be beneficial.
Thank you!!
Hi @markmcd,
Could you please look into this issue?
Thank you!!
Have I written custom code (as opposed to using a stock example script provided in MediaPipe)
None
OS Platform and Distribution
Windows
MediaPipe Tasks SDK version
No response
Task name (e.g. Image classification, Gesture recognition etc.)
FaceLandmarker
Programming Language and version (e.g. C++, Python, Java)
Javascript (/Unreal Engine 5.3 C++)
Describe the actual behavior
I fail to apply the transformation matrix to the landmarks
Describe the expected behaviour
I succeed in applying the transformation matrix to the landmarks
Standalone code/steps you may have used to try to get what you need
Thank you for the publicly accessible model first of all; the landmark detection works great! I just need a little bit of help processing the data further :)
The problem:
Here's what I'm trying to achieve:
I'd like to compare sets of landmarks during different expressions, in order to mathematically infer several blendshapes that currently don't work with FaceLandmarker. In order to smoothly achieve this, I need to compare landmarks in a format that accounts for scale (due to distance to the camera), head rotation and head location relative to the camera. Basically, I want the landmarks aligned as if the user were directly in front of the center of the camera at all times, head completely straight.
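To make the goal concrete, here is a minimal sketch of the two easier parts of that alignment, location and scale, done without the transformation matrix. This is a plain centroid and RMS normalisation, not anything MediaPipe provides, and it does not remove head rotation. `normalizeLandmarks` is a hypothetical helper name, and landmarks are assumed to be objects with `x`, `y` and `z` fields:

```javascript
// Centre landmarks on their centroid and scale them to unit RMS distance from it.
// Removes location and uniform scale only; rotation is left untouched.
function normalizeLandmarks(points) {
  const n = points.length;
  const centroid = points.reduce(
    (acc, p) => ({ x: acc.x + p.x / n, y: acc.y + p.y / n, z: acc.z + p.z / n }),
    { x: 0, y: 0, z: 0 }
  );
  const centred = points.map((p) => ({
    x: p.x - centroid.x,
    y: p.y - centroid.y,
    z: p.z - centroid.z,
  }));
  const rms = Math.sqrt(
    centred.reduce((sum, p) => sum + p.x * p.x + p.y * p.y + p.z * p.z, 0) / n
  );
  if (rms === 0) throw new Error("all landmarks coincide");
  return centred.map((p) => ({ x: p.x / rms, y: p.y / rms, z: p.z / rms }));
}

// The same shape at two different positions and sizes normalises to the same points.
const near = [{ x: 0, y: 0, z: 0 }, { x: 2, y: 0, z: 0 }, { x: 0, y: 2, z: 0 }];
const far = near.map((p) => ({ x: p.x * 0.5 + 7, y: p.y * 0.5 - 3, z: p.z * 0.5 + 1 }));
console.log(normalizeLandmarks(near), normalizeLandmarks(far));
```

Removing rotation as well is the part that would need either the transformation matrix or a separate rigid alignment step.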
The format in which FaceLandmarker (using the JS API) currently outputs landmarks includes scale, rotation and location relative to the camera, which doesn't work for what I'm trying to do.
I'll post some examples written in Unreal Engine C++, though I'll absolutely take solutions in JavaScript as well.
With every fourth value of the matrix data being constant (0, 0, 0, 1), I'm assuming this is a 4x4 matrix in row major order.
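The (0, 0, 0, 1) pattern can be used to test the layout assumption. A minimal sketch, assuming the matrix arrives as a flat array of 16 numbers and is meant to multiply column vectors; `at` and `guessLayout` are hypothetical helper names, not part of MediaPipe:

```javascript
// Read element (row, col) from a flat 16-value 4x4 matrix under a given layout.
function at(data, row, col, layout) {
  return layout === "column-major" ? data[col * 4 + row] : data[row * 4 + col];
}

// For an affine transform applied to column vectors, the bottom row is (0, 0, 0, 1).
// Return every layout under which the flat array has that bottom row.
function guessLayout(data, eps = 1e-6) {
  const expected = [0, 0, 0, 1];
  return ["row-major", "column-major"].filter((layout) =>
    expected.every((v, col) => Math.abs(at(data, 3, col, layout) - v) < eps)
  );
}

// Example: identity rotation with translation (5, 6, 7), stored column by column,
// so every fourth value of the flat array is 0, 0, 0, 1.
const example = [1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  5, 6, 7, 1];
console.log(guessLayout(example)); // ["column-major"]
```

If every fourth value of the flat array is (0, 0, 0, 1), the bottom row sits at indices 3, 7, 11 and 15, which is what a column-major layout produces. Under a row-major reading those same values would be the translation column, meaning a translation of zero.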
Naively applying this matrix to the landmarks using Unreal's FMatrix gives erratic results, though. I assume this is due to differences between the coordinate systems of MediaPipe and Unreal, which would make total sense.
Converting vectors from one system to the other is pretty straightforward like this:
Though since I don't suppose there is any way to apply this knowledge to a transformation matrix, I've been trying to perform the matrix multiplication myself, with little success.
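On applying an axis conversion to a transformation matrix: in general, if a matrix C converts vectors from system A to system B, a transform M expressed in A becomes C * M * C^-1 in B. A minimal sketch of that identity, using arrays of rows and a purely hypothetical axis mapping (swap Y and Z); the real MediaPipe-to-Unreal mapping is not shown here and would need to be substituted:

```javascript
// Multiply two 4x4 matrices stored as arrays of rows (column-vector convention).
function mul4(a, b) {
  return a.map((row) =>
    [0, 1, 2, 3].map((c) => row.reduce((sum, v, k) => sum + v * b[k][c], 0))
  );
}

// Transpose a 4x4 matrix.
function transpose4(m) {
  return m[0].map((_, c) => m.map((row) => row[c]));
}

// HYPOTHETICAL change of basis: swaps the Y and Z axes. It is a permutation
// matrix, so its inverse is its transpose.
const swapYZ = [
  [1, 0, 0, 0],
  [0, 0, 1, 0],
  [0, 1, 0, 0],
  [0, 0, 0, 1],
];

// Re-express a transform (given in system A) in system B: C * M * C^-1.
function changeBasis(M, C, Cinv) {
  return mul4(mul4(C, M), Cinv);
}

// Example: a translation by (1, 2, 3) in A becomes a translation by (1, 3, 2) in B.
const translateA = [
  [1, 0, 0, 1],
  [0, 1, 0, 2],
  [0, 0, 1, 3],
  [0, 0, 0, 1],
];
const translateB = changeBasis(translateA, swapYZ, transpose4(swapYZ));
console.log(translateB.map((row) => row[3])); // [1, 3, 2, 1]
```

Converting only the vectors while leaving the matrix in its original system mixes the two bases, which could be one source of erratic results.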
I've tried so many different combinations of multiplying the matrix with the landmark vectors that I'm not even sure what to post as examples anymore.
I would be eternally grateful if somebody could shed some light on how the transformation matrix is actually meant to be used. An example of how to apply its inverse to get back to the canonical face data would be great, ideally in JS or C++. I'm kind of stuck on this right now, and I'm not sure whether it's my grasp of the maths behind matrices that's lacking, or whether I just don't understand what this matrix in particular is supposed to do or how it's constructed.
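As a starting point for the inverse, here is a minimal sketch rather than a confirmed MediaPipe recipe. It assumes the 16 values are column-major and that the matrix has the affine form [A | t; 0 0 0 1], so its inverse is [A^-1 | -A^-1 t]. It also assumes the points being transformed live in the same space the matrix maps into; whether the landmarks returned by the JS API are in that space is exactly the undocumented part. `invertAffine` and `applyAffine` are hypothetical helper names:

```javascript
// Invert a 4x4 affine matrix given as 16 column-major values (ASSUMED layout).
// Returns 16 column-major values; throws if the 3x3 part is singular.
function invertAffine(m) {
  // 3x3 linear part: a[row][col] = m[col * 4 + row]
  const a = [0, 1, 2].map((r) => [0, 1, 2].map((c) => m[c * 4 + r]));
  const t = [m[12], m[13], m[14]];

  // Signed cofactor of element (r, c), using cyclic indices.
  const cof = (r, c) => {
    const r1 = (r + 1) % 3, r2 = (r + 2) % 3;
    const c1 = (c + 1) % 3, c2 = (c + 2) % 3;
    return a[r1][c1] * a[r2][c2] - a[r1][c2] * a[r2][c1];
  };
  const det = a[0][0] * cof(0, 0) + a[0][1] * cof(0, 1) + a[0][2] * cof(0, 2);
  if (Math.abs(det) < 1e-12) throw new Error("matrix is not invertible");

  // Inverse = transposed cofactor matrix divided by the determinant.
  const inv = [0, 1, 2].map((r) => [0, 1, 2].map((c) => cof(c, r) / det));
  // Inverse translation = -A^-1 * t.
  const ti = inv.map((row) => -(row[0] * t[0] + row[1] * t[1] + row[2] * t[2]));

  // Pack back into column-major order.
  const out = new Array(16).fill(0);
  for (let r = 0; r < 3; r++) {
    for (let c = 0; c < 3; c++) out[c * 4 + r] = inv[r][c];
    out[12 + r] = ti[r];
  }
  out[15] = 1;
  return out;
}

// Apply a column-major affine matrix to a point {x, y, z}.
function applyAffine(m, p) {
  return {
    x: m[0] * p.x + m[4] * p.y + m[8] * p.z + m[12],
    y: m[1] * p.x + m[5] * p.y + m[9] * p.z + m[13],
    z: m[2] * p.x + m[6] * p.y + m[10] * p.z + m[14],
  };
}

// Round trip: scale by 2, rotate 90 degrees about Z, translate by (10, 20, 30).
const forward = [0, 2, 0, 0,  -2, 0, 0, 0,  0, 0, 2, 0,  10, 20, 30, 1];
const moved = applyAffine(forward, { x: 1, y: 2, z: 3 });
const restored = applyAffine(invertAffine(forward), moved);
console.log(moved, restored);
```

If the round trip works on synthetic data like this but real landmarks still come out wrong, that would point at a mismatch between the landmark space and the space the matrix operates in, rather than at the matrix maths.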
Thank you very much, and have a great day! :)
Other info / Complete Logs
No response