google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://ai.google.dev/edge/mediapipe
Apache License 2.0

Face orientation angles - pitch/yaw/roll from face geometry #2809

Closed Shubhambindal2017 closed 2 years ago

Shubhambindal2017 commented 2 years ago

Hi, I'm having some issues retrieving the actual orientation angles, i.e. pitch/yaw/roll, from face geometry. I need them for my application. I'm probably missing some obvious detail. Can you give me some hints? Thanks

I tried a solution based on the approach suggested by @kostyaby here: https://github.com/google/mediapipe/issues/1561#issuecomment-771027016

Here is my code snippet that I am using.

// Function based on https://www.geometrictools.com/Documentation/EulerAngles.pdf
const rotationMatrixToEulerAngle = (r) => {
    const [r00, r01, r02, r10, r11, r12, r20, r21, r22] = r;
    let thetaX: number;
    let thetaY: number;
    let thetaZ: number;
    if (r10 < 1) { // YZX calculation
      if (r10 > -1) {
        thetaZ = Math.asin(r10);
        thetaY = Math.atan2(-r20, r00);
        thetaX = Math.atan2(-r12, r11);
      } else {
        thetaZ = -Math.PI / 2;
        thetaY = -Math.atan2(r21, r22);
        thetaX = 0;
      }
    } else {
      thetaZ = Math.PI / 2;
      thetaY = Math.atan2(r21, r22);
      thetaX = 0;
    }
    if (isNaN(thetaX)) thetaX = 0;
    if (isNaN(thetaY)) thetaY = 0;
    if (isNaN(thetaZ)) thetaZ = 0;
    return { pitch: -thetaX, yaw: -thetaY, roll: -thetaZ };
  };

//results - assume results is the result object of FaceMesh

pt_matrix = results.multiFaceGeometry[0].getPoseTransformMatrix()['h'][2]
rotation_matrix = [pt_matrix[0], pt_matrix[1], pt_matrix[2],
                   pt_matrix[4], pt_matrix[5], pt_matrix[6],
                   pt_matrix[8], pt_matrix[9], pt_matrix[10]]
angles = rotationMatrixToEulerAngle(rotation_matrix)

But the pitch/yaw/roll values I am getting do not seem very accurate. Can anyone please help? Solution: FaceMesh in JS

kostyaby commented 2 years ago

Hey @Shubhambindal2017,

pt_matrix = results.multiFaceGeometry[0].getPoseTransformMatrix()['h'][2]

Could you please explain what the ['h'][2] part does? Could it be relevant to this issue?

rotation_matrix = [pt_matrix[0], pt_matrix[1], pt_matrix[2], ...

const [r00, r01, r02, r10, r11, r12, r20, r21, r22] = r;

Please note that getPoseTransformMatrix() returns a MatrixData object, which could be (and probably is, by default) a column-major matrix. In that case pt_matrix[1] corresponds to r10, not r01, assuming rIJ means row I, column J (the row-then-column index order usually used in math books).
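To make the layout point concrete, here is a minimal sketch of unpacking the 3x3 rotation block from a 16-element array, assuming the array really is column-major. The helper name rotationFromColumnMajor is made up for illustration and is not part of MediaPipe.

```javascript
// Hypothetical helper: extract the upper-left 3x3 block of a 4x4 matrix
// stored as a flat, column-major array of 16 numbers (assumed layout).
// The result is row-major: [r00, r01, r02, r10, r11, r12, r20, r21, r22].
function rotationFromColumnMajor(m) {
  // In column-major storage, element (row, col) lives at index col * 4 + row.
  const at = (row, col) => m[col * 4 + row];
  return [
    at(0, 0), at(0, 1), at(0, 2),
    at(1, 0), at(1, 1), at(1, 2),
    at(2, 0), at(2, 1), at(2, 2),
  ];
}

// Use the flat indices themselves as values to see which ones get picked.
const indices = Array.from({ length: 16 }, (_, i) => i);
console.log(rotationFromColumnMajor(indices));
// → [0, 4, 8, 1, 5, 9, 2, 6, 10]
```

So under the column-major assumption, r01 comes from index 4 and r10 from index 1, the reverse of what the original snippet reads.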

if (r10 < 1) { // YZX calculation

Is there a particular reason the angles are extracted as the YZX Euler angle group? MIT folks claim that "Roll Pitch Yaw XYZ" <=> "Euler ZYX", so this may be why you are not getting what you expect.


In general, 3D math is hard. My advice would be to definitely employ / borrow ideas from some well-tested math library. Unless you wanted to try figuring out the formula on your own for fun, of course :)

Shubhambindal2017 commented 2 years ago

@kostyaby Hi, thanks for the reply and suggestions.

(a) ['h'][2] was just a way to get the Pose Transform Matrix as an array. getPackedDataList() can also be used to get it: pt_matrix = results.multiFaceGeometry[0].getPoseTransformMatrix().getPackedDataList()

(b) Yeah, you were right about the column-major matrix, thanks for the correction. I have re-tried it with the correct layout, but there is still not much difference in the angle values (except that the signs of the angles are flipped).

(c) I have now also tried a library, Three.js, instead of my own code to get the Euler angles, but the output angles are still not as expected. https://threejs.org/build/three.js
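One possible explanation for the sign flip mentioned in (b): reading a column-major rotation matrix as if it were row-major yields its transpose, and the transpose of a rotation matrix is its inverse. Below is a minimal sketch for a pure rotation about the Y axis. The helper names rotY, transpose and yawOf are illustrative only and are not part of MediaPipe or Three.js.

```javascript
// Rotation about the Y axis by angle a (radians), as an array of rows.
const rotY = (a) => [
  [Math.cos(a), 0, Math.sin(a)],
  [0, 1, 0],
  [-Math.sin(a), 0, Math.cos(a)],
];

// Swap rows and columns.
const transpose = (m) => m[0].map((_, j) => m.map((row) => row[j]));

// Recover the angle of a pure Y rotation from its first row.
const yawOf = (m) => Math.atan2(m[0][2], m[0][0]);

const r = rotY(0.5);
const yawDirect = yawOf(r);                // approximately 0.5
const yawTransposed = yawOf(transpose(r)); // approximately -0.5
console.log(yawDirect, yawTransposed);
```

For rotations that combine several axes the decoded angles are not simply negated, so this only illustrates the near-single-axis case.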

Example: when I kept my face straight and turned it to the right, 90° to the webcam (with negligible vertical angle), I got

pt_matrix = [0.5567780137062073, 0.034023914486169815, 0.8299639821052551, 0, -0.011918997392058372, 0.9993847608566284, -0.03297339752316475, 0, -0.8305754661560059, 0.008466490544378757, 0.5568410754203796, 0, -1.418548345565796, 6.790719509124756, -39.25355529785156, 1]

pt_matrix_three_js_format.elements holds the same values = [0.5567780137062073, 0.034023914486169815, 0.8299639821052551, 0, -0.011918997392058372, 0.9993847608566284, -0.03297339752316475, 0, -0.8305754661560059, 0.008466490544378757, 0.5568410754203796, 0, -1.418548345565796, 6.790719509124756, -39.25355529785156, 1]

Values I got

The pitch and roll values make sense, but isn't the yaw value unexpected? Ideally it should be 90 or close to it. @kostyaby, what are your thoughts on this?

Code that I used for above example

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <script src="https://cdn.jsdelivr.net/npm/@mediapipe/camera_utils/camera_utils.js" crossorigin="anonymous"></script>
  <script src="https://cdn.jsdelivr.net/npm/@mediapipe/control_utils/control_utils.js" crossorigin="anonymous"></script>
  <script src="https://cdn.jsdelivr.net/npm/@mediapipe/drawing_utils/drawing_utils.js" crossorigin="anonymous"></script>
  <script src="https://cdn.jsdelivr.net/npm/@mediapipe/face_mesh/face_mesh.js" crossorigin="anonymous"></script>
  <script src="https://threejs.org/build/three.js" crossorigin="anonymous"></script>
</head>

<body>
  <div class="container">
    <video class="input_video"></video>
    <canvas class="output_canvas" width="1280px" height="720px"></canvas>
  </div>
</body>
</html>

<script type="module">
const videoElement = document.getElementsByClassName('input_video')[0];
const canvasElement = document.getElementsByClassName('output_canvas')[0];
const canvasCtx = canvasElement.getContext('2d');

function onResults(results) {
  canvasCtx.save();
  canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
  canvasCtx.drawImage(
      results.image, 0, 0, canvasElement.width, canvasElement.height);
  if (results.multiFaceLandmarks) {
    for (const landmarks of results.multiFaceLandmarks) {
      drawConnectors(canvasCtx, landmarks, FACEMESH_TESSELATION,
                     {color: '#C0C0C070', lineWidth: 1});
      drawConnectors(canvasCtx, landmarks, FACEMESH_RIGHT_EYE, {color: '#FF3030'});
      drawConnectors(canvasCtx, landmarks, FACEMESH_RIGHT_EYEBROW, {color: '#FF3030'});
      drawConnectors(canvasCtx, landmarks, FACEMESH_RIGHT_IRIS, {color: '#FF3030'});
      drawConnectors(canvasCtx, landmarks, FACEMESH_LEFT_EYE, {color: '#30FF30'});
      drawConnectors(canvasCtx, landmarks, FACEMESH_LEFT_EYEBROW, {color: '#30FF30'});
      drawConnectors(canvasCtx, landmarks, FACEMESH_LEFT_IRIS, {color: '#30FF30'});
      drawConnectors(canvasCtx, landmarks, FACEMESH_FACE_OVAL, {color: '#E0E0E0'});
      drawConnectors(canvasCtx, landmarks, FACEMESH_LIPS, {color: '#E0E0E0'});
    }
  }

  if (results.multiFaceGeometry){
    for (const facegeometry of results.multiFaceGeometry){
      const pt_matrix = facegeometry.getPoseTransformMatrix().getPackedDataList();
      const pt_matrix_three_js_format = new THREE.Matrix4().fromArray(pt_matrix);
      const euler_angles = new THREE.Euler().setFromRotationMatrix(pt_matrix_three_js_format, 'XYZ');
      const pitch = THREE.MathUtils.radToDeg(euler_angles['x']);
      const yaw = THREE.MathUtils.radToDeg(euler_angles['y']);
      const roll = THREE.MathUtils.radToDeg(euler_angles['z']);

      console.log('-');
    }
  }
  canvasCtx.restore();
}

const faceMesh = new FaceMesh({locateFile: (file) => {
  return `https://cdn.jsdelivr.net/npm/@mediapipe/face_mesh/${file}`;
}});
faceMesh.setOptions({
  maxNumFaces: 1,
  enableFaceGeometry: true,
  refineLandmarks: false,
  minDetectionConfidence: 0.48,
  minTrackingConfidence: 0.5
});
faceMesh.onResults(onResults);

const camera = new Camera(videoElement, {
  onFrame: async () => {
    await faceMesh.send({image: videoElement});
  },
  width: 1280,
  height: 720
});
camera.start();
</script>

kostyaby commented 2 years ago

When I kept my face straight and turned it to the right, 90° to the webcam (with negligible vertical angle), I got ... The pitch and roll values make sense, but isn't the yaw value unexpected? Ideally it should be 90 or close to it.

I'd be interested in seeing the dynamics. Getting a yaw angle of around 60 degrees is a reasonable expectation, I'd say, especially if it covers the range from -60 to 60 as the head is rotated from left to right.
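As a rough check of the numbers under discussion, the sketch below decodes the pose matrix posted earlier in this thread by hand. It assumes the array is column-major and that the rotation factors as R = Rx * Ry * Rz, which as far as I can tell is what the 'XYZ' order in the Three.js snippet means. The helper eulerXYZFromColumnMajor is hypothetical and ignores the gimbal-lock case.

```javascript
// Pose matrix copied from the post above (16 values, assumed column-major).
const ptMatrix = [
  0.5567780137062073, 0.034023914486169815, 0.8299639821052551, 0,
  -0.011918997392058372, 0.9993847608566284, -0.03297339752316475, 0,
  -0.8305754661560059, 0.008466490544378757, 0.5568410754203796, 0,
  -1.418548345565796, 6.790719509124756, -39.25355529785156, 1,
];

// Hypothetical helper: XYZ-order Euler angles (radians) from a column-major
// 4x4 matrix. Assumes R = Rx * Ry * Rz and does not handle gimbal lock.
function eulerXYZFromColumnMajor(m) {
  // mRC = row R, column C (1-based), read from column-major storage.
  const m11 = m[0], m12 = m[4], m13 = m[8];
  const m23 = m[9], m33 = m[10];
  return {
    x: Math.atan2(-m23, m33),
    y: Math.asin(Math.max(-1, Math.min(1, m13))),
    z: Math.atan2(-m12, m11),
  };
}

const toDeg = (rad) => (rad * 180) / Math.PI;
const angles = eulerXYZFromColumnMajor(ptMatrix);
const pitch = toDeg(angles.x);
const yaw = toDeg(angles.y);
const roll = toDeg(angles.z);
console.log({ pitch, yaw, roll });
```

Under these assumptions the yaw comes out at roughly -56 degrees, with pitch and roll within about a degree or two of zero. That is in line with the "around 60 degrees" expectation rather than 90.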

Personally, I rarely work in the Yaw-Pitch-Roll terms and usually in terms of Euler angle rotation groups (XYZ, ZYX and so on) so it's easier for me to find the right mapping as this is the language most 3D math libraries use as well. You'd have to work with a particular library to find what rotation group mapping works out for your expected Yaw-Pitch-Roll terms.

On choosing the correct Euler angle order: you can work backwards to figure out which one is correct. Set some values for (yaw, pitch, roll), preferably non-trivial ones where more than one component is significantly different from 0, and try rotating some 3D object with various Euler angle orders. The order that matches your needs is the one to use during decoding.
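The working-backwards idea can be sketched without any library: compose a rotation from known angles under two candidate orders, decode both with the same XYZ decoder, and see which one round-trips. Everything here (rotX, rotY, rotZ, mul, decodeXYZ, the chosen angles) is illustrative and covers only two of the possible orders.

```javascript
// Elementary rotations (radians), each as an array of rows.
const rotX = (a) => [[1, 0, 0], [0, Math.cos(a), -Math.sin(a)], [0, Math.sin(a), Math.cos(a)]];
const rotY = (a) => [[Math.cos(a), 0, Math.sin(a)], [0, 1, 0], [-Math.sin(a), 0, Math.cos(a)]];
const rotZ = (a) => [[Math.cos(a), -Math.sin(a), 0], [Math.sin(a), Math.cos(a), 0], [0, 0, 1]];

// 3x3 matrix product a * b.
const mul = (a, b) =>
  a.map((row) => b[0].map((_, j) => row.reduce((sum, v, k) => sum + v * b[k][j], 0)));

// Decoder that assumes r = Rx * Ry * Rz, away from gimbal lock.
const decodeXYZ = (r) => ({
  x: Math.atan2(-r[1][2], r[2][2]),
  y: Math.asin(r[0][2]),
  z: Math.atan2(-r[0][1], r[0][0]),
});

// Known, non-trivial test angles (arbitrary values chosen for the demo).
const known = { x: 0.3, y: 0.9, z: -0.4 };

// The same angles composed under two different orders.
const candidates = {
  XYZ: mul(rotX(known.x), mul(rotY(known.y), rotZ(known.z))),
  ZYX: mul(rotZ(known.z), mul(rotY(known.y), rotX(known.x))),
};

const sameAngles = (a, b) => ['x', 'y', 'z'].every((k) => Math.abs(a[k] - b[k]) < 1e-9);

const roundTrips = {};
for (const [order, r] of Object.entries(candidates)) {
  roundTrips[order] = sameAngles(decodeXYZ(r), known);
  console.log(order, roundTrips[order]);
}
// prints "XYZ true" then "ZYX false"
```

Only the matrix composed in the order the decoder assumes gives back the original angles. The same test with a decoder per candidate order would identify which order a given library or data source uses.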

google-ml-butler[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] commented 2 years ago

Closing as stale. Please reopen if you'd like to work on this further.

nandin-borjigin commented 2 years ago

Hi @Shubhambindal2017 ,

Have you found a way to extract those angles?

pandeyAayush commented 9 months ago

Same query here @Shubhambindal2017 . Pls let me know if you were able to fix it.