google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://ai.google.dev/edge/mediapipe
Apache License 2.0
27.52k stars · 5.15k forks

Using side-view to infer face mesh landmark depth coordinate? #3510

Closed baba-yaga closed 2 years ago

baba-yaga commented 2 years ago

Please make sure that this is a solution issue.

System information (Please provide as much relevant information as possible): MacBook Pro (2019)

  • Have I written custom code (as opposed to using a stock example script provided in Mediapipe): Yes, attached
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04, Android 11, iOS 14.4): macOS Monterey (12.4)
  • MediaPipe version: 0.8.10.1
  • Bazel version:
  • Solution (e.g. FaceMesh, Pose, Holistic): FaceMesh
  • Programming Language and version ( e.g. C++, Python, Java): Python 3.9

Describe the expected behavior: I am doing photogrammetry (reconstructing a 3D face from photos or video), and FaceMesh promises to be a great tool for this. I have a video of a head turning both ways. I compute FaceMesh, which looks great until the head turns so far that only one eye remains visible. At that point the mesh starts to deform around the forehead and nose, departing from the head (see the attached video). The visible eye's contour is also drawn as if the eye were viewed frontally, and the chin landmarks fit poorly.

I played with the min_detection_confidence and min_tracking_confidence parameters (setting both to 0.8 seems to work best), but the problem persists. If I feed only side-view images to the code, no landmarks are returned at all. This makes me think that once one eye disappears from view, the calculator should switch from detecting landmarks to updating their z-coordinates, which are most pronounced in side views. I am not at all concerned with real time or computing resources; an offline solution is perfectly fine. I am thinking of projecting the related landmarks onto the visible contour; maybe such a filter already exists. I would appreciate any help.
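For context, here is a minimal sketch of how one might detect the problematic regime from the returned landmarks, i.e. compute a rough head-yaw proxy and gate frames on it. This is my own illustration, not MediaPipe API: the indices (1 = nose tip, 234/454 = face-oval edges) follow the commonly cited 468-point FaceMesh topology and should be verified against your MediaPipe version, and yaw_ratio / is_near_frontal are hypothetical helpers.

# Rough yaw proxy from FaceMesh landmarks (assumed indices, see above).
NOSE_TIP, RIGHT_EDGE, LEFT_EDGE = 1, 234, 454

def yaw_ratio(landmarks):
    """Roughly 1.0 for a frontal face; drifts away from 1.0 as the head
    turns and one cheek foreshortens."""
    nose = landmarks[NOSE_TIP].x
    d_right = abs(nose - landmarks[RIGHT_EDGE].x)
    d_left = abs(nose - landmarks[LEFT_EDGE].x)
    return d_right / max(d_left, 1e-6)

def is_near_frontal(landmarks, tol=0.5):
    # Trust x/y from near-frontal frames; strongly turned frames would be
    # candidates for the contour-based z refinement suggested above.
    return (1 - tol) < yaw_ratio(landmarks) < (1 + tol)

Here landmarks is results.multi_face_landmarks[0].landmark from the code below.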

Standalone code you may have used to try to get what you need: I use the standard code suggested here https://google.github.io/mediapipe/solutions/face_mesh#face-geometry-module, modified to take a video file instead of a webcam stream and wrapped in a function.

If there is a problem, provide a reproducible test case that is the bare minimum necessary to generate the problem. If possible, please share a link to Colab/repo link /any notebook:

import cv2
import mediapipe as mp
import numpy as np

def lm_video_fn(video, normalize=True,
                save_lm=False, lm_filename='lm_save.npy',
                save_vid=False, vid_filename='video_marked.mp4',
                thickness=-1, circle_radius=1, color=(128, 255, 255),
                min_detection_confidence=0.8,
                min_tracking_confidence=0.8
                ):

    mp_drawing = mp.solutions.drawing_utils
    mp_drawing_styles = mp.solutions.drawing_styles
    mp_face_mesh = mp.solutions.face_mesh
    drawing_spec = mp_drawing.DrawingSpec(thickness=thickness,
                                          circle_radius=circle_radius,
                                          color=color)

    cap = cv2.VideoCapture(video)
    # get geometry of the video
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    print(f'Video {video}: \nWidth: {w} \nHeight: {h} \nFrame rate: {fps}')

    if save_vid:
        # Initialize video writer object
        output = cv2.VideoWriter(vid_filename,
                                 cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
    with mp_face_mesh.FaceMesh(
        static_image_mode=False,
        max_num_faces=1,
        refine_landmarks=True,
        min_detection_confidence=min_detection_confidence,
        min_tracking_confidence=min_tracking_confidence) as face_mesh:
        while cap.isOpened():
            success, image = cap.read()
            if not success:
                # Reading from a file, not a camera: a failed read means
                # the end of the video (or a read error), so stop.
                print("Reached end of video.")
                break

            # To improve performance, optionally mark the image as not writeable to
            # pass by reference.
            image.flags.writeable = False
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            results = face_mesh.process(image)

            if save_vid:
                # Draw the face mesh annotations on the image.
                annotated_image = image.copy()
                annotated_image.flags.writeable = True
                annotated_image = cv2.cvtColor(annotated_image, cv2.COLOR_RGB2BGR)
                if results.multi_face_landmarks:
                    for face_landmarks in results.multi_face_landmarks:
                        mp_drawing.draw_landmarks(
                            image=annotated_image,
                            landmark_list=face_landmarks,
                            connections=mp_face_mesh.FACEMESH_TESSELATION,
                            landmark_drawing_spec=drawing_spec,
                            connection_drawing_spec=mp_drawing_styles
                            .get_default_face_mesh_tesselation_style())
                # Write every frame (annotated or not), and only once per
                # loop iteration, so the output video keeps its timing.
                output.write(annotated_image)

                # Flip the image horizontally for a selfie-view display.
                cv2.imshow('MediaPipe Face Mesh', cv2.flip(annotated_image, 1))
                if cv2.waitKey(5) & 0xFF == 27:
                    break
    cap.release()
    if save_vid:
        output.release()
        cv2.destroyAllWindows()

    # Landmarks of the last processed frame; guard against a final frame
    # where no face was detected.
    if not results.multi_face_landmarks:
        return None
    xyz = [[lm.x, lm.y, lm.z] for lm in results.multi_face_landmarks[0].landmark]
    if normalize:
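        # Note: despite the name, this converts the normalized [0, 1]
        # landmark coordinates to pixel units; z is scaled by the image
        # width since it uses roughly the same scale as x.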
        xyz_norm = np.multiply(xyz, [w, h, w]).astype(int)
        if save_lm:
            np.save(lm_filename, xyz_norm)
        return xyz_norm
    else:
        if save_lm:
            np.save(lm_filename, xyz)
        return xyz

########################################## MAIN #######################
if __name__ == "__main__":
    lm = lm_video_fn('G81F_turning.mp4', normalize=True, save_vid=True)
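Side note on the photogrammetry use case: as written, lm_video_fn returns only the landmarks of the last processed frame. Below is a minimal sketch of a per-frame collector under the same FaceMesh settings, reusing the cv2 / mediapipe / numpy imports from above (lm_per_frame is a hypothetical helper of mine, not part of MediaPipe):

def lm_per_frame(video, min_detection_confidence=0.8,
                 min_tracking_confidence=0.8):
    """Return one (num_landmarks, 3) array per frame, or None for frames
    where no face was detected."""
    mp_face_mesh = mp.solutions.face_mesh
    cap = cv2.VideoCapture(video)
    per_frame = []
    with mp_face_mesh.FaceMesh(
            static_image_mode=False, max_num_faces=1, refine_landmarks=True,
            min_detection_confidence=min_detection_confidence,
            min_tracking_confidence=min_tracking_confidence) as face_mesh:
        while cap.isOpened():
            ok, image = cap.read()
            if not ok:
                break
            results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
            if results.multi_face_landmarks:
                lms = results.multi_face_landmarks[0].landmark
                per_frame.append(np.array([[p.x, p.y, p.z] for p in lms]))
            else:
                per_frame.append(None)
    cap.release()
    return per_frame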

https://user-images.githubusercontent.com/2671379/178466398-dc5cbe73-ede5-43b7-b6a1-6fcd418e1a67.mp4

https://user-images.githubusercontent.com/2671379/178466417-bcf37af2-d32e-45eb-9d04-fe64927c2127.mp4

Other info / Complete Logs: Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached:

[Screenshot 2022-07-12 at 11 20 50]

sureshdagooglecom commented 2 years ago

Hi @baba-yaga ,

1) The limitation on head angle is a limitation of the modeling technique (the model is trained on forward-facing faces).
2) Please generalize the MediaPipe solution yourself. This sounds like a good refinement or generalization of the MediaPipe solution, but Google won't take on this refinement for now.
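For anyone attempting the contour-projection idea from the original post, here is a rough sketch of one possible post-hoc refinement: snap selected 2D landmarks (in pixel coordinates) to the person silhouette extracted with MediaPipe's SelfieSegmentation solution. snap_to_silhouette is a hypothetical helper, and the 0.5 mask threshold is an assumption, not a MediaPipe default:

import cv2
import mediapipe as mp
import numpy as np

def snap_to_silhouette(points_px, bgr_image, threshold=0.5):
    """Move each 2D landmark to the nearest point on the person-silhouette
    contour. Illustrative only: a real refinement would restrict this to
    the jawline/forehead landmarks visible on the turned side."""
    with mp.solutions.selfie_segmentation.SelfieSegmentation(
            model_selection=1) as seg:
        res = seg.process(cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB))
    mask = (res.segmentation_mask > threshold).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    contour = max(contours, key=cv2.contourArea).reshape(-1, 2)  # (N, 2) x,y
    dists = np.linalg.norm(
        contour[None, :, :] - np.asarray(points_px)[:, None, :], axis=2)
    return contour[dists.argmin(axis=1)]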

google-ml-butler[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] commented 2 years ago

Closing as stale. Please reopen if you'd like to work on this further.

google-ml-butler[bot] commented 2 years ago

Are you satisfied with the resolution of your issue?