google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://ai.google.dev/edge/mediapipe
Apache License 2.0
27.52k stars · 5.15k forks

Using side-view to infer face mesh landmark depth coordinate? #3510

Closed baba-yaga closed 2 years ago

baba-yaga commented 2 years ago

Please make sure that this is a solution issue.

System information (Please provide as much relevant information as possible): MacBook Pro (2019)

  • Have I written custom code (as opposed to using a stock example script provided in Mediapipe): Yes, attached
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04, Android 11, iOS 14.4): macOS Monterey (12.4)
  • MediaPipe version: 0.8.10.1
  • Bazel version:
  • Solution (e.g. FaceMesh, Pose, Holistic): FaceMesh
  • Programming Language and version ( e.g. C++, Python, Java): Python 3.9

Describe the expected behavior: I am doing photogrammetry (reconstructing a 3D face from photos or video), and FaceMesh promises to be a great tool for this. I have a video of a head turning both ways. I compute FaceMesh, which looks great until the head turns so far that only one eye remains visible. At that point the mesh starts to deform around the forehead and nose, departing from the head (see the attached video). The visible eye's contour is also drawn as if the eye were viewed frontally, and the chin landmarks fit poorly.

I played with the min_detection_confidence and min_tracking_confidence parameters (setting both to 0.8 seems to work best), but the problem persists. If I feed only side-view images to the code, no landmarks are returned at all. This makes me think that once one eye disappears from view, the calculator should switch from detecting landmarks to updating their z-coordinates, which are most pronounced in side views. I am not at all concerned with real time or computing resources; an offline solution is perfectly fine. I am thinking of projecting the related landmarks onto the visible contour; maybe such a filter already exists. I would appreciate any help.
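For context, here is a minimal sketch of how one might detect the problematic regime from the returned landmarks, i.e. compute a rough head-yaw proxy and gate frames on it. This is my own illustration, not MediaPipe API: the indices (1 = nose tip, 234/454 = face-oval edges) follow the commonly cited 468-point FaceMesh topology and should be verified against your MediaPipe version, and yaw_ratio / is_near_frontal are hypothetical helpers.

# Rough yaw proxy from FaceMesh landmarks (assumed indices, see above).
NOSE_TIP, RIGHT_EDGE, LEFT_EDGE = 1, 234, 454

def yaw_ratio(landmarks):
    """Roughly 1.0 for a frontal face; drifts away from 1.0 as the head
    turns and one cheek foreshortens."""
    nose = landmarks[NOSE_TIP].x
    d_right = abs(nose - landmarks[RIGHT_EDGE].x)
    d_left = abs(nose - landmarks[LEFT_EDGE].x)
    return d_right / max(d_left, 1e-6)

def is_near_frontal(landmarks, tol=0.5):
    # Trust x/y from near-frontal frames; strongly turned frames would be
    # candidates for the contour-based z refinement suggested above.
    return (1 - tol) < yaw_ratio(landmarks) < (1 + tol)

Here landmarks is results.multi_face_landmarks[0].landmark from the code below.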

Standalone code you may have used to try to get what you need: I use the standard code suggested here https://google.github.io/mediapipe/solutions/face_mesh#face-geometry-module, modified to take a video file instead of a webcam stream and wrapped in a function.

If there is a problem, provide a reproducible test case that is the bare minimum necessary to generate the problem. If possible, please share a link to Colab/repo link /any notebook:

import cv2
import mediapipe as mp
import numpy as np

def lm_video_fn(video, normalize=True,
                save_lm=False, lm_filename='lm_save.npy',
                save_vid=False, vid_filename='video_marked.mp4',
                thickness=-1, circle_radius=1, color=(128, 255, 255),
                min_detection_confidence=0.8,
                min_tracking_confidence=0.8
                ):

    mp_drawing = mp.solutions.drawing_utils
    mp_drawing_styles = mp.solutions.drawing_styles
    mp_face_mesh = mp.solutions.face_mesh
    drawing_spec = mp_drawing.DrawingSpec(thickness=thickness,
                                          circle_radius=circle_radius,
                                          color=color)

    cap = cv2.VideoCapture(video)
    # get geometry of the video
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    print(f'Video {video}: \nWidth: {w} \nHeight: {h} \nFrame rate: {fps}')

    if save_vid:
        # Initialize video writer object
        output = cv2.VideoWriter(vid_filename,
                                 cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
    with mp_face_mesh.FaceMesh(
        static_image_mode=False,
        max_num_faces=1,
        refine_landmarks=True,
        min_detection_confidence=min_detection_confidence,
        min_tracking_confidence=min_tracking_confidence) as face_mesh:
        while cap.isOpened():
            success, image = cap.read()
            if not success:
                # Reading from a file, not a camera: a failed read means
                # the end of the video (or a read error), so stop.
                print("Reached end of video.")
                break

            # To improve performance, optionally mark the image as not writeable to
            # pass by reference.
            image.flags.writeable = False
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            results = face_mesh.process(image)

            if save_vid:
                # Draw the face mesh annotations on the image.
                annotated_image = image.copy()
                annotated_image.flags.writeable = True
                annotated_image = cv2.cvtColor(annotated_image, cv2.COLOR_RGB2BGR)
                if results.multi_face_landmarks:
                    for face_landmarks in results.multi_face_landmarks:
                        mp_drawing.draw_landmarks(
                            image=annotated_image,
                            landmark_list=face_landmarks,
                            connections=mp_face_mesh.FACEMESH_TESSELATION,
                            landmark_drawing_spec=drawing_spec,
                            connection_drawing_spec=mp_drawing_styles
                            .get_default_face_mesh_tesselation_style())
                # Write every frame (annotated or not), and only once per
                # loop iteration, so the output video keeps its timing.
                output.write(annotated_image)

                # Flip the image horizontally for a selfie-view display.
                cv2.imshow('MediaPipe Face Mesh', cv2.flip(annotated_image, 1))
                if cv2.waitKey(5) & 0xFF == 27:
                    break
    cap.release()
    if save_vid:
        output.release()
        cv2.destroyAllWindows()

    # Landmarks of the last processed frame; guard against a final frame
    # where no face was detected.
    if not results.multi_face_landmarks:
        return None
    xyz = [[lm.x, lm.y, lm.z] for lm in results.multi_face_landmarks[0].landmark]
    if normalize:
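        # Note: despite the name, this converts the normalized [0, 1]
        # landmark coordinates to pixel units; z is scaled by the image
        # width since it uses roughly the same scale as x.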
        xyz_norm = np.multiply(xyz, [w, h, w]).astype(int)
        if save_lm:
            np.save(lm_filename, xyz_norm)
        return xyz_norm
    else:
        if save_lm:
            np.save(lm_filename, xyz)
        return xyz

########################################## MAIN #######################
if __name__ == "__main__":
    lm = lm_video_fn('G81F_turning.mp4', normalize=True, save_vid=True)
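Side note on the photogrammetry use case: as written, lm_video_fn returns only the landmarks of the last processed frame. Below is a minimal sketch of a per-frame collector under the same FaceMesh settings, reusing the cv2 / mediapipe / numpy imports from above (lm_per_frame is a hypothetical helper of mine, not part of MediaPipe):

def lm_per_frame(video, min_detection_confidence=0.8,
                 min_tracking_confidence=0.8):
    """Return one (num_landmarks, 3) array per frame, or None for frames
    where no face was detected."""
    mp_face_mesh = mp.solutions.face_mesh
    cap = cv2.VideoCapture(video)
    per_frame = []
    with mp_face_mesh.FaceMesh(
            static_image_mode=False, max_num_faces=1, refine_landmarks=True,
            min_detection_confidence=min_detection_confidence,
            min_tracking_confidence=min_tracking_confidence) as face_mesh:
        while cap.isOpened():
            ok, image = cap.read()
            if not ok:
                break
            results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
            if results.multi_face_landmarks:
                lms = results.multi_face_landmarks[0].landmark
                per_frame.append(np.array([[p.x, p.y, p.z] for p in lms]))
            else:
                per_frame.append(None)
    cap.release()
    return per_frame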

https://user-images.githubusercontent.com/2671379/178466398-dc5cbe73-ede5-43b7-b6a1-6fcd418e1a67.mp4

https://user-images.githubusercontent.com/2671379/178466417-bcf37af2-d32e-45eb-9d04-fe64927c2127.mp4

Other info / Complete Logs: Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached:

[Screenshot 2022-07-12 at 11 20 50]

sureshdagooglecom commented 2 years ago

Hi @baba-yaga ,

1) The limitation on head angle is a limitation of the modeling technique (the model is trained on forward-facing faces).
2) Please generalize the MediaPipe solution yourself. This sounds like a good refinement or generalization of the MediaPipe solution, but Google won't take on this refinement for now.
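For anyone attempting the contour-projection idea from the original post, here is a rough sketch of one possible post-hoc refinement: snap selected 2D landmarks (in pixel coordinates) to the person silhouette extracted with MediaPipe's SelfieSegmentation solution. snap_to_silhouette is a hypothetical helper, and the 0.5 mask threshold is an assumption, not a MediaPipe default:

import cv2
import mediapipe as mp
import numpy as np

def snap_to_silhouette(points_px, bgr_image, threshold=0.5):
    """Move each 2D landmark to the nearest point on the person-silhouette
    contour. Illustrative only: a real refinement would restrict this to
    the jawline/forehead landmarks visible on the turned side."""
    with mp.solutions.selfie_segmentation.SelfieSegmentation(
            model_selection=1) as seg:
        res = seg.process(cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB))
    mask = (res.segmentation_mask > threshold).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    contour = max(contours, key=cv2.contourArea).reshape(-1, 2)  # (N, 2) x,y
    dists = np.linalg.norm(
        contour[None, :, :] - np.asarray(points_px)[:, None, :], axis=2)
    return contour[dists.argmin(axis=1)]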

google-ml-butler[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] commented 2 years ago

Closing as stale. Please reopen if you'd like to work on this further.

google-ml-butler[bot] commented 2 years ago

Are you satisfied with the resolution of your issue?