facebookresearch / hot3d

HOT3D: An egocentric dataset for 3D hand and object tracking
https://facebookresearch.github.io/hot3d/
Apache License 2.0

Is it possible to project a 3D hand mesh onto a 2D image? #14

Closed: ryhara closed this issue 1 month ago

ryhara commented 1 month ago

@SeaOtocinclus

Hi, thank you for releasing such a great dataset and library.

Is it possible to project a 3D hand mesh onto a 2D image? I tried searching the code for keywords like "projection," but I couldn't find anything related to it. Is there a function that can project? Could you please provide guidance on how to achieve this projection if possible?

SeaOtocinclus commented 1 month ago

Hi @ryhara , thank you for your issue and comments.

If you are using the HOT3D sequences:

You can project the mesh vertices and use the project function on the camera. We don't have an example yet, but I can try to add one next week.

Here is what you should do, in pseudocode (a rough sketch; the full, runnable version is in the notebook snippet later in this thread):
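
# Sketch only; names follow the tutorial notebook (hand_data_provider,
# device_data_provider, headset_pose3d) and the full example below.

# 1. Hand mesh vertices, in world coordinates
hand_mesh_vertices = hand_data_provider.get_hand_mesh_vertices(hand_pose_data)

# 2. Camera pose at this timestamp: world_camera = world_device @ device_camera
[extrinsics, intrinsics] = device_data_provider.get_camera_calibration(stream_id)
T_world_camera = headset_pose3d.T_world_device @ extrinsics

# 3. Bring each vertex into camera coordinates, then project it with the
#    camera calibration (this handles the fisheye distortion; project
#    returns None if the point does not land on the image)
for vertex_in_world in hand_mesh_vertices:
    vertex_in_camera = T_world_camera.inverse() @ vertex_in_world
    pixel = intrinsics.project(vertex_in_camera)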

I will extend the demo notebook with a camera projection part https://github.com/facebookresearch/hot3d/blob/main/hot3d/HOT3D_Tutorial.ipynb

If you are using the HOT3D clips:

ryhara commented 1 month ago

Thank you for the reply!

I'm using the HOT3D sequences, so I'm looking forward to seeing the example added!

The method for projecting 3D joints into 2D is the same as for meshes, right?

I'm really grateful for the support.

Shashvatb commented 1 month ago

Hi @SeaOtocinclus, I wrote a small snippet for 2D conversion of the points, but I somehow can't get the right annotations. Adding the snippet below.
EDIT: Just doing it for the right hand right now, and only for the landmarks, not the entire mesh (since it's easier to visualize).
EDIT 2: Added a bounding box to help visualize better.

for timestamp_ns in tqdm(timestamps[50:]):
    box2d_collection_with_dt = (
        hand_box2d_data_provider.get_bbox_at_timestamp(
            stream_id=stream_id,
            timestamp_ns=timestamp_ns,
            time_query_options=TimeQueryOptions.CLOSEST,
            time_domain=TimeDomain.TIME_CODE,
        )
    )

    # Skip frames without 2D bounding box annotations
    if (
        box2d_collection_with_dt is None
        or box2d_collection_with_dt.box2d_collection is None
    ):
        continue

    try:
        axis_aligned_box2d = box2d_collection_with_dt.box2d_collection.box2ds[RIGHT_HAND_INDEX]
    except KeyError:
        continue
    bbox = axis_aligned_box2d.box2d
    if bbox is None:
        continue

    # handpose
    hand_poses_with_dt = hand_data_provider.get_pose_at_timestamp(
        timestamp_ns=timestamp_ns,
        time_query_options=TimeQueryOptions.CLOSEST,
        time_domain=TimeDomain.TIME_CODE,
    )

    if hand_poses_with_dt is None:
        continue
    hand_pose_collection = hand_poses_with_dt.pose3d_collection
    for hand_pose_data in hand_pose_collection.poses.values():
        if hand_pose_data.is_left_hand():
            continue
        elif hand_pose_data.is_right_hand():
            hand_landmarks = hand_data_provider.get_hand_landmarks(
                hand_pose_data
            )

    # camera pose
    headset_pose3d_with_dt = device_pose_provider.get_pose_at_timestamp(
        timestamp_ns=timestamp_ns,
        time_query_options=TimeQueryOptions.CLOSEST,
        time_domain=TimeDomain.TIME_CODE,
    )

    if headset_pose3d_with_dt is None:
        continue

    headset_pose3d = headset_pose3d_with_dt.pose3d
    Rt = headset_pose3d.T_world_device @ extrinsics.__copy__()
    Rt = Rt.to_matrix3x4()

    R_cam, t_cam = Rt[:, :3], Rt[:, 3:]
    joints_uv, _ = cv2.projectPoints(hand_landmarks.numpy(), R_cam, t_cam, K, np.zeros(5))
    joints_uv = np.squeeze(joints_uv, axis=1)
    # Retrieve the image data for a given timestamp    
    image_data = device_data_provider.get_image(timestamp_ns, stream_id)
    cv2.rectangle(image_data, (int(bbox.left), int(bbox.top)), (int(bbox.right), int(bbox.bottom)), (0, 255, 0), 1)
    for point in joints_uv:
        x, y = point
        cv2.circle(image_data, (int(x), int(y)), radius=1, color=(255, 255, 255), thickness=-1)
    image_data = cv2.cvtColor(image_data.astype('uint8'), cv2.COLOR_BGR2RGB)

    cv2.imshow('img', image_data)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

SeaOtocinclus commented 1 month ago

You were very close; only the camera projection was not handled correctly (you need to use the project function on the device calibration to correctly handle the fisheye distortion).

The following will be added soon to the notebook tutorial:

%matplotlib inline
from typing import Any, Optional

import numpy as np
from matplotlib import pyplot as plt

from data_loaders.HeadsetPose3dProvider import HeadsetPose3dProvider
from data_loaders.loader_hand_poses import Handedness, HandPose3dCollection

from projectaria_tools.core.calibration import CameraCalibration
from projectaria_tools.core.sophus import SE3
from projectaria_tools.core.stream_id import StreamId

image_streamid = StreamId("214-1")
# image_streamid = StreamId("1201-1")

# timestamp_ns = timestamps[len(timestamps) // 2]
timestamp_ns = timestamps[420]

# Retrieve the image stream label as string
image_stream_label = device_data_provider.get_image_stream_label(image_streamid)

# Retrieve the image data for a given timestamp
image_data = device_data_provider.get_image(timestamp_ns, image_streamid)

# Retrieve the hand vertices and project them on the image at this timestamp
def retrieve_hand_data(timestamp_ns: int) -> Optional[HandPose3dCollection]:
    """
    Retrieve the collection of Hand Pose at this timestamp (i.e. LEFT or RIGHT hand)
    Note: They are 3D pose in world, and does not say if they are visible for a given camera or not (stream_id)
    Visibility can either being determined by using camera visibility (are vertices visible), or using the 2d hands bounding box
    """
    hand_poses_with_dt = None
    if hand_data_provider is not None:
        hand_poses_with_dt = hand_data_provider.get_pose_at_timestamp(
            timestamp_ns=timestamp_ns,
            time_query_options=TimeQueryOptions.CLOSEST,
            time_domain=TimeDomain.TIME_CODE,
        )

        if hand_poses_with_dt is not None:
            return hand_poses_with_dt.pose3d_collection
    return None

def retrieve_device_pose(
    timestamp_ns: int,
    stream_id: StreamId,
    device_pose_provider: Optional[HeadsetPose3dProvider] = None,
    device_data_provider: Optional[Any] = None,
) -> Optional[tuple[SE3, CameraCalibration]]:
    """
    Retrieve the pose of the device and apply the device_camera transformation on top of it for the provided stream_id
    """
    headset_pose3d_with_dt = None
    if device_pose_provider is not None:
        headset_pose3d_with_dt = device_pose_provider.get_pose_at_timestamp(
            timestamp_ns=timestamp_ns,
            time_query_options=TimeQueryOptions.CLOSEST,
            time_domain=TimeDomain.TIME_CODE,
        )

        if headset_pose3d_with_dt is not None:
            headset_pose3d = headset_pose3d_with_dt.pose3d

            # Retrieve the camera calibration (intrinsics and extrinsics) for a given stream_id
            [extrinsics, intrinsics] = device_data_provider.get_camera_calibration(
                stream_id
            )
            # The pose of the given camera at this timestamp is (world_camera = world_device @ device_camera):
            world_camera_pose = headset_pose3d.T_world_device @ extrinsics
            return [world_camera_pose, intrinsics]
    return None

# Retrieve the data for this timestamp
hand_data = retrieve_hand_data(timestamp_ns)
device_pose = retrieve_device_pose(
    timestamp_ns, image_streamid, device_pose_provider, device_data_provider
)

if hand_data is not None and device_pose is not None:

    device_pose_extrinsic = device_pose[0]
    device_pose_intrinsic = device_pose[1]

    # Visualize the image
    plt.imshow(image_data, interpolation="nearest")

    # For each possible hand pose (Left or Right)
    # Project the vertices in the camera and plot the visible one
    for hand_pose_data in hand_data.poses.values():
        # Retrieve the hand vertices and project them on the image at this timestamp
        # hand_mesh_vertices = hand_data_provider.get_hand_mesh_vertices(
        #     hand_pose_data
        # ).tolist()

        # Use Landmarks
        hand_landmarks = hand_data_provider.get_hand_landmarks(hand_pose_data)
        # Convert the landmarks to connected lines for display, following LANDMARK_CONNECTIVITY
        hand_mesh_vertices = np.vstack(
            [
                hand_landmarks[it].numpy()
                for connectivity in LANDMARK_CONNECTIVITY
                for it in connectivity
            ]
        )

        hand_vertices_in_camera = []
        for vertex_in_world in hand_mesh_vertices:
            vertice_3d_camera_coordinates = (
                device_pose_extrinsic.inverse() @ vertex_in_world
            )
            vertice_2d_camera_coordinates = device_pose_intrinsic.project(
                vertice_3d_camera_coordinates
            )
            if vertice_2d_camera_coordinates is not None:
                hand_vertices_in_camera.append(vertice_2d_camera_coordinates)
        handedness_label = hand_pose_data.handedness_label()
        print(
            f"{handedness_label} hand -> visible vertices: {len(hand_vertices_in_camera)}"
        )

        # Plot the hand vertices
        plt.scatter(
            x=[x[0] for x in hand_vertices_in_camera],
            y=[x[1] for x in hand_vertices_in_camera],
            s=1,
            c="r" if hand_pose_data.handedness == Handedness.Right else "b",
        )

plt.show()
[attached image: projected hand landmarks overlaid on the camera frame]
ryhara commented 1 month ago

Thank you for adding the new implementation, I'll check it out myself! I'll get back to you if I have any other questions.

EDIT: Can the mesh of the MANO hand model be projected in a similar way?

ryhara commented 1 month ago

I have checked that the implementation works. Thank you so much!

I have also implemented the display of the mesh vertices on my own (some parts are omitted below). The vertices of the mesh can be displayed this way.

Is there also a way to project the surface of the mesh at the same time?

hand_mesh_vertices = hand_data_provider.get_hand_mesh_vertices(hand_pose_data)
vertices_in_camera = []
for vertex_in_world in hand_mesh_vertices:
    vertice_3d_camera_coordinates = (
        device_pose_extrinsic.inverse() @ vertex_in_world
    )
    vertice_2d_camera_coordinates = device_pose_intrinsic.project(
        vertice_3d_camera_coordinates
    )
    if vertice_2d_camera_coordinates is not None:
        vertices_in_camera.append(vertice_2d_camera_coordinates)

hand_points = [[x[0], x[1]] for x in vertices_in_camera]
for point in hand_points:
    cv2.circle(image, (int(point[0]), int(point[1])), 1, (255, 255, 255), -1)
[attached image: projected mesh vertices]
SeaOtocinclus commented 1 month ago

The API lets you retrieve the faces (i.e. triangles given as indices into the vertices) by using:

[hand_triangles, hand_vertex_normals] = hand_data_provider.get_hand_mesh_faces_and_normals(hand_pose_data)
ryhara commented 1 month ago

I was able to draw it like this! If I have any other questions, I will post them in a separate issue. I really appreciate it.

# `triangles` holds the faces returned by get_hand_mesh_faces_and_normals,
# and `vertices_in_camera` is a numpy array of the projected 2D vertices
for triangle in triangles:
    triangle = np.array(triangle, np.int32)
    points = vertices_in_camera[triangle]
    points = np.array(points, np.int32)
    cv2.fillConvexPoly(mask_image, points, (255, 255, 255))

[attached image: projected mesh surface]

SeaOtocinclus commented 1 month ago

Great, thank you for posting your code snippet and a preview image. I'm closing the issue.