Closed ryhara closed 1 month ago

@SeaOtocinclus
Hi, thank you for releasing such a great dataset and library.
Is it possible to project a 3D hand mesh onto a 2D image? I tried searching the code for keywords like "projection," but I couldn't find anything related to it. Is there a function that can do this projection? If possible, could you provide guidance on how to achieve it?
Hi @ryhara, thank you for your issue and comments.
If you are using the HOT3D sequences:
You can project the mesh vertices by using the project function on the camera. We don't have an example yet, but I can try to add one next week.
Here is what you should do, in pseudocode:
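(A minimal sketch of the idea; the names follow the runnable snippets later in this thread, where hand_data_provider, headset_pose3d, extrinsics, and intrinsics are set up.)

# 1. Get the 3D hand points (landmarks or mesh vertices) in world coordinates
vertices_world = hand_data_provider.get_hand_mesh_vertices(hand_pose_data)
# 2. Compose the camera pose: world_camera = world_device @ device_camera (extrinsics)
T_world_camera = headset_pose3d.T_world_device @ extrinsics
# 3. Move each point into the camera frame and project it with the camera model
for v_world in vertices_world:
    v_camera = T_world_camera.inverse() @ v_world
    uv = intrinsics.project(v_camera)  # returns None if the point falls outside the image
    if uv is not None:
        pass  # draw uv on the image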
I will extend the demo notebook with a camera projection part https://github.com/facebookresearch/hot3d/blob/main/hot3d/HOT3D_Tutorial.ipynb
If you are using the HOT3D clips:
Thank you for your reply!
I'm using the HOT3D sequences, so I'm looking forward to seeing the examples added!
The method for projecting 3D joints into 2D is the same as for meshes, right?
I'm really grateful for the support.
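(As the snippets below show, this turns out to be true: the projection path is identical, and only the call that fetches the 3D world points changes. A minimal sketch, assuming the hand_data_provider calls used later in this thread:)

# Both calls return world-frame 3D points that then go through the same
# world -> camera -> image projection
landmarks_world = hand_data_provider.get_hand_landmarks(hand_pose_data)      # 3D joints
vertices_world = hand_data_provider.get_hand_mesh_vertices(hand_pose_data)   # mesh vertices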
Hi @SeaOtocinclus, I wrote a small snippet for 2D conversion of the points, but I somehow can't get the right annotations. Adding the snippet below.
EDIT: For now this only handles the right hand, and only the landmarks rather than the entire mesh (since that is easier to visualize).
EDIT 2: Added a bounding box to help visualize better.
for timestamp_ns in tqdm(timestamps[50:]):
    box2d_collection_with_dt = hand_box2d_data_provider.get_bbox_at_timestamp(
        stream_id=stream_id,
        timestamp_ns=timestamp_ns,
        time_query_options=TimeQueryOptions.CLOSEST,
        time_domain=TimeDomain.TIME_CODE,
    )
    if (
        box2d_collection_with_dt is None
        or box2d_collection_with_dt.box2d_collection is None
    ):
        continue
    try:
        axis_aligned_box2d = box2d_collection_with_dt.box2d_collection.box2ds[
            RIGHT_HAND_INDEX
        ]
    except KeyError:
        continue
    bbox = axis_aligned_box2d.box2d
    if bbox is None:
        continue

    # Hand pose
    hand_poses_with_dt = hand_data_provider.get_pose_at_timestamp(
        timestamp_ns=timestamp_ns,
        time_query_options=TimeQueryOptions.CLOSEST,
        time_domain=TimeDomain.TIME_CODE,
    )
    if hand_poses_with_dt is None:
        continue
    hand_pose_collection = hand_poses_with_dt.pose3d_collection
    hand_landmarks = None
    for hand_pose_data in hand_pose_collection.poses.values():
        if hand_pose_data.is_right_hand():
            hand_landmarks = hand_data_provider.get_hand_landmarks(hand_pose_data)
    if hand_landmarks is None:
        continue

    # Camera pose
    headset_pose3d_with_dt = device_pose_provider.get_pose_at_timestamp(
        timestamp_ns=timestamp_ns,
        time_query_options=TimeQueryOptions.CLOSEST,
        time_domain=TimeDomain.TIME_CODE,
    )
    if headset_pose3d_with_dt is None:
        continue
    headset_pose3d = headset_pose3d_with_dt.pose3d

    # Project the landmarks with a pinhole model (K + zero distortion).
    # NOTE: as discussed below, this is the part that goes wrong for the Aria
    # fisheye cameras; the calibration's project function should be used instead.
    Rt = (headset_pose3d.T_world_device @ extrinsics.__copy__()).to_matrix3x4()
    R_cam, t_cam = Rt[:, :3], Rt[:, 3:]
    rvec, _ = cv2.Rodrigues(R_cam)
    joints_uv, _ = cv2.projectPoints(hand_landmarks.numpy(), rvec, t_cam, K, np.zeros(5))
    joints_uv = np.squeeze(joints_uv, axis=1)

    # Retrieve the image data for the given timestamp and draw the annotations
    image_data = device_data_provider.get_image(timestamp_ns, stream_id)
    cv2.rectangle(
        image_data,
        (int(bbox.left), int(bbox.top)),
        (int(bbox.right), int(bbox.bottom)),
        (0, 255, 0),
        1,
    )
    for x, y in joints_uv:
        cv2.circle(image_data, (int(x), int(y)), radius=1, color=(255, 255, 255), thickness=-1)
    image_data = cv2.cvtColor(image_data.astype("uint8"), cv2.COLOR_BGR2RGB)
    cv2.imshow("img", image_data)
    cv2.waitKey(0)
cv2.destroyAllWindows()
You were very close; only the camera projection was not handled correctly (using the project function on the device calibration is required to correctly handle the fisheye distortion).
The following will be added soon to the Notebook tutorial:
%matplotlib inline
from typing import Any, Optional

import numpy as np
from matplotlib import pyplot as plt

from data_loaders.HeadsetPose3dProvider import HeadsetPose3dProvider
from data_loaders.loader_hand_poses import Handedness, HandPose3dCollection
from projectaria_tools.core.calibration import CameraCalibration
from projectaria_tools.core.sophus import SE3
from projectaria_tools.core.stream_id import StreamId

image_streamid = StreamId("214-1")
# image_streamid = StreamId("1201-1")

# timestamp_ns = timestamps[len(timestamps) // 2]
timestamp_ns = timestamps[420]

# Retrieve the image stream label as a string
image_stream_label = device_data_provider.get_image_stream_label(image_streamid)

# Retrieve the image data for a given timestamp
image_data = device_data_provider.get_image(timestamp_ns, image_streamid)
# Helpers to retrieve the hand poses and the device pose at a given timestamp
def retrieve_hand_data(timestamp_ns: int) -> Optional[HandPose3dCollection]:
    """
    Retrieve the collection of hand poses at this timestamp (i.e. LEFT and/or RIGHT hand).
    Note: these are 3D poses in world coordinates; they do not tell you whether a hand
    is visible in a given camera (stream_id). Visibility can be determined either from
    camera visibility (which vertices project into the image) or from the 2D hand
    bounding boxes.
    """
    hand_poses_with_dt = None
    if hand_data_provider is not None:
        hand_poses_with_dt = hand_data_provider.get_pose_at_timestamp(
            timestamp_ns=timestamp_ns,
            time_query_options=TimeQueryOptions.CLOSEST,
            time_domain=TimeDomain.TIME_CODE,
        )
    if hand_poses_with_dt is not None:
        return hand_poses_with_dt.pose3d_collection
    return None


def retrieve_device_pose(
    timestamp_ns: int,
    stream_id: StreamId,
    device_pose_provider: Optional[HeadsetPose3dProvider] = None,
    device_data_provider: Optional[Any] = None,
) -> Optional[tuple[SE3, CameraCalibration]]:
    """
    Retrieve the pose of the device and apply the device_camera transformation
    on top of it for the provided stream_id.
    """
    headset_pose3d_with_dt = None
    if device_pose_provider is not None:
        headset_pose3d_with_dt = device_pose_provider.get_pose_at_timestamp(
            timestamp_ns=timestamp_ns,
            time_query_options=TimeQueryOptions.CLOSEST,
            time_domain=TimeDomain.TIME_CODE,
        )
    if headset_pose3d_with_dt is not None:
        headset_pose3d = headset_pose3d_with_dt.pose3d
        # Retrieve the camera calibration (extrinsics and intrinsics) for the given stream_id
        [extrinsics, intrinsics] = device_data_provider.get_camera_calibration(stream_id)
        # The pose of this camera at this timestamp (world_camera = world_device @ device_camera):
        world_camera_pose = headset_pose3d.T_world_device @ extrinsics
        return world_camera_pose, intrinsics
    return None
# Retrieve the data for this timestamp
hand_data = retrieve_hand_data(timestamp_ns)
device_pose = retrieve_device_pose(
    timestamp_ns, image_streamid, device_pose_provider, device_data_provider
)

if hand_data is not None and device_pose is not None:
    device_pose_extrinsic, device_pose_intrinsic = device_pose

    # Visualize the image
    plt.imshow(image_data, interpolation="nearest")

    # For each possible hand pose (LEFT or RIGHT),
    # project the points into the camera and plot the visible ones
    for hand_pose_data in hand_data.poses.values():
        # Alternatively, the full mesh vertices could be used:
        # hand_mesh_vertices = hand_data_provider.get_hand_mesh_vertices(hand_pose_data).tolist()
        # Here we use the landmarks instead
        hand_landmarks = hand_data_provider.get_hand_landmarks(hand_pose_data)
        # Convert the landmarks to connected polylines for display
        hand_mesh_vertices = np.vstack(
            [
                np.vstack([hand_landmarks[it].numpy() for it in connectivity])
                for connectivity in LANDMARK_CONNECTIVITY
            ]
        )

        hand_vertices_in_camera = []
        for vertex_in_world in hand_mesh_vertices:
            vertex_3d_camera_coordinates = device_pose_extrinsic.inverse() @ vertex_in_world
            vertex_2d_camera_coordinates = device_pose_intrinsic.project(
                vertex_3d_camera_coordinates
            )
            # project() returns None when the point does not fall inside the image
            if vertex_2d_camera_coordinates is not None:
                hand_vertices_in_camera.append(vertex_2d_camera_coordinates)

        handedness_label = hand_pose_data.handedness_label()
        print(f"{handedness_label} hand -> visible vertices: {len(hand_vertices_in_camera)}")

        # Plot the hand vertices
        plt.scatter(
            x=[x[0] for x in hand_vertices_in_camera],
            y=[x[1] for x in hand_vertices_in_camera],
            s=1,
            c="r" if hand_pose_data.handedness == Handedness.Right else "b",
        )
    plt.show()
Thank you for adding the new implementation, I'll check it out myself! I'll get back to you if I have any other questions.
EDIT: Can the mesh of the MANO hand model be projected in a similar way?
I have checked that the implementation works. Thank you so much!
I have also implemented the mesh-vertex display on my own (some parts are omitted below). The vertices of the mesh can be displayed this way.
Is there also a way to draw the surface of the mesh, rather than just its vertices?
hand_mesh_vertices = hand_data_provider.get_hand_mesh_vertices(hand_pose_data)
vertices_in_camera = []
for vertex_in_world in hand_mesh_vertices:
    vertex_3d_camera_coordinates = device_pose_extrinsic.inverse() @ vertex_in_world
    vertex_2d_camera_coordinates = device_pose_intrinsic.project(vertex_3d_camera_coordinates)
    if vertex_2d_camera_coordinates is not None:
        vertices_in_camera.append(vertex_2d_camera_coordinates)

hand_points = [[x[0], x[1]] for x in vertices_in_camera]
for point in hand_points:
    cv2.circle(image, (int(point[0]), int(point[1])), 1, (255, 255, 255), -1)
The API lets you retrieve the faces (i.e. triangles given as vertex indices) by using:
[hand_triangles, hand_vertex_normals] = hand_data_provider.get_hand_mesh_faces_and_normals(hand_pose_data)
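(A minimal sketch of how the returned triangles can be combined with the projection above; note that project() returns None for off-image points, so the projected-vertex list must stay index-aligned with the triangles. mask_image is assumed to be an image-sized uint8 buffer; the pose and calibration names come from the earlier snippets.)

# Project every mesh vertex, keeping None placeholders so that the
# triangle indices remain aligned with the projected-vertex list
projected = [
    device_pose_intrinsic.project(device_pose_extrinsic.inverse() @ v)
    for v in hand_mesh_vertices
]
for triangle in hand_triangles:
    points = [projected[i] for i in triangle]
    if any(p is None for p in points):  # skip triangles that are not fully visible
        continue
    cv2.fillConvexPoly(mask_image, np.array(points, np.int32), (255, 255, 255))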
I was able to draw it like this! If I have any other questions, I will post them in a separate issue. I really appreciate it.
vertices_in_camera = np.array(vertices_in_camera)  # list -> array so index arrays can be used
for triangle in triangles:
    triangle = np.array(triangle, np.int32)
    points = np.array(vertices_in_camera[triangle], np.int32)
    cv2.fillConvexPoly(mask_image, points, (255, 255, 255))
Great, thank you for posting your code snippet and a preview image. I'm closing the issue.