allenai/allenact

An open source framework for research in Embodied-AI from AI2.
https://www.allenact.org

Handedness of coordinate system #322

Status: Open · sagadre opened this issue 2 years ago

sagadre commented 2 years ago

Problem

The THOR metadata for agent pose and objects appears to live in Unity's native left-handed coordinate system. This might be confusing for people who are unfamiliar with Unity, since right-handed coordinate systems are much more common. For example, creating point clouds in world space with allenact.embodiedai.mapping.mapping_utils.point_cloud_utils.depth_frame_to_camera_space_xyz and allenact.embodiedai.mapping.mapping_utils.point_cloud_utils.camera_space_xyz_to_world_xyz and then visualizing them in meshlab gives the appearance of a flip (see the screenshots below). Ultimately this should be handled on the user side, but it should be clear that THOR uses a left-handed coordinate system. Updating the THOR docs and the docs for allenact.embodiedai.mapping.mapping_utils.point_cloud_utils accordingly would be helpful.
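In the meantime, the flip can be handled on the user side by negating one axis before visualization. Below is a minimal sketch; the helper name left_to_right_handed is hypothetical, and the choice of negating z (rather than x) is an assumption about the target convention, not an official conversion:

import torch

def left_to_right_handed(points_xyz: torch.Tensor) -> torch.Tensor:
    # points_xyz: N x 3 world-space points in THOR/Unity's left-handed, y-up frame.
    # Negating a single axis switches handedness; which axis to flip depends on
    # the convention the downstream visualizer expects.
    flipped = points_xyz.clone()
    flipped[:, 2] *= -1.0  # assumed: flip z
    return flipped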

Steps to reproduce

from ai2thor.controller import Controller
import torch
import trimesh
from PIL import Image

from allenact.embodiedai.mapping.mapping_utils.point_cloud_utils import (
    depth_frame_to_camera_space_xyz,
    camera_space_xyz_to_world_xyz,
)

# Launch THOR with depth rendering enabled so the frames can be back-projected
controller = Controller(
    renderDepthImage=True,
    renderInstanceSegmentation=True,
    width=672,
    height=672,
    visibilityDistance=20.0,
    fieldOfView=90,
    agentMode="locobot",
    rotateStepDegrees=30,
)
event = controller.step(action="Done")

# Back-project every depth pixel into camera-space xyz coordinates (3 x N)
camera_space_xyz = depth_frame_to_camera_space_xyz(
    depth_frame=torch.as_tensor(event.depth_frame),
    mask=None,  # no mask: use all pixels
    fov=90,
)
# Agent position and rotation from THOR metadata (Unity's left-handed, y-up frame)
x = event.metadata["agent"]["position"]["x"]
y = event.metadata["agent"]["position"]["y"]
z = event.metadata["agent"]["position"]["z"]

# Transform the camera-space points into world space using the agent's pose
world_points = camera_space_xyz_to_world_xyz(
    camera_space_xyzs=camera_space_xyz,
    camera_world_xyz=torch.as_tensor([x, y, z]),
    rotation=event.metadata["agent"]["rotation"]["y"],
    horizon=event.metadata["agent"]["cameraHorizon"],
)

# Transpose from 3 x N to N x 3 and export the points (colored black) for meshlab
world_points = torch.transpose(world_points, 0, 1)
rgba_colors = torch.ones(world_points.shape[0], 4)
rgba_colors[:, :3] = 0.0
ply = trimesh.points.PointCloud(vertices=world_points.numpy(), colors=rgba_colors.numpy())
ply.export("dbg.ply")

# Save the corresponding RGB frame for comparison
Image.fromarray(event.frame).save("dbg.png")
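With the hypothetical left_to_right_handed helper sketched above, applying world_points = left_to_right_handed(world_points) after the transpose (once the points are N x 3) should undo the apparent flip in the meshlab view.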

Screenshots

Image from THOR: [screenshot]

Back-projected visualization in meshlab: [screenshot]