allenai / ai2thor

An open-source platform for Visual AI.
http://ai2thor.allenai.org
Apache License 2.0

How can I get the intrinsic matrix of the agent camera #733

xiaobaishu0097 opened this issue 3 years ago

xiaobaishu0097 commented 3 years ago

Hi,

This is probably a very amateur question, but is it possible to get the intrinsic matrix of the agent camera, or alternatively to obtain a point cloud from the RGB-D images captured from ai2thor? I tried https://github.com/allenai/ai2thor/issues/110#issue-412514905, but it seems to have some compiler errors.

Thanks

ruinianxu commented 3 years ago

@xiaobaishu0097 Hi,

I am running into the same problem and wonder if you have found a way to address it. I would really appreciate your help.

Thanks

zkytony commented 3 years ago

Just in case anyone runs into this issue, here are my two cents. I found that it is sufficient to compute the camera intrinsics using just the field of view, which is exposed by the ai2thor API and is configurable.

I was originally following #275 and #110 and tried to print out the projection matrix in Unity, but it led nowhere. If you have a method to get the camera intrinsic matrix directly from Unity, please share it.

Below is some code for plotting a point cloud given a color and a depth image, using Open3D. The ai2thor version is 3.3.4.


import math
import numpy as np
import open3d as o3d
from ai2thor.controller import Controller

def to_rad(th):
    return th*math.pi / 180

def example():
    width, height = 600, 600
    fov = 90
    controller = Controller(scene="FloorPlan1",
                            width=width,
                            height=height,
                            fieldOfView=fov,
                            renderDepthImage=True)
    controller.step(action="RotateLeft", degrees=45)
    event = controller.step(action="Pass")

    # Convert fov to focal length
    focal_length = 0.5 * width * math.tan(to_rad(fov/2))

    # camera intrinsics
    fx, fy, cx, cy = (focal_length, focal_length, width/2, height/2)

    # Obtain point cloud
    color = o3d.geometry.Image(event.frame.astype(np.uint8))
    # Copy before normalizing so we don't mutate the event's depth buffer;
    # dividing by the max gives relative (non-metric) depth in [0, 1]
    d = event.depth_frame.copy()
    d /= np.max(d)
    depth = o3d.geometry.Image(d)
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(color, depth,
                                                              depth_scale=1.0,
                                                              depth_trunc=0.7,
                                                              convert_rgb_to_intensity=False)
    intrinsic = o3d.camera.PinholeCameraIntrinsic(width, height, fx, fy, cx, cy)
    pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, intrinsic)
    # Flip Y and Z (the camera looks down -Z) so the cloud renders upright
    pcd.transform([[1, 0, 0, 0],
                   [0, -1, 0, 0],
                   [0, 0, -1, 0],
                   [0, 0, 0, 1]])
    o3d.visualization.draw_geometries([pcd])

if __name__ == "__main__":
    example()

Output looks like: (screenshot Selection_025: the rendered point cloud)

I haven't tested it, but based on the quality of this point cloud, it looks promising that you can obtain 3D coordinates in the world frame from 2D pixels.
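
For reference, the back-projection Open3D does internally is simple enough to write by hand. Here is a minimal sketch (my own, not part of the ai2thor API), assuming the same fx, fy, cx, cy as above and a metric, unnormalized depth map:

import numpy as np

def unproject(depth, fx, fy, cx, cy):
    # Back-project an (H, W) metric depth map to (H*W, 3) points
    # in the camera frame using the pinhole model.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx  # X right
    y = (v - cy) * depth / fy  # Y down (image convention)
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

To get coordinates in the world frame you would still need the camera pose (extrinsics), e.g. derived from the agent metadata.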

ruinianxu commented 3 years ago

I successfully modified the source code so that the agent prints the projection matrix of the camera. I've documented everything in my forked version: under the test folder there is a readme file that documents everything from installing Unity on Ubuntu to compiling and building the source code. I hope it helps anyone who needs to get the projection matrix of the camera.
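
For reference, once you have Unity's 4x4 projection matrix, recovering pinhole intrinsics from it is mechanical. A sketch assuming Unity's usual symmetric frustum, where P[0,0] = 1/(aspect*tan(fov/2)) and P[1,1] = 1/tan(fov/2); P here is whatever matrix you printed, as a numpy array:

import numpy as np

def intrinsics_from_unity_projection(P, width, height):
    # Scale the normalized focal terms by half the image size
    # to get focal lengths in pixels.
    fx = P[0, 0] * width / 2.0
    fy = P[1, 1] * height / 2.0
    cx, cy = width / 2.0, height / 2.0  # assumes a centered principal point
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])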

GuardianWang commented 2 years ago

focal_length = 0.5 * width * math.tan(to_rad(fov/2))

Should it be

focal_length = 0.5 * width / math.tan(to_rad(fov/2))

LucBourrat1 commented 1 year ago

I agree with @GuardianWang. In this case it happens to work because fov = 90, so math.tan(to_rad(fov/2)) ≈ 1 and multiplying or dividing gives almost the same result, but the correct formula is the one that divides.
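
Concretely, the corrected focal length (same names and settings as the example above):

import math

width, height, fov = 600, 600, 90
focal_length = 0.5 * width / math.tan(math.radians(fov / 2))  # divide, not multiply
fx, fy = focal_length, focal_length
cx, cy = width / 2, height / 2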

ChongjianGE commented 1 year ago

Hi @ruinianxu, it seems your code has to run on a local Unity Hub install. Is that right?

ruinianxu commented 1 year ago

@ChongjianGE Yeah.

ChongjianGE commented 1 year ago

@ruinianxu Hi, ruinian,

Thank you for your assistance. I have successfully implemented the retrieval of the projection matrix in Unity Editor version 2021.03. However, I have noticed that regardless of how I move the agent/camera, the 4x4 projection matrix I obtain remains unchanged. This seems abnormal to me based on my understanding. Additionally, I would like to ask if you have any recommended methods for conveniently verifying whether the projection matrix is retrieved correctly.
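
My current understanding, which may be wrong, is that an unchanged projection matrix is actually expected: it encodes only the intrinsics (FOV, aspect ratio, clip planes), while the pose goes into the view matrix. As a sanity check, a point straight ahead of the camera should project to the image center; a sketch assuming Unity's OpenGL-style conventions (camera looks down -Z in view space, NDC in [-1, 1]):

import numpy as np

def check_projection(P, width, height):
    # Project a point 2 m straight ahead of the camera; with a correct
    # projection matrix it should land at the image center.
    point = np.array([0.0, 0.0, -2.0, 1.0])
    clip = P @ point
    ndc = clip[:3] / clip[3]
    u = (ndc[0] * 0.5 + 0.5) * width
    v = (1.0 - (ndc[1] * 0.5 + 0.5)) * height
    return u, v  # expect (width/2, height/2)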

lvjihui commented 1 year ago

I know event.metadata['agent'] can give me the position of the agent, but how can I get the relationship between the camera coordinate system and the agent coordinate system?
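
My best guess so far, treating the metadata keys and sign conventions as assumptions to verify against your ai2thor version: the camera shares the agent's yaw (rotation about y) and adds a pitch given by cameraHorizon, with its own position reported in the metadata. A camera-to-world rotation could then be composed like this:

import numpy as np

def camera_pose_from_metadata(event):
    # Assumed metadata keys; verify against your ai2thor version.
    agent = event.metadata['agent']
    yaw = np.radians(agent['rotation']['y'])    # rotation about world y
    pitch = np.radians(agent['cameraHorizon'])  # positive = looking down
    pos = event.metadata.get('cameraPosition', agent['position'])
    t = np.array([pos['x'], pos['y'], pos['z']])
    # Yaw about y, then pitch about the camera's x axis
    Ry = np.array([[np.cos(yaw), 0.0, np.sin(yaw)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(yaw), 0.0, np.cos(yaw)]])
    Rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(pitch), -np.sin(pitch)],
                   [0.0, np.sin(pitch), np.cos(pitch)]])
    return Ry @ Rx, t  # camera-to-world rotation (up to sign conventions) and translation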

Penguin963 commented 1 month ago

> (quoting @zkytony's point-cloud example above in full)

Hi @zkytony, I want to know what width and height are. Are they the player screen height and width, or the width and height of the image I get from the agent?
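
From reading the example, my understanding (please correct me if wrong) is that they are the values passed to Controller(...), which also set the size of the frames the agent returns. A quick check, continuing from the example above:

event = controller.step(action="Pass")
assert event.frame.shape == (height, width, 3)     # RGB frame is (H, W, 3)
assert event.depth_frame.shape == (height, width)  # depth frame is (H, W)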