facebookresearch / habitat-sim

A flexible, high-performance 3D simulator for Embodied AI research.
https://aihabitat.org/
MIT License

Obtain Intrinsic Matrix #2439

Open 21yrm opened 3 months ago

21yrm commented 3 months ago

I use habitat-sim to render RGB-D images, and I want to recover 3D XYZ coordinates from them. My image size is 1920 x 1080, and I am using a PINHOLE camera.

Q1: Given 1/focal_length = 2/W * tan(hfov/2), is focal_length 960 in my setting?
Q2: Is focal_length_x equal to focal_length_y?
Q3: What are the values of cx and cy? Normally cx = width/2, but in some tutorials I saw cx and cy both set to 0 because of OpenGL, which confuses me.
Q4: What is the intrinsic matrix? Is it [[960, 0, 960], [0, 960, 540], [0, 0, 1]]?

Thank you very much!

21yrm commented 3 months ago

Here are my configs:


import habitat_sim

rgb_sensor = True
depth_sensor = True
semantic_sensor = True

sim_settings = {
    "width": 1920,
    "height": 1080,
    "scene": "8WUmhLawc2A.glb",
    "default_agent": 0,
    "sensor_height": 1.5,
    "color_sensor": rgb_sensor,
    "semantic_sensor": semantic_sensor,
    "depth_sensor": depth_sensor,
    "seed": 1,
    "enable_physics": False,
}

def make_cfg(settings):
    sim_cfg = habitat_sim.SimulatorConfiguration()
    sim_cfg.gpu_device_id = 0
    sim_cfg.scene_id = settings["scene"]
    sim_cfg.enable_physics = settings["enable_physics"]

    sensor_specs = []

    if settings["color_sensor"]:
        color_sensor_spec = habitat_sim.CameraSensorSpec()
        color_sensor_spec.uuid = "color_sensor"
        color_sensor_spec.sensor_type = habitat_sim.SensorType.COLOR
        color_sensor_spec.resolution = [settings["height"], settings["width"]]
        color_sensor_spec.position = [0.0, settings["sensor_height"], 0.0]
        color_sensor_spec.sensor_subtype = habitat_sim.SensorSubType.PINHOLE
        sensor_specs.append(color_sensor_spec)

    if settings["depth_sensor"]:
        depth_sensor_spec = habitat_sim.CameraSensorSpec()
        depth_sensor_spec.uuid = "depth_sensor"
        depth_sensor_spec.sensor_type = habitat_sim.SensorType.DEPTH
        depth_sensor_spec.resolution = [settings["height"], settings["width"]]
        depth_sensor_spec.position = [0.0, settings["sensor_height"], 0.0]
        depth_sensor_spec.sensor_subtype = habitat_sim.SensorSubType.PINHOLE
        sensor_specs.append(depth_sensor_spec)

    if settings["semantic_sensor"]:
        semantic_sensor_spec = habitat_sim.CameraSensorSpec()
        semantic_sensor_spec.uuid = "semantic_sensor"
        semantic_sensor_spec.sensor_type = habitat_sim.SensorType.SEMANTIC
        semantic_sensor_spec.resolution = [settings["height"], settings["width"]]
        semantic_sensor_spec.position = [0.0, settings["sensor_height"], 0.0]
        semantic_sensor_spec.sensor_subtype = habitat_sim.SensorSubType.PINHOLE
        sensor_specs.append(semantic_sensor_spec)

    agent_cfg = habitat_sim.agent.AgentConfiguration()
    agent_cfg.sensor_specifications = sensor_specs
    agent_cfg.action_space = {
        "move_forward": habitat_sim.agent.ActionSpec(
            "move_forward", habitat_sim.agent.ActuationSpec(amount=0.25)
        ),
        "turn_left": habitat_sim.agent.ActionSpec(
            "turn_left", habitat_sim.agent.ActuationSpec(amount=30.0)
        ),
        "turn_right": habitat_sim.agent.ActionSpec(
            "turn_right", habitat_sim.agent.ActuationSpec(amount=30.0)
        ),
    }

    return habitat_sim.Configuration(sim_cfg, [agent_cfg])

cfg = make_cfg(sim_settings)

aclegg3 commented 3 months ago

Hey @21yrm,

If you are attempting to unproject a pinhole camera (i.e., to acquire the 3D ray which corresponds to a 2D point on the viewport) you can use this function: https://aihabitat.org/docs/habitat-sim/habitat_sim.gfx.Camera.html#unproject

You can also trace that function through the source code to see how the projection is implemented.

There is also a function to do the opposite (manual 3D -> 2D projection).
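
To make the pinhole relation from the question concrete, here is a minimal sketch in plain Python. It is not the implementation behind `Camera.unproject`, only the textbook pinhole model under stated assumptions: a horizontal field of view of 90 degrees (the posted config does not set `hfov`, and 90 is, as far as I know, the default), square pixels, a principal point at the image center, and a camera frame with +X right, +Y up and -Z forward. The helper names `intrinsics_from_hfov` and `pixel_to_ray` are hypothetical.

```python
import math


def intrinsics_from_hfov(width, height, hfov_deg=90.0):
    """Return (fx, fy, cx, cy) in pixels for an ideal pinhole camera."""
    # From 1/f = (2/W) * tan(hfov/2)  =>  f = W / (2 * tan(hfov/2)).
    fx = width / (2.0 * math.tan(math.radians(hfov_deg) / 2.0))
    fy = fx              # assumption: square pixels
    cx = width / 2.0     # assumption: symmetric frustum, so the principal
    cy = height / 2.0    # point sits at the image center in pixel units
    return fx, fy, cx, cy


def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Unit ray through pixel (u, v), camera frame: +X right, +Y up, -Z forward."""
    x = (u - cx) / fx
    y = -(v - cy) / fy   # image rows grow downward, camera +Y points up
    z = -1.0             # assumption: the camera looks down -Z
    norm = math.sqrt(x * x + y * y + z * z)
    return x / norm, y / norm, z / norm


fx, fy, cx, cy = intrinsics_from_hfov(1920, 1080)
print(fx, fy, cx, cy)                           # fx and fy come out at about 960
print(pixel_to_ray(960, 540, fx, fy, cx, cy))   # ray along the optical axis
```

Under these assumptions the numbers agree with the matrix proposed in Q4. A principal point of 0 only appears when working in OpenGL normalized device coordinates, where the image center is the origin; in pixel coordinates it maps to (width/2, height/2).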

Eku127 commented 1 month ago

Based on the manual 3D -> 2D projection mentioned by @aclegg3, the intrinsic matrix can be computed from the render camera's projection matrix and viewport.

import numpy as np

def get_camera_intrinsics(sim, sensor_name):
    # Get render camera (accessed through private simulator attributes)
    render_camera = sim._sensors[sensor_name]._sensor_object.render_camera

    # Get projection matrix
    projection_matrix = render_camera.projection_matrix

    # Get resolution
    viewport_size = render_camera.viewport

    # Intrinsic calculation
    fx = projection_matrix[0, 0] * viewport_size[0] / 2.0
    fy = projection_matrix[1, 1] * viewport_size[1] / 2.0
    cx = (projection_matrix[2, 0] + 1.0) * viewport_size[0] / 2.0
    cy = (projection_matrix[2, 1] + 1.0) * viewport_size[1] / 2.0

    intrinsics = np.array([
        [fx, 0, cx],
        [0, fy, cy],
        [0,  0,  1]
    ])
    return intrinsics
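
Once an intrinsic matrix is available, the 3D XYZ coordinates asked about at the top of the thread can be recovered by back-projecting the depth image. The following is a rough sketch, not a habitat-sim API: `depth_to_points` is a hypothetical helper, the matrix `K` is simply the one proposed in Q4 (not verified simulator output), and the depth map is synthetic. It assumes the depth values are distances along the optical axis and that the camera frame has +X right, +Y up and -Z forward; both assumptions should be checked against your own setup.

```python
import numpy as np


def depth_to_points(depth, K):
    """Back-project an (H, W) depth map into (H, W, 3) camera-frame points."""
    h, w = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Pixel grid: u indexes columns, v indexes rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = -(v - cy) * depth / fy   # image rows grow downward, camera +Y points up
    z = -depth                   # assumption: the camera looks down -Z
    return np.stack([x, y, z], axis=-1)


# Matrix proposed in Q4, used here only as an illustrative input.
K = np.array([[960.0, 0.0, 960.0],
              [0.0, 960.0, 540.0],
              [0.0, 0.0, 1.0]])

# Synthetic depth: a flat wall 2 m in front of the camera.
depth = np.full((1080, 1920), 2.0, dtype=np.float32)

points = depth_to_points(depth, K)
print(points.shape)       # (1080, 1920, 3)
print(points[540, 960])   # point on the optical axis, 2 m ahead
```

The points are expressed in the camera frame; transforming them to world coordinates additionally requires the sensor pose, which is not covered here.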