facebookresearch / habitat-sim

A flexible, high-performance 3D simulator for Embodied AI research.
https://aihabitat.org/
MIT License
2.49k stars · 407 forks

How to generate ground truth color and depth images from Replica dataset for 3D reconstruction? #1732

Closed sanbuddhacharyas closed 2 years ago

sanbuddhacharyas commented 2 years ago

I followed the tutorial at https://github.com/facebookresearch/habitat-sim/blob/main/examples/tutorials/colabs/ECCV_2020_Navigation.ipynb and used observations = sim.step("turn_left") to capture the depth and color images of the 3D scene. However, the output images appear deformed, as if there were radial and tangential distortion, and when I reconstruct a 3D model with Open3D the result is very bad. How can I get ground-truth color and depth images so that I can reconstruct a 3D model from the captured data?

agent and simulator configuration

simulator backend

sim_cfg = habitat_sim.SimulatorConfiguration()
sim_cfg.scene_id = settings["scene"]

# RGB sensor attached to the agent
rgb_sensor_spec = habitat_sim.CameraSensorSpec()
rgb_sensor_spec.uuid = "color_sensor"
rgb_sensor_spec.sensor_type = habitat_sim.SensorType.COLOR
rgb_sensor_spec.resolution = [settings["height"], settings["width"]]
rgb_sensor_spec.position = [0.0, settings["sensor_height"], 0.0]
rgb_sensor_spec.sensor_subtype = habitat_sim.SensorSubType.PINHOLE

# Depth sensor attached to the agent
depth_sensor_spec = habitat_sim.CameraSensorSpec()
depth_sensor_spec.uuid = "depth_sensor"
depth_sensor_spec.sensor_type = habitat_sim.SensorType.DEPTH
depth_sensor_spec.resolution = [settings["height"], settings["width"]]
depth_sensor_spec.position = [0.0, settings["sensor_height"], 0.0]
depth_sensor_spec.sensor_subtype = habitat_sim.SensorSubType.PINHOLE

# Agent
agent_cfg = habitat_sim.agent.AgentConfiguration()
agent_cfg.sensor_specifications = [rgb_sensor_spec, depth_sensor_spec]
agent_cfg.action_space = {
    "move_forward": habitat_sim.agent.ActionSpec("move_forward", habitat_sim.agent.ActuationSpec(amount=0.1)),
    "turn_left": habitat_sim.agent.ActionSpec("turn_left", habitat_sim.agent.ActuationSpec(amount=1.0)),
    "turn_right": habitat_sim.agent.ActionSpec("turn_right", habitat_sim.agent.ActuationSpec(amount=1.0)),
}

# Build the simulator from the configurations above
cfg = habitat_sim.Configuration(sim_cfg, [agent_cfg])
sim = habitat_sim.Simulator(cfg)
erikwijmans commented 2 years ago

Can you give an example of the radial and tangential distortion you are seeing? The depth camera and rgb camera in habitat-sim are idealized pinhole cameras. We don't have anything implemented that could add such distortion so that shouldn't even be possible.

My guess is that there's a difference between the camera intrinsics/extrinsics between habitat-sim and open3d that is causing this.

sanbuddhacharyas commented 2 years ago

@erikwijmans When I only translate the agent, everything is fine and the 3D reconstruction is perfect, but when I rotate the agent the environment appears elongated, so the pieces of my 3D reconstruction overlap each other.

https://user-images.githubusercontent.com/33871656/165308271-b639cb74-881d-4ea0-9667-a6b712d0af2f.mp4 https://user-images.githubusercontent.com/33871656/165306780-2869572e-422e-4529-a6a8-27dbb3355236.mp4

You can see in the video that at the beginning I only used translation and everything was fine, but when I rotate the camera everything overlaps and the 3D reconstruction doesn't work well. When I translate the camera in the (X, Y, Z) directions there is no problem.

erikwijmans commented 2 years ago

That's fine. That isn't radial or tangential distortion (that you'd see even with the flat camera). That's just how things look under a perspective projection/pinhole camera when you tilt it. Open3D should be able to handle that completely fine as you get the same effect with a real camera.

Are you giving the rgb-d pairs and camera parameters to Open3D or are you doing the un-projection from camera screen coordinates yourself?
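For reference, the un-projection Erik mentions can be sketched in a few lines of NumPy. This is a minimal illustration, not habitat-sim's own API; it assumes the depth image holds metric z-depth (distance along the optical axis) and that K is a 3x3 pixel-space intrinsic matrix:

```python
import numpy as np

def depth_to_points(depth, K):
    """Un-project a depth image into camera-space 3D points.

    Assumes `depth` holds metric z-depth (distance along the optical
    axis, not ray length) and `K` is a 3x3 pixel-space intrinsic matrix.
    """
    h, w = depth.shape
    # Pixel coordinate grids: u varies along columns, v along rows
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.stack([x, y, depth], axis=-1)
```

Alternatively, handing the RGB-D pairs together with the intrinsics directly to Open3D avoids re-deriving this by hand, which is what Erik is asking about.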

sanbuddhacharyas commented 2 years ago

> That's fine. That isn't radial or tangential distortion (that you'd see even with the flat camera). That's just how things look under a perspective projection/pinhole camera when you tilt it. Open3D should be able to handle that completely fine as you get the same effect with a real camera.

> Are you giving the rgb-d pairs and camera parameters to Open3D or are you doing the un-projection from camera screen coordinates yourself?

@erikwijmans I think the problem is due to the intrinsic parameters. How can I get the intrinsic parameters of the color and depth camera sensors? Are these the intrinsics used by the sensors in habitat-sim?

K = np.array([[1 / np.tan(hfov / 2.), 0., 0., 0.],
              [0., 1 / np.tan(hfov / 2.), 0., 0.],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.]])

If width = 640 and height = 480, then focal length = 1 / np.tan(90 / 2) = 0.6173696237835551, fx = 395.11655922147526, fy = 296.33741941610646. What about the camera center? Is it cx = 640 / 2 = 320 and cy = 480 / 2 = 240?

But the relationship between focal length and field of view is: [image: formula relating focal length and field of view]

Or how can we set the intrinsic parameters for the pinhole camera?

erikwijmans commented 2 years ago

np.tan takes radians, not degrees. So it should be 1/np.tan(pi/4).
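Applying that correction, here is a minimal sketch of building pixel-space intrinsics from habitat-sim-style settings. The square-pixel assumption (fy = fx) and the 90-degree hfov are assumptions on my part; the normalized K quoted above maps to pixel units via fx = (width / 2) * (1 / tan(hfov / 2)), which is the standard pinhole relation:

```python
import numpy as np

def pinhole_intrinsics(width, height, hfov_deg):
    """Build a 3x3 pixel-space intrinsic matrix for an ideal pinhole camera.

    Assumes square pixels (fy == fx) and a principal point at the image
    centre, consistent with an idealized pinhole model.
    """
    hfov = np.deg2rad(hfov_deg)  # np.tan expects radians, not degrees
    fx = width / (2.0 * np.tan(hfov / 2.0))
    fy = fx  # square-pixel assumption
    cx, cy = width / 2.0, height / 2.0
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

K = pinhole_intrinsics(640, 480, 90.0)  # fx = fy ≈ 320 for this case
```

A matrix in this form is what Open3D-style pinhole camera intrinsics expect, rather than the normalized [-1, 1] projection matrix quoted from the tutorial.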

sanbuddhacharyas commented 2 years ago

> np.tan takes radians, not degrees. So it should be 1/np.tan(pi/4).

Thank you so much for your answer. I will try with the correct intrinsic parameters.

oskar0812 commented 1 year ago

> np.tan takes radians, not degrees. So it should be 1/np.tan(pi/4).

> Thank you so much for your answer. I will try with the correct intrinsic parameters.

Have you resolved this? I still have the problem that the camera intrinsic and extrinsic parameters output by habitat-sim don't match the rendered picture, and I am also using the Replica dataset. I want to use this data for novel-view-synthesis tasks, so could you explain how you render the Replica images while outputting the camera intrinsics and extrinsics? I would be grateful for any help.

ariannaliu commented 1 year ago

Hi, have you solved this? I met the same problem.

oskar0812 commented 1 year ago

> Hi, have you solved this? I met the same problem.

Not yet. Can you tell me what kind of problem you encountered?

zhuisa commented 1 year ago

@oskar0812 Hi, have you solved this? I met the same problem.

oskar0812 commented 1 year ago

> @oskar0812 Hi, have you solved this? I met the same problem.

When you use sim.get_agent to get the agent pose, you can try replacing the qy in the quaternion with (-1)qy, and the y in the translation with (-1)y. These are what ariannaliu told me.
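For anyone landing here, the sign flips above can be sketched as a pose conversion. This is a hypothetical helper (habitat_pose_to_extrinsic is not a habitat-sim function): it takes a position and a (w, x, y, z) quaternion, applies the qy and y negations suggested in this thread, and builds a 4x4 pose matrix; whether the result is camera-to-world or world-to-camera in your pipeline still needs checking against your conventions:

```python
import numpy as np

def habitat_pose_to_extrinsic(position, rotation_wxyz):
    """Build a 4x4 pose matrix from a habitat-style agent pose.

    Applies the convention fix suggested in this thread: negate qy in
    the quaternion and y in the translation. Hypothetical helper, not
    part of habitat-sim's API.
    """
    w, x, y, z = rotation_wxyz
    y = -y                      # flip qy, per the suggestion above
    px, py, pz = position
    py = -py                    # flip the y translation
    # Standard unit-quaternion (w, x, y, z) -> rotation matrix
    R = np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = [px, py, pz]
    return T
```

The quaternion itself would come from something like sim.get_agent(0).get_state(), whose rotation and position fields hold the agent pose.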