facebookresearch / habitat-sim

A flexible, high-performance 3D simulator for Embodied AI research.
https://aihabitat.org/
MIT License

Get image observation from specified 3D camera locations and orientations #1608

Closed · leobxpan closed this issue 2 years ago

leobxpan commented 2 years ago

Habitat-Sim version

v0.2.1

Hi there,

My use case: given an environment and a set of pre-computed 3D locations and orientations, I want to extract image observations by placing the camera at those 3D locations and orientations. I was looking at the ImageExtractor class, and by customizing a PoseExtractor I can extract observations from specified 2D locations and orientations. Is there a way to do this with 3D locations, though? Or is ImageExtractor not the best place to look?

Thank you in advance.

aclegg3 commented 2 years ago

Hey @leobxpan,

Sounds like you may be able to do this with a simple script and no need for the ImageExtractor.

Once you've created a sensor and loaded a scene in Habitat, you can simply set the position/orientation of the agent or sensor SceneNode directly and then render observations for each pose in your pre-computed set.

For example, the Advanced Topic : Motion Tracking Camera snippet in the ECCV Advanced Features tutorial defines a sensor coincident with the agent's SceneNode and sets the state of the agent from a camera look_at function before rendering each frame. You could instead set the state from your pre-computed poses.

We also provide some quality-of-life utilities for converting various types of visual sensor observations into other formats, such as observation_to_image, which returns a Pillow Image object.
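For instance, here's a minimal sketch of that loop (assuming sim is an already-configured Simulator with an RGB sensor named "color_sensor" set up coincident with the agent, and poses is your list of habitat-frame rotation/translation pairs; adjust the names to your config):

import magnum as mn
from habitat_sim.utils import viz_utils

agent_node = sim.get_agent(0).scene_node
images = []
for rotation, translation in poses:  # (3x3 rotation matrix, 3-vector) pairs
    # Place the agent (and its coincident sensor) at the pre-computed pose.
    agent_node.transformation = mn.Matrix4.from_(rotation, translation)
    obs = sim.get_sensor_observations()["color_sensor"]
    images.append(viz_utils.observation_to_image(obs, observation_type="color"))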

leobxpan commented 2 years ago

Thank you @aclegg3 for your reply!

I have one more question: does Habitat use the same coordinate system as the original datasets? I'm currently using Matterport3D, and setting the agent to my pre-computed locations/orientations does not return the expected observations.

leobxpan commented 2 years ago

Specifically, I wrote the following very simple script to get observations by having the agent follow a pre-computed path. Here trans and root_ori are pre-computed translations and rotations in the original mp3d coordinate frame (and have been verified) at which I want the camera to be placed. I also noticed in this issue that some rotation seems to have been applied to the Matterport data, which is where the additional quat_habitat_to_mp3d comes from. However, the observations I get still seem completely off. Is there anything I'm missing here? Thank you in advance.

import magnum as mn
import numpy as np
import quaternion  # numpy-quaternion
import habitat_sim.utils.viz_utils
from habitat_sim import geo
from habitat_sim.utils.common import quat_from_two_vectors

# Corrective rotation between the Habitat frame and the mp3d frame.
quat_habitat_to_mp3d = quat_from_two_vectors(geo.GRAVITY, np.array([0, 0, -1]))
rot_mat_habitat_to_mp3d = quaternion.as_rotation_matrix(quat_habitat_to_mp3d)
rot_mat_mp3d_to_habitat = rot_mat_habitat_to_mp3d.T

# Pre-computed pose in the original mp3d frame.
camera_pos = trans
camera_rot = root_ori
camera_pos_habitat = camera_pos @ rot_mat_mp3d_to_habitat
camera_rot_habitat = camera_rot @ rot_mat_mp3d_to_habitat

sim.get_agent(0).scene_node.transformation = mn.Matrix4.from_(camera_rot_habitat, camera_pos_habitat)
observation = habitat_sim.utils.viz_utils.observation_to_image(sim.get_sensor_observations()['color_sensor_1st_person'], observation_type='color')

aclegg3 commented 2 years ago

Habitat's coordinate system is +Y up, -Z forward. As you said, you'll need to apply a corrective transform from your original coordinate space. For reference, here is the way Habitat applies this transform for mp3d by default (FYI, this is deprecated and will be replaced by config).

Also, it looks like you may want to apply the rotations in the opposite order. If camera_pos is in local space (mp3d), then the conversion to global space (mp3d to habitat) should be on the left: camera_pos_habitat = rot_mat_mp3d_to_habitat @ camera_pos.
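As a quick sanity check (a sketch, not verbatim from anywhere): the corrective rotation should map mp3d's +Z up axis onto Habitat's +Y up axis.

import numpy as np
import quaternion  # numpy-quaternion
from habitat_sim import geo
from habitat_sim.utils.common import quat_from_two_vectors

# geo.GRAVITY is -Y in Habitat; mp3d's down axis is -Z.
quat_habitat_to_mp3d = quat_from_two_vectors(geo.GRAVITY, np.array([0, 0, -1]))
rot_mat_mp3d_to_habitat = quaternion.as_rotation_matrix(quat_habitat_to_mp3d).T

# mp3d's up axis (+Z) should land on Habitat's up axis (+Y).
print(rot_mat_mp3d_to_habitat @ np.array([0, 0, 1]))  # expect ~[0, 1, 0]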

leobxpan commented 2 years ago

Thanks for your reply. Yes, that was a mistake. I corrected it, but the agent still seems to be placed in a different room (I got the observation below for house zsNo4HB9uLZ, while it's supposed to be in the bedroom). I've verified the pre-computed positions and orientations, though.

[image: rendered observation]

leobxpan commented 2 years ago

Also, I'm a bit confused about these two places in the Motion Tracking Camera part of the tutorial:

# set the color sensor transform to be the agent transform
visual_sensor._spec.position = np.array([0, 0, 0])
visual_sensor._spec.orientation = np.array([0, 0, 0])
visual_sensor._sensor_object.set_transformation_from_spec()

In my understanding, this should be unnecessary if I'm directly setting the agent's location/orientation later, but it seems adding this piece of code makes a difference to the observations I get.

Another thing: why do we need this "boost" off the floor?

# boost the agent off the floor
sim.get_agent(0).scene_node.translation += np.array([0, 1.5, 0])

Thank you again for your help.

aclegg3 commented 2 years ago

In my understanding, this should be unnecessary if I'm directly setting the agent's location/orientation later, but it seems adding this piece of code makes a difference to the observations I get.

Another thing: why do we need this "boost" off the floor?

These are coupled. First, the sensor spec defines the relative offset in orientation and position of the camera sensor from the agent. By default, the agent body node is attached to the ground via the NavMesh and the sensor is offset from the agent by 1.5 meters in Y. If you only rotate the agent around Y, this is a fine representation. However, in this demo we want to rotate the agent around multiple axes, so I set up the camera to be coincident with the agent and then directly manipulate the agent's scene node. If you then didn't boost the agent off the floor, you'd get observations from ground height.

You could, alternatively, have directly controlled the camera sensor node, but then you'd need to do so for each sensor attached to the agent (RGB, depth, semantics, etc.), which would be more trouble than just setting up the spec to be coincident.

In your case, you have pre-computed positions and orientations for your camera, so I suggest you do something similar: set up the sensors to be coincident with the agent, then set the agent state directly from your pre-computed states.
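A minimal sketch of that suggestion (the sensor name and the poses list here are placeholders; adjust to your config):

import magnum as mn
import numpy as np

agent = sim.get_agent(0)

# Zero out the sensor's default offset (+1.5 m in Y) so the agent's scene
# node pose is exactly the camera pose (repeat for depth/semantic sensors).
visual_sensor = agent._sensors["color_sensor_1st_person"]
visual_sensor._spec.position = np.array([0, 0, 0])
visual_sensor._spec.orientation = np.array([0, 0, 0])
visual_sensor._sensor_object.set_transformation_from_spec()

# Then drive the agent's scene node directly from each pre-computed
# habitat-frame pose and render.
for rot_mat, position in poses:  # (3x3 rotation, 3-vector) pairs
    agent.scene_node.rotation = mn.Quaternion.from_matrix(rot_mat)
    agent.scene_node.translation = position
    obs = sim.get_sensor_observations()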

aclegg3 commented 2 years ago

Thanks for your reply. Yes, that was a mistake. I corrected it, but the agent still seems to be placed in a different room (I got the observation below for house zsNo4HB9uLZ, while it's supposed to be in the bedroom).

This looks like a bedroom to me. Should it be a different bedroom, or maybe the sensor offset is the problem now?

leobxpan commented 2 years ago

I see. That makes a lot of sense. Right now I'm getting an observation like this:

[image: rendered observation]

Yes, the annotated bedroom in this house is a different room. As shown below, the bottom room is the actual bedroom, but the current observation shows the agent is in the room above. From the observation, it seems both the translation and the orientation are off (the orientation should be roughly -Z, but it's currently +Z).

[screenshot: floor plan with the actual bedroom at the bottom]

I'm attaching the current copy of my code below. Do you see anything else missing or incorrect? Thanks a lot.

# Make the visual sensor coincident with the agent's scene node.
visual_sensor._spec.position = np.array([0, 0, 0])
visual_sensor._spec.orientation = np.array([0, 0, 0])
visual_sensor._sensor_object.set_transformation_from_spec()

# Corrective rotation between the Habitat frame and the mp3d frame.
quat_habitat_to_mp3d = quat_from_two_vectors(geo.GRAVITY, np.array([0, 0, -1]))
rot_mat_habitat_to_mp3d = quaternion.as_rotation_matrix(quat_habitat_to_mp3d)
rot_mat_mp3d_to_habitat = rot_mat_habitat_to_mp3d.T

# Pre-computed mp3d-frame pose, converted with the corrective rotation on the left.
camera_pos = trans
camera_rot = root_ori
camera_pos_habitat = rot_mat_mp3d_to_habitat @ camera_pos
camera_rot_habitat = rot_mat_mp3d_to_habitat @ camera_rot

sim.get_agent(0).scene_node.translation = camera_pos_habitat
sim.get_agent(0).scene_node.rotation = mn.Quaternion.from_matrix(camera_rot_habitat)

observation = habitat_sim.utils.viz_utils.observation_to_image(sim.get_sensor_observations()['color_sensor_1st_person'], observation_type='color')

leobxpan commented 2 years ago

After fixing a bug in my data-loading code, the translation now seems correct, but the rotation seems off by an extra 90 degrees: the camera points toward the floor instead of forward in the image below.

[image: observation with the camera pointing at the floor]

leobxpan commented 2 years ago

Since this involves a new problem, I'm opening a new issue on it (https://github.com/facebookresearch/habitat-sim/issues/1620).