Closed: leobxpan closed this issue 2 years ago
Hey @leobxpan,
Sounds like you may be able to do this with a simple script and no need for the ImageExtractor.
Once you've created a sensor and loaded a scene in Habitat, you can simply set the position/orientation of the agent or sensor SceneNode directly and then render observations for each pose in your pre-computed set.
For example, the "Advanced Topic: Motion Tracking Camera" snippet in the ECCV Advanced Features tutorial defines a sensor coincident with the agent's SceneNode and sets the state of the agent from a camera look_at function before rendering each frame. You could set the state from your pre-computed poses instead.
We also provide some quality-of-life utilities for converting various types of visual sensor observations into other formats, such as `observation_to_image`, which returns a Pillow `Image` object.
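For a concrete picture, here is a minimal sketch of that loop. It assumes an already-configured `sim` whose agent carries an RGB sensor with the uuid `color_sensor`, plus a list of poses already expressed in Habitat's coordinate frame; the function name, sensor uuid, and pose format are placeholders, not anything the tutorial defines:

```python
import magnum as mn
from habitat_sim.utils import viz_utils

# Sketch only: `sim` is an already-configured habitat_sim.Simulator and `poses`
# is a list of (mn.Matrix3x3 rotation, mn.Vector3 translation) pairs that are
# already expressed in Habitat's coordinate frame.
def render_precomputed_poses(sim, poses, sensor_uuid="color_sensor"):
    images = []
    agent_node = sim.get_agent(0).scene_node
    for rotation, translation in poses:
        # place the agent's scene node (and any coincident sensors) at the pose
        agent_node.transformation = mn.Matrix4.from_(rotation, translation)
        obs = sim.get_sensor_observations()
        # convert the raw RGBA array into a Pillow Image
        images.append(
            viz_utils.observation_to_image(obs[sensor_uuid], observation_type="color")
        )
    return images
```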
Thank you @aclegg3 for your reply!
I have one more question: does Habitat use the same coordinate system as the original datasets? I'm currently using Matterport3D, and it seems that setting the agent to my pre-computed locations/orientations does not return the expected observations.
Specifically, I wrote the following very simple script to get observations by letting the agent follow a pre-computed path. Here `trans` and `root_ori` are pre-computed translations and rotations in the original mp3d coordinate frame (which I have verified) that I want the camera to be placed at. I also noticed in this issue that it seems some rotation was applied to Matterport, which is where the additional `quat_habitat_to_mp3d` comes from. However, the observations I get still seem completely off. Is there anything I'm missing here? Thank you in advance.
```python
quat_habitat_to_mp3d = quat_from_two_vectors(geo.GRAVITY, np.array([0, 0, -1]))
rot_mat_habitat_to_mp3d = quaternion.as_rotation_matrix(quat_habitat_to_mp3d)
rot_mat_mp3d_to_habitat = rot_mat_habitat_to_mp3d.T

camera_pos = trans
camera_rot = root_ori

camera_pos_habitat = camera_pos @ rot_mat_mp3d_to_habitat
camera_rot_habitat = camera_rot @ rot_mat_mp3d_to_habitat

sim.get_agent(0).scene_node.transformation = mn.Matrix4.from_(camera_rot_habitat, camera_pos_habitat)
observation = habitat_sim.utils.viz_utils.observation_to_image(
    sim.get_sensor_observations()['color_sensor_1st_person'], observation_type='color'
)
```
The Habitat coordinate system is Y-up, -Z-forward. As you said, you'll need to apply a corrective transform from your original coordinate space. For reference, here is the way Habitat applies this transform for mp3d by default (FYI, this is deprecated and will be replaced by config).
Also, it looks like you may want to apply the rotations in the opposite order. If camera_pos is in local space (mp3d), then the conversion to global space (mp3d to habitat) should be on the left:

```python
camera_pos_habitat = rot_mat_mp3d_to_habitat @ camera_pos
```
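A tiny, self-contained illustration of that ordering (the position value is arbitrary): with column vectors, local-to-global is `R @ v`, whereas `v @ R` is the same as `R.T @ v`, i.e. the inverse rotation:

```python
import numpy as np
import quaternion  # numpy-quaternion
from habitat_sim import geo
from habitat_sim.utils.common import quat_from_two_vectors

# mp3d is Z-up, Habitat is Y-up; this is the corrective rotation between them
quat_habitat_to_mp3d = quat_from_two_vectors(geo.GRAVITY, np.array([0, 0, -1]))
R_mp3d_to_habitat = quaternion.as_rotation_matrix(quat_habitat_to_mp3d).T

p_mp3d = np.array([1.0, 2.0, 3.0])       # an arbitrary mp3d-frame position
p_habitat = R_mp3d_to_habitat @ p_mp3d   # rotation on the left: local -> global
p_wrong = p_mp3d @ R_mp3d_to_habitat     # equals R.T @ p, i.e. the inverse rotation
assert np.allclose(p_wrong, R_mp3d_to_habitat.T @ p_mp3d)
```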
Thanks for your reply. Yes, that was a mistake. I corrected it, but the agent still seems to be placed in a different room (I got the observation below for house `zsNo4HB9uLZ`, while it's supposed to be in the bedroom). I've verified the pre-computed positions and orientations, though.
Also, I'm a bit confused about these two places in the Motion Tracking Camera part of the tutorial:
```python
# set the color sensor transform to be the agent transform
visual_sensor._spec.position = np.array([0, 0, 0])
visual_sensor._spec.orientation = np.array([0, 0, 0])
visual_sensor._sensor_object.set_transformation_from_spec()
```
In my understanding, this should be unnecessary if I'm directly setting the agent's location/orientation later, but it seems adding this piece of code makes a difference to the observations I get.
Another thing: why do we need this "boost" off the floor?

```python
# boost the agent off the floor
sim.get_agent(0).scene_node.translation += np.array([0, 1.5, 0])
```
Thank you again for your help.
> In my understanding, this should be unnecessary if I'm directly setting the agent's location/orientation later, but it seems adding this piece of code makes a difference to the observations I get.
>
> Another thing: why do we need this "boost" off the floor?
These are coupled. First, the sensor spec defines the relative offset in position and orientation of the camera sensor from the agent. By default, the agent body node is attached to the ground via the NavMesh and the sensor is offset from the agent by 1.5 meters in Y. If you only rotate the agent around Y, this is a fine representation. However, in this demo we want to rotate the agent around multiple axes, so I set up the camera to be coincident with the agent and then directly manipulate the agent's scene node. If you didn't then boost the agent off the floor, you'd get observations from ground height.
You could, alternatively, have directly controlled the camera sensor node, but then you'd need to do so for each sensor attached to the agent (RGB, depth, semantics, etc.), which would be more trouble than just setting up the spec to be coincident.
In your case, you have pre-computed positions and orientations for your camera, so I suggest you do something similar: set up the sensors to be coincident with the agent, then set the agent state directly to your pre-computed states.
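Roughly, that could look like the sketch below. It reuses your existing `camera_pos_habitat` / `camera_rot_habitat` arrays; I'm assuming the sensor handles are obtained via `sim._sensors[...]` as in the tutorial snippet, and the sensor uuids are placeholders for whatever your config defines:

```python
import numpy as np
import magnum as mn
import quaternion
from habitat_sim.utils.common import quat_to_magnum

# make every visual sensor coincident with the agent's scene node
# (uuids below are placeholders; sim._sensors is assumed per the tutorial)
for uuid in ["color_sensor_1st_person", "depth_sensor_1st_person"]:
    sensor = sim._sensors[uuid]
    sensor._spec.position = np.array([0.0, 0.0, 0.0])
    sensor._spec.orientation = np.array([0.0, 0.0, 0.0])
    sensor._sensor_object.set_transformation_from_spec()

# now setting the agent's scene node places every sensor at the pre-computed
# pose (already converted into Habitat's frame)
agent_node = sim.get_agent(0).scene_node
agent_node.translation = mn.Vector3(*camera_pos_habitat)
agent_node.rotation = quat_to_magnum(quaternion.from_rotation_matrix(camera_rot_habitat))
```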
> Thanks for your reply. Yes, that was a mistake. I corrected it, but the agent still seems to be placed in a different room (I got the observation below for house `zsNo4HB9uLZ`, while it's supposed to be in the bedroom).
This looks like a bedroom to me. Should it be another bedroom, or maybe it's just the sensor offset that is the problem now?
I see. That makes a lot of sense. Right now I'm getting an observation like this.
Yes, the annotated bedroom in this house is a different room. As shown below, the bottom room is the actual bedroom, but the current observation shows the agent in the room above it. From the observation, it seems both the translation and the orientation are off (the orientation should be roughly -Z but is currently +Z).
I'm attaching the current copy of my code below. Do you see anything else that's missing or incorrect? Thanks a lot.
```python
visual_sensor._spec.position = np.array([0, 0, 0])
visual_sensor._spec.orientation = np.array([0, 0, 0])
visual_sensor._sensor_object.set_transformation_from_spec()

quat_habitat_to_mp3d = quat_from_two_vectors(geo.GRAVITY, np.array([0, 0, -1]))
rot_mat_habitat_to_mp3d = quaternion.as_rotation_matrix(quat_habitat_to_mp3d)
rot_mat_mp3d_to_habitat = rot_mat_habitat_to_mp3d.T

camera_pos = trans
camera_rot = root_ori

camera_pos_habitat = rot_mat_mp3d_to_habitat @ camera_pos
camera_rot_habitat = rot_mat_mp3d_to_habitat @ camera_rot

sim.get_agent(0).scene_node.translation = camera_pos_habitat
sim.get_agent(0).scene_node.rotation = mn.Quaternion.from_matrix(camera_rot_habitat)
observation = habitat_sim.utils.viz_utils.observation_to_image(
    sim.get_sensor_observations()['color_sensor_1st_person'], observation_type='color'
)
```
After fixing a bug in my data loading code, the translation seems correct, but the rotation seems to be off by an extra 90 degrees: the camera points towards the floor instead of forward in the image below.
Since this involves a new problem I'm opening a new issue on this (https://github.com/facebookresearch/habitat-sim/issues/1620).
Habitat-Sim version: v0.2.1
Hi there,
My use case is the following: given an environment and a set of pre-computed 3D locations and orientations, extract image observations by placing the camera at those 3D locations and orientations. I was looking at the `ImageExtractor` class, and by customizing a `PoseExtractor` I can extract observations from specified 2D locations and orientations. Is there a way to do this with 3D locations, though? Or is `ImageExtractor` not the best place to look? Thank you in advance.