allenai / allenact

An open source framework for research in Embodied-AI from AI2.
https://www.allenact.org

Some depth values are not within the 0-5 range for RoboTHOR's DepthSensorThor #349

Status: Closed (SamsonYuBaiJian closed this issue 2 years ago)

SamsonYuBaiJian commented 2 years ago

Problem

I have been trying to get a checkpoint working in a real-world setup. To do so, I want to make sure I follow the processing/normalization steps in VisionSensor in 'allenact/embodiedai/sensors/vision_sensors.py' for RoboTHOR's DepthSensorThor.

I think the value range of the input depth observation passed from AI2-THOR into DepthSensorThor is supposed to be [0, 5], and the range of DepthSensorThor's output should be [-2, 18] due to normalisation with mean 0.5 and std 0.25, since (0 - 0.5) / 0.25 = -2 and (5 - 0.5) / 0.25 = 18. Printing SensorSuite(sensors).observation_spaces supports this, as it gives: Dict(rgb_lowres:Box(-2.1179039478302, 2.640000104904175, (224, 224, 3), float32), depth_lowres:Box(-2.0, 18.0, (224, 224, 1), float32), goal_object_type_ind:Discrete(7), last_action:Discrete(7))
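The [-2, 18] bounds follow from applying (x - mean) / std to the endpoints of [0, 5]. A minimal check, assuming the mean 0.5 and std 0.25 quoted above; `normalize_depth` is an illustrative name, not AllenAct's API:

```python
import numpy as np

def normalize_depth(depth, mean=0.5, std=0.25):
    """Apply (depth - mean) / std elementwise (mean/std as quoted in this issue)."""
    return (np.asarray(depth, dtype=np.float32) - mean) / std

# Endpoints of the assumed raw depth range [0, 5].
bounds = normalize_depth([0.0, 5.0])
print(bounds)  # [-2. 18.]
```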

I printed the max value of the input depth observation for AllenAct's ObjectNav and found that it can exceed 5 (e.g. 6.59 and 11.47).

From AI2-THOR v2.1.0's documentation (https://allenai.github.io/ai2thor-v2.1.0-documentation/event-metadata), it appears that the max depth is 5 m but the values are given in mm, so they can be up to 5000. Is this different from AllenAct's setup?

Steps to reproduce

Steps to reproduce the behavior:

  1. Go to 'allenact/embodiedai/sensors/vision_sensors.py'.
  2. Print the max value of im in get_observation in VisionSensor before any processing.
  3. Run a training or test experiment using the depth sensor (e.g. objectnav_ithor_rgbd_resnet18gru_ddppo).
  4. Make sure the value printed is for the depth sensor instead of the RGB sensor.
  5. Observe that the printed max value can exceed 5.
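One way to do the printing in step 2 is a small summary helper. This is only a sketch: `depth_stats` is a hypothetical function, and the frame below is synthetic data standing in for the raw `im` seen inside `get_observation`.

```python
import numpy as np

def depth_stats(im, max_depth=5.0):
    """Return min, max and the fraction of pixels above max_depth."""
    im = np.asarray(im, dtype=np.float32)
    return {
        "min": float(im.min()),
        "max": float(im.max()),
        "frac_above_max": float((im > max_depth).mean()),
    }

# Synthetic stand-in for a raw (224, 224, 1) depth frame, with one
# pixel set to a value reported in this issue (6.59).
frame = np.full((224, 224, 1), 2.0, dtype=np.float32)
frame[0, 0, 0] = 6.59
stats = depth_stats(frame)
print(stats)
```

Checking the array shape (a trailing dimension of 1 rather than 3) is a simple way to confirm the values come from the depth sensor and not the RGB sensor.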

Expected behavior

The max value for im in VisionSensor in 'allenact/embodiedai/sensors/vision_sensors.py' before any processing for the depth sensor is within the range [0, 5].

Screenshots

The third value in the third line in the blue boxes is the max value of the depth observation before any processing.

[Screenshot: photo_2022-06-23_22-15-38]

[Screenshot: photo_2022-06-23_22-23-49]

Additional context

May I know what are the units for the [0, 5] observations for the depth sensor, say meters, since I would like to test a checkpoint in the real world with a D435i camera?

Lastly, how are values beyond the max depth dealt with?

Thank you.

jordis-ai2 commented 2 years ago

Hi @SamsonYuBaiJian,

Thanks for the very detailed description of the issue!

My understanding of the observation space ranges is that they only provide a coarse idea of the expected output ranges (it's common to set the upper bound to +np.inf and the lower bound to -np.inf for real-valued outputs), so observations are not guaranteed to stay within them.

> May I know what are the units for the [0, 5] observations for the depth sensor, say meters, since I would like to test a checkpoint in the real world with a D435i camera?

The units are meters, as in https://ai2thor.allenai.org/ithor/documentation/environment-state/#event-depth_frame. If the far clipping plane of the renderer is beyond 5 meters, the raw depth values can also be beyond 5.
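For a real camera that reports depth as integer units, the frame would need converting to meters before it is fed to the sensor pipeline. A minimal sketch, assuming a scale of 0.001 m per unit (millimeters); the actual scale should be read from the device, and `raw_depth_to_meters` is a hypothetical helper:

```python
import numpy as np

def raw_depth_to_meters(raw, depth_scale=0.001):
    """Convert integer depth units to float32 meters.

    depth_scale is meters per unit; 0.001 (millimeters) is an assumption here.
    """
    return np.asarray(raw).astype(np.float32) * depth_scale

# Synthetic uint16 frame: 0 mm, 1500 mm, 5000 mm and 6590 mm.
raw = np.array([[0, 1500], [5000, 6590]], dtype=np.uint16)
meters = raw_depth_to_meters(raw)
print(meters)
```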

> Lastly, how are values beyond the max depth dealt with?

The current sensor just applies the normalization operation: subtract the mean, divide by the std: https://github.com/allenai/allenact/blob/a709009b6c7b9d91800dda0c57cd7b4874a92579/allenact/embodiedai/sensors/vision_sensors.py#L186
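As an illustration of what this means for values beyond 5, the sketch below normalizes the depths reported in this issue with the quoted mean 0.5 and std 0.25. The optional `clip_max` argument is a hypothetical addition for comparison only; per the description above, the sensor itself does not clip.

```python
import numpy as np

MEAN, STD = 0.5, 0.25  # values quoted in this thread

def normalize(depth_m, clip_max=None):
    """Subtract the mean and divide by the std.

    clip_max is a hypothetical option, not part of the sensor described above.
    """
    depth_m = np.asarray(depth_m, dtype=np.float32)
    if clip_max is not None:
        depth_m = np.clip(depth_m, 0.0, clip_max)
    return (depth_m - MEAN) / STD

raw = [5.0, 6.59, 11.47]                # depths in meters
unclipped = normalize(raw)              # approx. [18.0, 24.36, 43.88]
clipped = normalize(raw, clip_max=5.0)  # [18.0, 18.0, 18.0]
print(unclipped, clipped)
```

Without clipping, raw depths above 5 simply map to normalized values above 18. Whether to clip in a real-world deployment depends on what the checkpoint saw during training.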

Please let me know if this answers your question.

SamsonYuBaiJian commented 2 years ago

Thanks @jordis-ai2 for the very fast response, you have answered my question.