facebookresearch / habitat-lab

A modular high-level library to train embodied AI agents across a variety of tasks and environments.
https://aihabitat.org/
MIT License

Error while running ObjectNav task #308

Closed shivanshpatel35 closed 4 years ago

shivanshpatel35 commented 4 years ago

❓ Help

I am trying to run ObjectNav using the script habitat_baselines/rl/ddppo/single_node.sh on a 2-GPU machine. I have changed the --exp-config flag to habitat_baselines/config/objectnav/ddppo_objectnav.yaml. I am getting the following error log:

CHECKPOINT_FOLDER: new_checkpoints
CHECKPOINT_INTERVAL: 50
CMD_TRAILING_OPTS: []
ENV_NAME: NavRLEnv
EVAL:
  SPLIT: val
  USE_CKPT_CONFIG: True
EVAL_CKPT_PATH_DIR: new_checkpoints
LOG_FILE: train.log
LOG_INTERVAL: 10
NUM_PROCESSES: 1
NUM_UPDATES: 10000
ORBSLAM2:
  ANGLE_TH: 0.2617993877991494
  BETA: 100
  CAMERA_HEIGHT: 1.25
  DEPTH_DENORM: 10.0
  DIST_REACHED_TH: 0.15
  DIST_TO_STOP: 0.05
  D_OBSTACLE_MAX: 4.0
  D_OBSTACLE_MIN: 0.1
  H_OBSTACLE_MAX: 1.25
  H_OBSTACLE_MIN: 0.375
  MAP_CELL_SIZE: 0.1
  MAP_SIZE: 40
  MIN_PTS_IN_OBSTACLE: 320.0
  NEXT_WAYPOINT_TH: 0.5
  NUM_ACTIONS: 3
  PLANNER_MAX_STEPS: 500
  PREPROCESS_MAP: True
  SLAM_SETTINGS_PATH: habitat_baselines/slambased/data/mp3d3_small1k.yaml
  SLAM_VOCAB_PATH: habitat_baselines/slambased/data/ORBvoc.txt
RL:
  DDPPO:
    backbone: resnet50
    distrib_backend: NCCL
    num_recurrent_layers: 2
    pretrained: False
    pretrained_encoder: False
    pretrained_weights: data/ddppo-models/gibson-2plus-resnet50.pth
    reset_critic: True
    rnn_type: LSTM
    sync_frac: 0.6
    train_encoder: True
  PPO:
    clip_param: 0.2
    entropy_coef: 0.01
    eps: 1e-05
    gamma: 0.99
    hidden_size: 512
    lr: 2.5e-06
    max_grad_norm: 0.2
    num_mini_batch: 2
    num_steps: 128
    ppo_epoch: 2
    reward_window_size: 50
    tau: 0.95
    use_gae: True
    use_linear_clip_decay: False
    use_linear_lr_decay: False
    use_normalized_advantage: False
    value_loss_coef: 0.5
  REWARD_MEASURE: distance_to_goal
  SLACK_REWARD: -0.01
  SUCCESS_MEASURE: spl
  SUCCESS_REWARD: 2.5
SENSORS: ['DEPTH_SENSOR', 'RGB_SENSOR']
SIMULATOR_GPU_ID: 0
TASK_CONFIG:
  DATASET:
    CONTENT_SCENES: []
    DATA_PATH: data/datasets/objectnav/mp3d/v0/{split}/{split}.json.gz
    SCENES_DIR: data/scene_datasets/
    SPLIT: val
    TYPE: ObjectNav-v1
  ENVIRONMENT:
    ITERATOR_OPTIONS:
      CYCLE: True
      GROUP_BY_SCENE: True
      MAX_SCENE_REPEAT_EPISODES: -1
      MAX_SCENE_REPEAT_STEPS: 10000
      NUM_EPISODE_SAMPLE: -1
      SHUFFLE: True
      STEP_REPETITION_RANGE: 0.2
    MAX_EPISODE_SECONDS: 10000000
    MAX_EPISODE_STEPS: 500
  PYROBOT:
    BASE_CONTROLLER: proportional
    BASE_PLANNER: none
    BUMP_SENSOR:
      TYPE: PyRobotBumpSensor
    DEPTH_SENSOR:
      CENTER_CROP: False
      HEIGHT: 480
      MAX_DEPTH: 5.0
      MIN_DEPTH: 0.0
      NORMALIZE_DEPTH: True
      TYPE: PyRobotDepthSensor
      WIDTH: 640
    LOCOBOT:
      ACTIONS: ['BASE_ACTIONS', 'CAMERA_ACTIONS']
      BASE_ACTIONS: ['go_to_relative', 'go_to_absolute']
      CAMERA_ACTIONS: ['set_pan', 'set_tilt', 'set_pan_tilt']
    RGB_SENSOR:
      CENTER_CROP: False
      HEIGHT: 480
      TYPE: PyRobotRGBSensor
      WIDTH: 640
    ROBOT: locobot
    ROBOTS: ['locobot']
    SENSORS: ['RGB_SENSOR', 'DEPTH_SENSOR', 'BUMP_SENSOR']
  SEED: 100
  SIMULATOR:
    ACTION_SPACE_CONFIG: v1
    AGENTS: ['AGENT_0']
    AGENT_0:
      ANGULAR_ACCELERATION: 12.56
      ANGULAR_FRICTION: 1.0
      COEFFICIENT_OF_RESTITUTION: 0.0
      HEIGHT: 0.88
      IS_SET_START_STATE: False
      LINEAR_ACCELERATION: 20.0
      LINEAR_FRICTION: 0.5
      MASS: 32.0
      RADIUS: 0.2
      SENSORS: ['RGB_SENSOR', 'DEPTH_SENSOR']
      START_POSITION: [0, 0, 0]
      START_ROTATION: [0, 0, 0, 1]
    DEFAULT_AGENT_ID: 0
    DEPTH_SENSOR:
      HEIGHT: 480
      HFOV: 79
      MAX_DEPTH: 5.0
      MIN_DEPTH: 0.5
      NORMALIZE_DEPTH: True
      POSITION: [0, 0.88, 0]
      TYPE: HabitatSimDepthSensor
      WIDTH: 640
    FORWARD_STEP_SIZE: 0.25
    HABITAT_SIM_V0:
      ALLOW_SLIDING: True
      ENABLE_PHYSICS: False
      GPU_DEVICE_ID: 0
      GPU_GPU: False
      PHYSICS_CONFIG_FILE: ./data/default.phys_scene_config.json
    RGB_SENSOR:
      HEIGHT: 480
      HFOV: 79
      POSITION: [0, 0.88, 0]
      TYPE: HabitatSimRGBSensor
      WIDTH: 640
    SCENE: data/scene_datasets/habitat-test-scenes/van-gogh-room.glb
    SEED: 100
    SEMANTIC_SENSOR:
      HEIGHT: 480
      HFOV: 79
      POSITION: [0, 0.88, 0]
      TYPE: HabitatSimSemanticSensor
      WIDTH: 640
    TILT_ANGLE: 30
    TURN_ANGLE: 30
    TYPE: Sim-v0
  TASK:
    ACTIONS:
      ANSWER:
        TYPE: AnswerAction
      LOOK_DOWN:
        TYPE: LookDownAction
      LOOK_UP:
        TYPE: LookUpAction
      MOVE_FORWARD:
        TYPE: MoveForwardAction
      STOP:
        TYPE: StopAction
      TELEPORT:
        TYPE: TeleportAction
      TURN_LEFT:
        TYPE: TurnLeftAction
      TURN_RIGHT:
        TYPE: TurnRightAction
    ANSWER_ACCURACY:
      TYPE: AnswerAccuracy
    COLLISIONS:
      TYPE: Collisions
    COMPASS_SENSOR:
      TYPE: CompassSensor
    CORRECT_ANSWER:
      TYPE: CorrectAnswer
    DISTANCE_TO_GOAL:
      DISTANCE_TO: VIEW_POINTS
      TYPE: DistanceToGoal
    EPISODE_INFO:
      TYPE: EpisodeInfo
    GOAL_SENSOR_UUID: objectgoal
    GPS_SENSOR:
      DIMENSIONALITY: 2
      TYPE: GPSSensor
    HEADING_SENSOR:
      TYPE: HeadingSensor
    INSTRUCTION_SENSOR:
      TYPE: InstructionSensor
    INSTRUCTION_SENSOR_UUID: instruction
    MEASUREMENTS: ['DISTANCE_TO_GOAL', 'SPL']
    OBJECTGOAL_SENSOR:
      GOAL_SPEC: TASK_CATEGORY_ID
      GOAL_SPEC_MAX_VAL: 50
      TYPE: ObjectGoalSensor
    POINTGOAL_SENSOR:
      DIMENSIONALITY: 2
      GOAL_FORMAT: POLAR
      TYPE: PointGoalSensor
    POINTGOAL_WITH_GPS_COMPASS_SENSOR:
      DIMENSIONALITY: 2
      GOAL_FORMAT: POLAR
      TYPE: PointGoalWithGPSCompassSensor
    POSSIBLE_ACTIONS: ['STOP', 'MOVE_FORWARD', 'TURN_LEFT', 'TURN_RIGHT', 'LOOK_UP', 'LOOK_DOWN']
    PROXIMITY_SENSOR:
      MAX_DETECTION_RADIUS: 2.0
      TYPE: ProximitySensor
    QUESTION_SENSOR:
      TYPE: QuestionSensor
    SENSORS: ['OBJECTGOAL_SENSOR', 'COMPASS_SENSOR', 'GPS_SENSOR']
    SPL:
      DISTANCE_TO: VIEW_POINTS
      SUCCESS_DISTANCE: 0.2
      TYPE: SPL
    SUCCESS_DISTANCE: 0.1
    TOP_DOWN_MAP:
      DRAW_BORDER: True
      DRAW_GOAL_AABBS: True
      DRAW_GOAL_POSITIONS: True
      DRAW_SHORTEST_PATH: True
      DRAW_SOURCE: True
      DRAW_VIEW_POINTS: True
      FOG_OF_WAR:
        DRAW: True
        FOV: 90
        VISIBILITY_DIST: 5.0
      MAP_PADDING: 3
      MAP_RESOLUTION: 1250
      MAX_EPISODE_STEPS: 1000
      NUM_TOPDOWN_MAP_SAMPLE_POINTS: 20000
      TYPE: TopDownMap
    TYPE: ObjectNav-v1
TENSORBOARD_DIR: tb1
TEST_EPISODE_COUNT: 2184
TORCH_GPU_ID: 1
TRAINER_NAME: ppo
VIDEO_DIR: video_dir
VIDEO_OPTION: ['disk', 'tensorboard']
2020-02-20 13:40:17,978 Initializing dataset ObjectNav-v1
2020-02-20 13:40:38,077 Initializing dataset ObjectNav-v1
2020-02-20 13:40:56,217 initializing sim Sim-v0
2020-02-20 13:41:09,272 Initializing task ObjectNav-v1
2020-02-20 13:41:11,333 agent number of parameters: 71371399
Traceback (most recent call last):
  File "habitat_baselines/run.py", line 68, in <module>
    main()
  File "habitat_baselines/run.py", line 38, in main
    run_exp(**vars(args))
  File "habitat_baselines/run.py", line 62, in run_exp
    trainer.train()
  File "/local-scratch/habitat-api1/habitat_baselines/rl/ppo/ppo_trainer.py", line 300, in train
    episode_counts,
  File "/local-scratch/habitat-api1/habitat_baselines/rl/ppo/ppo_trainer.py", line 146, in _collect_rollout_step
    outputs = self.envs.step([a[0].item() for a in actions])
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 339, in step
    return self.wait_step()
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 326, in wait_step
    observations.append(read_fn())
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
Exception ignored in: <bound method VectorEnv.__del__ of <habitat.core.vector_env.VectorEnv object at 0x7efc2f8b1668>>
Traceback (most recent call last):
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 468, in __del__
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 347, in close
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 250, in recv
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 375, in _recv
AttributeError: 'NoneType' object has no attribute 'BytesIO'
Traceback (most recent call last):
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/site-packages/torch-1.4.0-py3.6-linux-x86_64.egg/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/site-packages/torch-1.4.0-py3.6-linux-x86_64.egg/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/local-scratch/anaconda3/envs/habitat/bin/python', '-u', 'habitat_baselines/run.py', '--exp-config', 'habitat_baselines/config/objectnav/ddppo_objectnav.yaml', '--run-type', 'train']' returned non-zero exit status 1.

Thanks in advance

qianqianlo commented 4 years ago

Hello, you could try increasing the NUM_PROCESSES parameter, since you have both GPU 0 and GPU 1 in use.

erikwijmans commented 4 years ago

Hi, can you turn on logging from the simulator? That is, comment out these two lines: https://github.com/facebookresearch/habitat-api/blob/master/habitat_baselines/rl/ddppo/single_node.sh#L3-L4
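For context on why the simulator logs matter here: the EOFError in the traceback is raised by `multiprocessing`'s `Connection.recv()` when the other end of the pipe has been closed, which usually means the worker process exited before sending a reply. The parent's traceback therefore does not show the real cause. The sketch below reproduces that behaviour with only the standard library; `read_from_worker` is a hypothetical helper written for illustration, not part of habitat, and closing the worker's end stands in for a worker crash.

```python
import multiprocessing


def read_from_worker(conn):
    """Hypothetical helper: report whether the other end of the pipe is still alive."""
    try:
        return "ok", conn.recv()
    except EOFError:
        # recv() raises EOFError when nothing is left to read and the
        # other end of the pipe has been closed.
        return "worker_died", None


def demo():
    parent_conn, worker_conn = multiprocessing.Pipe()

    # Healthy case: the worker end sends a reply before the parent reads.
    worker_conn.send({"observation": 1})
    healthy = read_from_worker(parent_conn)

    # Simulated crash: the worker end is closed without sending anything.
    worker_conn.close()
    dead = read_from_worker(parent_conn)

    parent_conn.close()
    return healthy, dead


if __name__ == "__main__":
    print(demo())
```

In the real training run the parent only sees the closed pipe, so the worker's own output (including the simulator's logs) is where the underlying failure would be reported.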

shivanshpatel35 commented 4 years ago

@erikwijmans Thanks for the response. With those lines commented out, I get the following log:


CHECKPOINT_FOLDER: new_checkpoints
CHECKPOINT_INTERVAL: 50
CMD_TRAILING_OPTS: []
ENV_NAME: NavRLEnv
EVAL:
  SPLIT: val
  USE_CKPT_CONFIG: True
EVAL_CKPT_PATH_DIR: new_checkpoints
LOG_FILE: train.log
LOG_INTERVAL: 10
NUM_PROCESSES: 1
NUM_UPDATES: 10000
ORBSLAM2:
  ANGLE_TH: 0.2617993877991494
  BETA: 100
  CAMERA_HEIGHT: 1.25
  DEPTH_DENORM: 10.0
  DIST_REACHED_TH: 0.15
  DIST_TO_STOP: 0.05
  D_OBSTACLE_MAX: 4.0
  D_OBSTACLE_MIN: 0.1
  H_OBSTACLE_MAX: 1.25
  H_OBSTACLE_MIN: 0.375
  MAP_CELL_SIZE: 0.1
  MAP_SIZE: 40
  MIN_PTS_IN_OBSTACLE: 320.0
  NEXT_WAYPOINT_TH: 0.5
  NUM_ACTIONS: 3
  PLANNER_MAX_STEPS: 500
  PREPROCESS_MAP: True
  SLAM_SETTINGS_PATH: habitat_baselines/slambased/data/mp3d3_small1k.yaml
  SLAM_VOCAB_PATH: habitat_baselines/slambased/data/ORBvoc.txt
RL:
  DDPPO:
    backbone: resnet50
    distrib_backend: NCCL
    num_recurrent_layers: 2
    pretrained: False
    pretrained_encoder: False
    pretrained_weights: data/ddppo-models/gibson-2plus-resnet50.pth
    reset_critic: True
    rnn_type: LSTM
    sync_frac: 0.6
    train_encoder: True
  PPO:
    clip_param: 0.2
    entropy_coef: 0.01
    eps: 1e-05
    gamma: 0.99
    hidden_size: 512
    lr: 2.5e-06
    max_grad_norm: 0.2
    num_mini_batch: 2
    num_steps: 128
    ppo_epoch: 2
    reward_window_size: 50
    tau: 0.95
    use_gae: True
    use_linear_clip_decay: False
    use_linear_lr_decay: False
    use_normalized_advantage: False
    value_loss_coef: 0.5
  REWARD_MEASURE: distance_to_goal
  SLACK_REWARD: -0.01
  SUCCESS_MEASURE: spl
  SUCCESS_REWARD: 2.5
SENSORS: ['DEPTH_SENSOR', 'RGB_SENSOR']
SIMULATOR_GPU_ID: 0
TASK_CONFIG:
  DATASET:
    CONTENT_SCENES: []
    DATA_PATH: data/datasets/objectnav/mp3d/v0/{split}/{split}.json.gz
    SCENES_DIR: data/scene_datasets/
    SPLIT: val
    TYPE: ObjectNav-v1
  ENVIRONMENT:
    ITERATOR_OPTIONS:
      CYCLE: True
      GROUP_BY_SCENE: True
      MAX_SCENE_REPEAT_EPISODES: -1
      MAX_SCENE_REPEAT_STEPS: 10000
      NUM_EPISODE_SAMPLE: -1
      SHUFFLE: True
      STEP_REPETITION_RANGE: 0.2
    MAX_EPISODE_SECONDS: 10000000
    MAX_EPISODE_STEPS: 500
  PYROBOT:
    BASE_CONTROLLER: proportional
    BASE_PLANNER: none
    BUMP_SENSOR:
      TYPE: PyRobotBumpSensor
    DEPTH_SENSOR:
      CENTER_CROP: False
      HEIGHT: 480
      MAX_DEPTH: 5.0
      MIN_DEPTH: 0.0
      NORMALIZE_DEPTH: True
      TYPE: PyRobotDepthSensor
      WIDTH: 640
    LOCOBOT:
      ACTIONS: ['BASE_ACTIONS', 'CAMERA_ACTIONS']
      BASE_ACTIONS: ['go_to_relative', 'go_to_absolute']
      CAMERA_ACTIONS: ['set_pan', 'set_tilt', 'set_pan_tilt']
    RGB_SENSOR:
      CENTER_CROP: False
      HEIGHT: 480
      TYPE: PyRobotRGBSensor
      WIDTH: 640
    ROBOT: locobot
    ROBOTS: ['locobot']
    SENSORS: ['RGB_SENSOR', 'DEPTH_SENSOR', 'BUMP_SENSOR']
  SEED: 100
  SIMULATOR:
    ACTION_SPACE_CONFIG: v1
    AGENTS: ['AGENT_0']
    AGENT_0:
      ANGULAR_ACCELERATION: 12.56
      ANGULAR_FRICTION: 1.0
      COEFFICIENT_OF_RESTITUTION: 0.0
      HEIGHT: 0.88
      IS_SET_START_STATE: False
      LINEAR_ACCELERATION: 20.0
      LINEAR_FRICTION: 0.5
      MASS: 32.0
      RADIUS: 0.2
      SENSORS: ['RGB_SENSOR', 'DEPTH_SENSOR']
      START_POSITION: [0, 0, 0]
      START_ROTATION: [0, 0, 0, 1]
    DEFAULT_AGENT_ID: 0
    DEPTH_SENSOR:
      HEIGHT: 480
      HFOV: 79
      MAX_DEPTH: 5.0
      MIN_DEPTH: 0.5
      NORMALIZE_DEPTH: True
      POSITION: [0, 0.88, 0]
      TYPE: HabitatSimDepthSensor
      WIDTH: 640
    FORWARD_STEP_SIZE: 0.25
    HABITAT_SIM_V0:
      ALLOW_SLIDING: True
      ENABLE_PHYSICS: False
      GPU_DEVICE_ID: 0
      GPU_GPU: False
      PHYSICS_CONFIG_FILE: ./data/default.phys_scene_config.json
    RGB_SENSOR:
      HEIGHT: 480
      HFOV: 79
      POSITION: [0, 0.88, 0]
      TYPE: HabitatSimRGBSensor
      WIDTH: 640
    SCENE: data/scene_datasets/habitat-test-scenes/van-gogh-room.glb
    SEED: 100
    SEMANTIC_SENSOR:
      HEIGHT: 480
      HFOV: 79
      POSITION: [0, 0.88, 0]
      TYPE: HabitatSimSemanticSensor
      WIDTH: 640
    TILT_ANGLE: 30
    TURN_ANGLE: 30
    TYPE: Sim-v0
  TASK:
    ACTIONS:
      ANSWER:
        TYPE: AnswerAction
      LOOK_DOWN:
        TYPE: LookDownAction
      LOOK_UP:
        TYPE: LookUpAction
      MOVE_FORWARD:
        TYPE: MoveForwardAction
      STOP:
        TYPE: StopAction
      TELEPORT:
        TYPE: TeleportAction
      TURN_LEFT:
        TYPE: TurnLeftAction
      TURN_RIGHT:
        TYPE: TurnRightAction
    ANSWER_ACCURACY:
      TYPE: AnswerAccuracy
    COLLISIONS:
      TYPE: Collisions
    COMPASS_SENSOR:
      TYPE: CompassSensor
    CORRECT_ANSWER:
      TYPE: CorrectAnswer
    DISTANCE_TO_GOAL:
      DISTANCE_TO: VIEW_POINTS
      TYPE: DistanceToGoal
    EPISODE_INFO:
      TYPE: EpisodeInfo
    GOAL_SENSOR_UUID: objectgoal
    GPS_SENSOR:
      DIMENSIONALITY: 2
      TYPE: GPSSensor
    HEADING_SENSOR:
      TYPE: HeadingSensor
    INSTRUCTION_SENSOR:
      TYPE: InstructionSensor
    INSTRUCTION_SENSOR_UUID: instruction
    MEASUREMENTS: ['DISTANCE_TO_GOAL', 'SPL']
    OBJECTGOAL_SENSOR:
      GOAL_SPEC: TASK_CATEGORY_ID
      GOAL_SPEC_MAX_VAL: 50
      TYPE: ObjectGoalSensor
    POINTGOAL_SENSOR:
      DIMENSIONALITY: 2
      GOAL_FORMAT: POLAR
      TYPE: PointGoalSensor
    POINTGOAL_WITH_GPS_COMPASS_SENSOR:
      DIMENSIONALITY: 2
      GOAL_FORMAT: POLAR
      TYPE: PointGoalWithGPSCompassSensor
    POSSIBLE_ACTIONS: ['STOP', 'MOVE_FORWARD', 'TURN_LEFT', 'TURN_RIGHT', 'LOOK_UP', 'LOOK_DOWN']
    PROXIMITY_SENSOR:
      MAX_DETECTION_RADIUS: 2.0
      TYPE: ProximitySensor
    QUESTION_SENSOR:
      TYPE: QuestionSensor
    SENSORS: ['OBJECTGOAL_SENSOR', 'COMPASS_SENSOR', 'GPS_SENSOR']
    SPL:
      DISTANCE_TO: VIEW_POINTS
      SUCCESS_DISTANCE: 0.2
      TYPE: SPL
    SUCCESS_DISTANCE: 0.1
    TOP_DOWN_MAP:
      DRAW_BORDER: True
      DRAW_GOAL_AABBS: True
      DRAW_GOAL_POSITIONS: True
      DRAW_SHORTEST_PATH: True
      DRAW_SOURCE: True
      DRAW_VIEW_POINTS: True
      FOG_OF_WAR:
        DRAW: True
        FOV: 90
        VISIBILITY_DIST: 5.0
      MAP_PADDING: 3
      MAP_RESOLUTION: 1250
      MAX_EPISODE_STEPS: 1000
      NUM_TOPDOWN_MAP_SAMPLE_POINTS: 20000
      TYPE: TopDownMap
    TYPE: ObjectNav-v1
TENSORBOARD_DIR: tb1
TEST_EPISODE_COUNT: 2184
TORCH_GPU_ID: 1
TRAINER_NAME: ppo
VIDEO_DIR: video_dir
VIDEO_OPTION: ['disk', 'tensorboard']
2020-02-21 08:44:06,775 Initializing dataset ObjectNav-v1
2020-02-21 08:44:28,211 Initializing dataset ObjectNav-v1
2020-02-21 08:44:47,723 initializing sim Sim-v0
Renderer: GeForce RTX 2080 Ti/PCIe/SSE2 by NVIDIA Corporation
OpenGL version: 4.6.0 NVIDIA 440.33.01
Using optional features:
    GL_ARB_ES2_compatibility
    GL_ARB_direct_state_access
    GL_ARB_get_texture_sub_image
    GL_ARB_invalidate_subdata
    GL_ARB_multi_bind
    GL_ARB_robustness
    GL_ARB_separate_shader_objects
    GL_ARB_texture_filter_anisotropic
    GL_ARB_texture_storage
    GL_ARB_texture_storage_multisample
    GL_ARB_vertex_array_object
    GL_KHR_debug
Using driver workarounds:
    no-layout-qualifiers-on-old-glsl
    nv-zero-context-profile-mask
    nv-implementation-color-read-format-dsa-broken
    nv-cubemap-inconsistent-compressed-image-size
    nv-cubemap-broken-full-compressed-image-query
    nv-compressed-block-size-in-bits
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0221 08:44:47.865471 21422 ResourceManager.cpp:1054] Importing Basis files as BC7
I0221 08:44:52.729718 21422 Simulator.cpp:112] Loading house from data/scene_datasets/mp3d/QUCTc6BB5sX/QUCTc6BB5sX.house
I0221 08:44:52.729732 21422 Simulator.cpp:118] Loading semantic mesh data/scene_datasets/mp3d/QUCTc6BB5sX/QUCTc6BB5sX_semantic.ply
I0221 08:45:01.464468 21422 Simulator.cpp:130] Loaded.
I0221 08:45:01.571860 21422 simulator.py:142] Loaded navmesh data/scene_datasets/mp3d/QUCTc6BB5sX/QUCTc6BB5sX.navmesh
2020-02-21 08:45:01,577 Initializing task ObjectNav-v1
2020-02-21 08:45:04,926 agent number of parameters: 71371399
Traceback (most recent call last):
  File "habitat_baselines/run.py", line 68, in <module>
    main()
  File "habitat_baselines/run.py", line 38, in main
    run_exp(**vars(args))
  File "habitat_baselines/run.py", line 62, in run_exp
    trainer.train()
  File "/local-scratch/habitat-api1/habitat_baselines/rl/ppo/ppo_trainer.py", line 300, in train
    episode_counts,
  File "/local-scratch/habitat-api1/habitat_baselines/rl/ppo/ppo_trainer.py", line 146, in _collect_rollout_step
    outputs = self.envs.step([a[0].item() for a in actions])
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 339, in step
    return self.wait_step()
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 326, in wait_step
    observations.append(read_fn())
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
Exception ignored in: <bound method VectorEnv.__del__ of <habitat.core.vector_env.VectorEnv object at 0x7fd7b6769898>>
Traceback (most recent call last):
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 468, in __del__
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 347, in close
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 250, in recv
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 375, in _recv
AttributeError: 'NoneType' object has no attribute 'BytesIO'
Traceback (most recent call last):
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/site-packages/torch-1.4.0-py3.6-linux-x86_64.egg/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/site-packages/torch-1.4.0-py3.6-linux-x86_64.egg/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/local-scratch/anaconda3/envs/habitat/bin/python', '-u', 'habitat_baselines/run.py', '--exp-config', 'habitat_baselines/config/objectnav/ddppo_objectnav.yaml', '--run-type', 'train']' returned non-zero exit status 1.

erikwijmans commented 4 years ago

I don't see anything from EGL in the logs, which suggests that habitat-sim was built without --headless. Habitat-sim needs to be built with --headless to leverage multiple GPUs.
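The check described above (looking for any mention of EGL in the simulator output) can be sketched as a small log scan. This is only an illustration: `mentions_egl` is a hypothetical helper, the first sample reuses two renderer lines from the log posted in this thread, and the second sample is a made-up placeholder rather than real habitat-sim output.

```python
import re


def mentions_egl(log_text: str) -> bool:
    """Hypothetical helper: True if any line contains the standalone word 'EGL'."""
    # Word boundaries keep 'OpenGL' from counting as a match.
    return re.search(r"\bEGL\b", log_text, flags=re.IGNORECASE) is not None


# Two lines copied from the log posted earlier in this thread.
posted_log = (
    "Renderer: GeForce RTX 2080 Ti/PCIe/SSE2 by NVIDIA Corporation\n"
    "OpenGL version: 4.6.0 NVIDIA 440.33.01\n"
)

# Placeholder text, NOT real simulator output.
placeholder_log = "placeholder line that mentions EGL somewhere\n"

if __name__ == "__main__":
    print(mentions_egl(posted_log))
    print(mentions_egl(placeholder_log))
```

A negative result only says the word never appears in the captured text; it does not prove how habitat-sim was built.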

shivanshpatel35 commented 4 years ago

I had installed habitat-sim with the --headless flag, but I reinstalled it in a new environment just for peace of mind. I am still getting the same error. Here is the log:

2020-02-21 11:54:40,275 config: BASE_TASK_CONFIG_PATH: configs/tasks/objectnav_mp3d.yaml
CHECKPOINT_FOLDER: new_checkpoints
CHECKPOINT_INTERVAL: 50
CMD_TRAILING_OPTS: []
ENV_NAME: NavRLEnv
EVAL:
  SPLIT: val
  USE_CKPT_CONFIG: True
EVAL_CKPT_PATH_DIR: new_checkpoints
LOG_FILE: train.log
LOG_INTERVAL: 10
NUM_PROCESSES: 1
NUM_UPDATES: 10000
ORBSLAM2:
  ANGLE_TH: 0.2617993877991494
  BETA: 100
  CAMERA_HEIGHT: 1.25
  DEPTH_DENORM: 10.0
  DIST_REACHED_TH: 0.15
  DIST_TO_STOP: 0.05
  D_OBSTACLE_MAX: 4.0
  D_OBSTACLE_MIN: 0.1
  H_OBSTACLE_MAX: 1.25
  H_OBSTACLE_MIN: 0.375
  MAP_CELL_SIZE: 0.1
  MAP_SIZE: 40
  MIN_PTS_IN_OBSTACLE: 320.0
  NEXT_WAYPOINT_TH: 0.5
  NUM_ACTIONS: 3
  PLANNER_MAX_STEPS: 500
  PREPROCESS_MAP: True
  SLAM_SETTINGS_PATH: habitat_baselines/slambased/data/mp3d3_small1k.yaml
  SLAM_VOCAB_PATH: habitat_baselines/slambased/data/ORBvoc.txt
RL:
  DDPPO:
    backbone: resnet50
    distrib_backend: NCCL
    num_recurrent_layers: 2
    pretrained: False
    pretrained_encoder: False
    pretrained_weights: data/ddppo-models/gibson-2plus-resnet50.pth
    reset_critic: True
    rnn_type: LSTM
    sync_frac: 0.6
    train_encoder: True
  PPO:
    clip_param: 0.2
    entropy_coef: 0.01
    eps: 1e-05
    gamma: 0.99
    hidden_size: 512
    lr: 2.5e-06
    max_grad_norm: 0.2
    num_mini_batch: 1
    num_steps: 128
    ppo_epoch: 2
    reward_window_size: 50
    tau: 0.95
    use_gae: True
    use_linear_clip_decay: False
    use_linear_lr_decay: False
    use_normalized_advantage: False
    value_loss_coef: 0.5
  REWARD_MEASURE: distance_to_goal
  SLACK_REWARD: -0.01
  SUCCESS_MEASURE: spl
  SUCCESS_REWARD: 2.5
SENSORS: ['DEPTH_SENSOR', 'RGB_SENSOR']
SIMULATOR_GPU_ID: 0
TASK_CONFIG:
  DATASET:
    CONTENT_SCENES: []
    DATA_PATH: data/datasets/objectnav/mp3d/v0/{split}/{split}.json.gz
    SCENES_DIR: data/scene_datasets/
    SPLIT: val
    TYPE: ObjectNav-v1
  ENVIRONMENT:
    ITERATOR_OPTIONS:
      CYCLE: True
      GROUP_BY_SCENE: True
      MAX_SCENE_REPEAT_EPISODES: -1
      MAX_SCENE_REPEAT_STEPS: 10000
      NUM_EPISODE_SAMPLE: -1
      SHUFFLE: True
      STEP_REPETITION_RANGE: 0.2
    MAX_EPISODE_SECONDS: 10000000
    MAX_EPISODE_STEPS: 500
  PYROBOT:
    BASE_CONTROLLER: proportional
    BASE_PLANNER: none
    BUMP_SENSOR:
      TYPE: PyRobotBumpSensor
    DEPTH_SENSOR:
      CENTER_CROP: False
      HEIGHT: 480
      MAX_DEPTH: 5.0
      MIN_DEPTH: 0.0
      NORMALIZE_DEPTH: True
      TYPE: PyRobotDepthSensor
      WIDTH: 640
    LOCOBOT:
      ACTIONS: ['BASE_ACTIONS', 'CAMERA_ACTIONS']
      BASE_ACTIONS: ['go_to_relative', 'go_to_absolute']
      CAMERA_ACTIONS: ['set_pan', 'set_tilt', 'set_pan_tilt']
    RGB_SENSOR:
      CENTER_CROP: False
      HEIGHT: 480
      TYPE: PyRobotRGBSensor
      WIDTH: 640
    ROBOT: locobot
    ROBOTS: ['locobot']
    SENSORS: ['RGB_SENSOR', 'DEPTH_SENSOR', 'BUMP_SENSOR']
  SEED: 100
  SIMULATOR:
    ACTION_SPACE_CONFIG: v1
    AGENTS: ['AGENT_0']
    AGENT_0:
      ANGULAR_ACCELERATION: 12.56
      ANGULAR_FRICTION: 1.0
      COEFFICIENT_OF_RESTITUTION: 0.0
      HEIGHT: 0.88
      IS_SET_START_STATE: False
      LINEAR_ACCELERATION: 20.0
      LINEAR_FRICTION: 0.5
      MASS: 32.0
      RADIUS: 0.2
      SENSORS: ['RGB_SENSOR', 'DEPTH_SENSOR']
      START_POSITION: [0, 0, 0]
      START_ROTATION: [0, 0, 0, 1]
    DEFAULT_AGENT_ID: 0
    DEPTH_SENSOR:
      HEIGHT: 480
      HFOV: 79
      MAX_DEPTH: 5.0
      MIN_DEPTH: 0.5
      NORMALIZE_DEPTH: True
      POSITION: [0, 0.88, 0]
      TYPE: HabitatSimDepthSensor
      WIDTH: 640
    FORWARD_STEP_SIZE: 0.25
    HABITAT_SIM_V0:
      ALLOW_SLIDING: True
      ENABLE_PHYSICS: False
      GPU_DEVICE_ID: 0
      GPU_GPU: False
      PHYSICS_CONFIG_FILE: ./data/default.phys_scene_config.json
    RGB_SENSOR:
      HEIGHT: 480
      HFOV: 79
      POSITION: [0, 0.88, 0]
      TYPE: HabitatSimRGBSensor
      WIDTH: 640
    SCENE: data/scene_datasets/habitat-test-scenes/van-gogh-room.glb
    SEED: 100
    SEMANTIC_SENSOR:
      HEIGHT: 480
      HFOV: 79
      POSITION: [0, 0.88, 0]
      TYPE: HabitatSimSemanticSensor
      WIDTH: 640
    TILT_ANGLE: 30
    TURN_ANGLE: 30
    TYPE: Sim-v0
  TASK:
    ACTIONS:
      ANSWER:
        TYPE: AnswerAction
      LOOK_DOWN:
        TYPE: LookDownAction
      LOOK_UP:
        TYPE: LookUpAction
      MOVE_FORWARD:
        TYPE: MoveForwardAction
      STOP:
        TYPE: StopAction
      TELEPORT:
        TYPE: TeleportAction
      TURN_LEFT:
        TYPE: TurnLeftAction
      TURN_RIGHT:
        TYPE: TurnRightAction
    ANSWER_ACCURACY:
      TYPE: AnswerAccuracy
    COLLISIONS:
      TYPE: Collisions
    COMPASS_SENSOR:
      TYPE: CompassSensor
    CORRECT_ANSWER:
      TYPE: CorrectAnswer
    DISTANCE_TO_GOAL:
      DISTANCE_TO: VIEW_POINTS
      TYPE: DistanceToGoal
    EPISODE_INFO:
      TYPE: EpisodeInfo
    GOAL_SENSOR_UUID: objectgoal
    GPS_SENSOR:
      DIMENSIONALITY: 2
      TYPE: GPSSensor
    HEADING_SENSOR:
      TYPE: HeadingSensor
    INSTRUCTION_SENSOR:
      TYPE: InstructionSensor
    INSTRUCTION_SENSOR_UUID: instruction
    MEASUREMENTS: ['DISTANCE_TO_GOAL', 'SPL']
    OBJECTGOAL_SENSOR:
      GOAL_SPEC: TASK_CATEGORY_ID
      GOAL_SPEC_MAX_VAL: 50
      TYPE: ObjectGoalSensor
    POINTGOAL_SENSOR:
      DIMENSIONALITY: 2
      GOAL_FORMAT: POLAR
      TYPE: PointGoalSensor
    POINTGOAL_WITH_GPS_COMPASS_SENSOR:
      DIMENSIONALITY: 2
      GOAL_FORMAT: POLAR
      TYPE: PointGoalWithGPSCompassSensor
    POSSIBLE_ACTIONS: ['STOP', 'MOVE_FORWARD', 'TURN_LEFT', 'TURN_RIGHT', 'LOOK_UP', 'LOOK_DOWN']
    PROXIMITY_SENSOR:
      MAX_DETECTION_RADIUS: 2.0
      TYPE: ProximitySensor
    QUESTION_SENSOR:
      TYPE: QuestionSensor
    SENSORS: ['OBJECTGOAL_SENSOR', 'COMPASS_SENSOR', 'GPS_SENSOR']
    SPL:
      DISTANCE_TO: VIEW_POINTS
      SUCCESS_DISTANCE: 0.2
      TYPE: SPL
    SUCCESS_DISTANCE: 0.1
    TOP_DOWN_MAP:
      DRAW_BORDER: True
      DRAW_GOAL_AABBS: True
      DRAW_GOAL_POSITIONS: True
      DRAW_SHORTEST_PATH: True
      DRAW_SOURCE: True
      DRAW_VIEW_POINTS: True
      FOG_OF_WAR:
        DRAW: True
        FOV: 90
        VISIBILITY_DIST: 5.0
      MAP_PADDING: 3
      MAP_RESOLUTION: 1250
      MAX_EPISODE_STEPS: 1000
      NUM_TOPDOWN_MAP_SAMPLE_POINTS: 20000
      TYPE: TopDownMap
    TYPE: ObjectNav-v1
TENSORBOARD_DIR: tb1
TEST_EPISODE_COUNT: 2184
TORCH_GPU_ID: 1
TRAINER_NAME: ppo
VIDEO_DIR: video_dir
VIDEO_OPTION: ['disk', 'tensorboard']
2020-02-21 11:54:40,275 Initializing dataset ObjectNav-v1
2020-02-21 11:55:01,109 Initializing dataset ObjectNav-v1
2020-02-21 11:55:20,310 initializing sim Sim-v0
Renderer: GeForce RTX 2080 Ti/PCIe/SSE2 by NVIDIA Corporation
OpenGL version: 4.6.0 NVIDIA 440.33.01
Using optional features:
    GL_ARB_ES2_compatibility
    GL_ARB_direct_state_access
    GL_ARB_get_texture_sub_image
    GL_ARB_invalidate_subdata
    GL_ARB_multi_bind
    GL_ARB_robustness
    GL_ARB_separate_shader_objects
    GL_ARB_texture_filter_anisotropic
    GL_ARB_texture_storage
    GL_ARB_texture_storage_multisample
    GL_ARB_vertex_array_object
    GL_KHR_debug
Using driver workarounds:
    no-layout-qualifiers-on-old-glsl
    nv-zero-context-profile-mask
    nv-implementation-color-read-format-dsa-broken
    nv-cubemap-inconsistent-compressed-image-size
    nv-cubemap-broken-full-compressed-image-query
    nv-compressed-block-size-in-bits
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0221 11:55:20.444175 21072 ResourceManager.cpp:1054] Importing Basis files as BC7
I0221 11:55:25.973668 21072 Simulator.cpp:112] Loading house from data/scene_datasets/mp3d/Z6MFQCViBuw/Z6MFQCViBuw.house
I0221 11:55:25.973683 21072 Simulator.cpp:118] Loading semantic mesh data/scene_datasets/mp3d/Z6MFQCViBuw/Z6MFQCViBuw_semantic.ply
I0221 11:55:29.180565 21072 Simulator.cpp:130] Loaded.
I0221 11:55:29.239537 21072 simulator.py:142] Loaded navmesh data/scene_datasets/mp3d/Z6MFQCViBuw/Z6MFQCViBuw.navmesh
2020-02-21 11:55:29,242 Initializing task ObjectNav-v1
2020-02-21 11:55:31,141 agent number of parameters: 71371399
Traceback (most recent call last):
  File "habitat_baselines/run.py", line 68, in <module>
    main()
  File "habitat_baselines/run.py", line 38, in main
    run_exp(**vars(args))
  File "habitat_baselines/run.py", line 62, in run_exp
    trainer.train()
  File "/local-scratch/habitat-api1/habitat_baselines/rl/ppo/ppo_trainer.py", line 300, in train
    episode_counts,
  File "/local-scratch/habitat-api1/habitat_baselines/rl/ppo/ppo_trainer.py", line 146, in _collect_rollout_step
    outputs = self.envs.step([a[0].item() for a in actions])
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 339, in step
    return self.wait_step()
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 326, in wait_step
    observations.append(read_fn())
  File "/local-scratch/anaconda3/envs/habitat2/lib/python3.6/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/local-scratch/anaconda3/envs/habitat2/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/local-scratch/anaconda3/envs/habitat2/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
Exception ignored in: <bound method VectorEnv.__del__ of <habitat.core.vector_env.VectorEnv object at 0x7fb05648e668>>
Traceback (most recent call last):
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 468, in __del__
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 347, in close
  File "/local-scratch/anaconda3/envs/habitat2/lib/python3.6/multiprocessing/connection.py", line 250, in recv
  File "/local-scratch/anaconda3/envs/habitat2/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
  File "/local-scratch/anaconda3/envs/habitat2/lib/python3.6/multiprocessing/connection.py", line 375, in _recv
AttributeError: 'NoneType' object has no attribute 'BytesIO'
Traceback (most recent call last):
  File "/local-scratch/anaconda3/envs/habitat2/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/local-scratch/anaconda3/envs/habitat2/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/local-scratch/anaconda3/envs/habitat2/lib/python3.6/site-packages/torch-1.4.0-py3.6-linux-x86_64.egg/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/local-scratch/anaconda3/envs/habitat2/lib/python3.6/site-packages/torch-1.4.0-py3.6-linux-x86_64.egg/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/local-scratch/anaconda3/envs/habitat2/bin/python', '-u', 'habitat_baselines/run.py', '--exp-config', 'habitat_baselines/config/objectnav/ddppo_objectnav.yaml', '--run-type', 'train']' returned non-zero exit status 1

I also tried running both simulator and model on one GPU but the error persists. Following is the error log:

2020-02-21 11:58:13,065 config: BASE_TASK_CONFIG_PATH: configs/tasks/objectnav_mp3d.yaml
CHECKPOINT_FOLDER: new_checkpoints
CHECKPOINT_INTERVAL: 50
CMD_TRAILING_OPTS: []
ENV_NAME: NavRLEnv
EVAL:
  SPLIT: val
  USE_CKPT_CONFIG: True
EVAL_CKPT_PATH_DIR: new_checkpoints
LOG_FILE: train.log
LOG_INTERVAL: 10
NUM_PROCESSES: 1
NUM_UPDATES: 10000
ORBSLAM2:
  ANGLE_TH: 0.2617993877991494
  BETA: 100
  CAMERA_HEIGHT: 1.25
  DEPTH_DENORM: 10.0
  DIST_REACHED_TH: 0.15
  DIST_TO_STOP: 0.05
  D_OBSTACLE_MAX: 4.0
  D_OBSTACLE_MIN: 0.1
  H_OBSTACLE_MAX: 1.25
  H_OBSTACLE_MIN: 0.375
  MAP_CELL_SIZE: 0.1
  MAP_SIZE: 40
  MIN_PTS_IN_OBSTACLE: 320.0
  NEXT_WAYPOINT_TH: 0.5
  NUM_ACTIONS: 3
  PLANNER_MAX_STEPS: 500
  PREPROCESS_MAP: True
  SLAM_SETTINGS_PATH: habitat_baselines/slambased/data/mp3d3_small1k.yaml
  SLAM_VOCAB_PATH: habitat_baselines/slambased/data/ORBvoc.txt
RL:
  DDPPO:
    backbone: resnet50
    distrib_backend: NCCL
    num_recurrent_layers: 2
    pretrained: False
    pretrained_encoder: False
    pretrained_weights: data/ddppo-models/gibson-2plus-resnet50.pth
    reset_critic: True
    rnn_type: LSTM
    sync_frac: 0.6
    train_encoder: True
  PPO:
    clip_param: 0.2
    entropy_coef: 0.01
    eps: 1e-05
    gamma: 0.99
    hidden_size: 512
    lr: 2.5e-06
    max_grad_norm: 0.2
    num_mini_batch: 1
    num_steps: 128
    ppo_epoch: 2
    reward_window_size: 50
    tau: 0.95
    use_gae: True
    use_linear_clip_decay: False
    use_linear_lr_decay: False
    use_normalized_advantage: False
    value_loss_coef: 0.5
  REWARD_MEASURE: distance_to_goal
  SLACK_REWARD: -0.01
  SUCCESS_MEASURE: spl
  SUCCESS_REWARD: 2.5
SENSORS: ['DEPTH_SENSOR', 'RGB_SENSOR']
SIMULATOR_GPU_ID: 0
TASK_CONFIG:
  DATASET:
    CONTENT_SCENES: []
    DATA_PATH: data/datasets/objectnav/mp3d/v0/{split}/{split}.json.gz
    SCENES_DIR: data/scene_datasets/
    SPLIT: val
    TYPE: ObjectNav-v1
  ENVIRONMENT:
    ITERATOR_OPTIONS:
      CYCLE: True
      GROUP_BY_SCENE: True
      MAX_SCENE_REPEAT_EPISODES: -1
      MAX_SCENE_REPEAT_STEPS: 10000
      NUM_EPISODE_SAMPLE: -1
      SHUFFLE: True
      STEP_REPETITION_RANGE: 0.2
    MAX_EPISODE_SECONDS: 10000000
    MAX_EPISODE_STEPS: 500
  PYROBOT:
    BASE_CONTROLLER: proportional
    BASE_PLANNER: none
    BUMP_SENSOR:
      TYPE: PyRobotBumpSensor
    DEPTH_SENSOR:
      CENTER_CROP: False
      HEIGHT: 480
      MAX_DEPTH: 5.0
      MIN_DEPTH: 0.0
      NORMALIZE_DEPTH: True
      TYPE: PyRobotDepthSensor
      WIDTH: 640
    LOCOBOT:
      ACTIONS: ['BASE_ACTIONS', 'CAMERA_ACTIONS']
      BASE_ACTIONS: ['go_to_relative', 'go_to_absolute']
      CAMERA_ACTIONS: ['set_pan', 'set_tilt', 'set_pan_tilt']
    RGB_SENSOR:
      CENTER_CROP: False
      HEIGHT: 480
      TYPE: PyRobotRGBSensor
      WIDTH: 640
    ROBOT: locobot
    ROBOTS: ['locobot']
    SENSORS: ['RGB_SENSOR', 'DEPTH_SENSOR', 'BUMP_SENSOR']
  SEED: 100
  SIMULATOR:
    ACTION_SPACE_CONFIG: v1
    AGENTS: ['AGENT_0']
    AGENT_0:
      ANGULAR_ACCELERATION: 12.56
      ANGULAR_FRICTION: 1.0
      COEFFICIENT_OF_RESTITUTION: 0.0
      HEIGHT: 0.88
      IS_SET_START_STATE: False
      LINEAR_ACCELERATION: 20.0
      LINEAR_FRICTION: 0.5
      MASS: 32.0
      RADIUS: 0.2
      SENSORS: ['RGB_SENSOR', 'DEPTH_SENSOR']
      START_POSITION: [0, 0, 0]
      START_ROTATION: [0, 0, 0, 1]
    DEFAULT_AGENT_ID: 0
    DEPTH_SENSOR:
      HEIGHT: 480
      HFOV: 79
      MAX_DEPTH: 5.0
      MIN_DEPTH: 0.5
      NORMALIZE_DEPTH: True
      POSITION: [0, 0.88, 0]
      TYPE: HabitatSimDepthSensor
      WIDTH: 640
    FORWARD_STEP_SIZE: 0.25
    HABITAT_SIM_V0:
      ALLOW_SLIDING: True
      ENABLE_PHYSICS: False
      GPU_DEVICE_ID: 0
      GPU_GPU: False
      PHYSICS_CONFIG_FILE: ./data/default.phys_scene_config.json
    RGB_SENSOR:
      HEIGHT: 480
      HFOV: 79
      POSITION: [0, 0.88, 0]
      TYPE: HabitatSimRGBSensor
      WIDTH: 640
    SCENE: data/scene_datasets/habitat-test-scenes/van-gogh-room.glb
    SEED: 100
    SEMANTIC_SENSOR:
      HEIGHT: 480
      HFOV: 79
      POSITION: [0, 0.88, 0]
      TYPE: HabitatSimSemanticSensor
      WIDTH: 640
    TILT_ANGLE: 30
    TURN_ANGLE: 30
    TYPE: Sim-v0
  TASK:
    ACTIONS:
      ANSWER:
        TYPE: AnswerAction
      LOOK_DOWN:
        TYPE: LookDownAction
      LOOK_UP:
        TYPE: LookUpAction
      MOVE_FORWARD:
        TYPE: MoveForwardAction
      STOP:
        TYPE: StopAction
      TELEPORT:
        TYPE: TeleportAction
      TURN_LEFT:
        TYPE: TurnLeftAction
      TURN_RIGHT:
        TYPE: TurnRightAction
    ANSWER_ACCURACY:
      TYPE: AnswerAccuracy
    COLLISIONS:
      TYPE: Collisions
    COMPASS_SENSOR:
      TYPE: CompassSensor
    CORRECT_ANSWER:
      TYPE: CorrectAnswer
    DISTANCE_TO_GOAL:
      DISTANCE_TO: VIEW_POINTS
      TYPE: DistanceToGoal
    EPISODE_INFO:
      TYPE: EpisodeInfo
    GOAL_SENSOR_UUID: objectgoal
    GPS_SENSOR:
      DIMENSIONALITY: 2
      TYPE: GPSSensor
    HEADING_SENSOR:
      TYPE: HeadingSensor
    INSTRUCTION_SENSOR:
      TYPE: InstructionSensor
    INSTRUCTION_SENSOR_UUID: instruction
    MEASUREMENTS: ['DISTANCE_TO_GOAL', 'SPL']
    OBJECTGOAL_SENSOR:
      GOAL_SPEC: TASK_CATEGORY_ID
      GOAL_SPEC_MAX_VAL: 50
      TYPE: ObjectGoalSensor
    POINTGOAL_SENSOR:
      DIMENSIONALITY: 2
      GOAL_FORMAT: POLAR
      TYPE: PointGoalSensor
    POINTGOAL_WITH_GPS_COMPASS_SENSOR:
      DIMENSIONALITY: 2
      GOAL_FORMAT: POLAR
      TYPE: PointGoalWithGPSCompassSensor
    POSSIBLE_ACTIONS: ['STOP', 'MOVE_FORWARD', 'TURN_LEFT', 'TURN_RIGHT', 'LOOK_UP', 'LOOK_DOWN']
    PROXIMITY_SENSOR:
      MAX_DETECTION_RADIUS: 2.0
      TYPE: ProximitySensor
    QUESTION_SENSOR:
      TYPE: QuestionSensor
    SENSORS: ['OBJECTGOAL_SENSOR', 'COMPASS_SENSOR', 'GPS_SENSOR']
    SPL:
      DISTANCE_TO: VIEW_POINTS
      SUCCESS_DISTANCE: 0.2
      TYPE: SPL
    SUCCESS_DISTANCE: 0.1
    TOP_DOWN_MAP:
      DRAW_BORDER: True
      DRAW_GOAL_AABBS: True
      DRAW_GOAL_POSITIONS: True
      DRAW_SHORTEST_PATH: True
      DRAW_SOURCE: True
      DRAW_VIEW_POINTS: True
      FOG_OF_WAR:
        DRAW: True
        FOV: 90
        VISIBILITY_DIST: 5.0
      MAP_PADDING: 3
      MAP_RESOLUTION: 1250
      MAX_EPISODE_STEPS: 1000
      NUM_TOPDOWN_MAP_SAMPLE_POINTS: 20000
      TYPE: TopDownMap
    TYPE: ObjectNav-v1
TENSORBOARD_DIR: tb1
TEST_EPISODE_COUNT: 2184
TORCH_GPU_ID: 0
TRAINER_NAME: ppo
VIDEO_DIR: video_dir
VIDEO_OPTION: ['disk', 'tensorboard']
2020-02-21 11:58:13,065 Initializing dataset ObjectNav-v1
2020-02-21 11:58:35,118 Initializing dataset ObjectNav-v1
2020-02-21 11:58:54,832 initializing sim Sim-v0
Renderer: GeForce RTX 2080 Ti/PCIe/SSE2 by NVIDIA Corporation
OpenGL version: 4.6.0 NVIDIA 440.33.01
Using optional features:
    GL_ARB_ES2_compatibility
    GL_ARB_direct_state_access
    GL_ARB_get_texture_sub_image
    GL_ARB_invalidate_subdata
    GL_ARB_multi_bind
    GL_ARB_robustness
    GL_ARB_separate_shader_objects
    GL_ARB_texture_filter_anisotropic
    GL_ARB_texture_storage
    GL_ARB_texture_storage_multisample
    GL_ARB_vertex_array_object
    GL_KHR_debug
Using driver workarounds:
    no-layout-qualifiers-on-old-glsl
    nv-zero-context-profile-mask
    nv-implementation-color-read-format-dsa-broken
    nv-cubemap-inconsistent-compressed-image-size
    nv-cubemap-broken-full-compressed-image-query
    nv-compressed-block-size-in-bits
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0221 11:58:55.015694 21426 ResourceManager.cpp:1054] Importing Basis files as BC7
I0221 11:58:56.439661 21426 Simulator.cpp:112] Loading house from data/scene_datasets/mp3d/pLe4wQe7qrG/pLe4wQe7qrG.house
I0221 11:58:56.439679 21426 Simulator.cpp:118] Loading semantic mesh data/scene_datasets/mp3d/pLe4wQe7qrG/pLe4wQe7qrG_semantic.ply
I0221 11:58:57.127120 21426 Simulator.cpp:130] Loaded.
I0221 11:58:57.144647 21426 simulator.py:142] Loaded navmesh data/scene_datasets/mp3d/pLe4wQe7qrG/pLe4wQe7qrG.navmesh
2020-02-21 11:58:57,146 Initializing task ObjectNav-v1
2020-02-21 11:58:59,754 agent number of parameters: 71371399
I0221 11:59:20.487156 21426 Simulator.cpp:35] Deconstructing Simulator
I0221 11:59:20.487174 21426 SemanticScene.h:40] Deconstructing SemanticScene
I0221 11:59:20.487610 21426 SceneManager.h:24] Deconstructing SceneManager
I0221 11:59:20.487617 21426 SceneGraph.h:20] Deconstructing SceneGraph
I0221 11:59:20.487627 21426 RenderTarget.h:51] Deconstructing RenderTarget
I0221 11:59:20.488035 21426 Sensor.h:80] Deconstructing Sensor
I0221 11:59:20.488044 21426 RenderTarget.h:51] Deconstructing RenderTarget
I0221 11:59:20.488250 21426 Sensor.h:80] Deconstructing Sensor
I0221 11:59:20.488260 21426 SceneGraph.h:20] Deconstructing SceneGraph
I0221 11:59:20.491708 21426 Renderer.cpp:33] Deconstructing Renderer
I0221 11:59:20.491716 21426 WindowlessContext.h:16] Deconstructing WindowlessContext
I0221 11:59:20.491719 21426 WindowlessContext.cpp:245] Deconstructing GL context
Renderer: GeForce RTX 2080 Ti/PCIe/SSE2 by NVIDIA Corporation
OpenGL version: 4.6.0 NVIDIA 440.33.01
Using optional features:
    GL_ARB_ES2_compatibility
    GL_ARB_direct_state_access
    GL_ARB_get_texture_sub_image
    GL_ARB_invalidate_subdata
    GL_ARB_multi_bind
    GL_ARB_robustness
    GL_ARB_separate_shader_objects
    GL_ARB_texture_filter_anisotropic
    GL_ARB_texture_storage
    GL_ARB_texture_storage_multisample
    GL_ARB_vertex_array_object
    GL_KHR_debug
Using driver workarounds:
    no-layout-qualifiers-on-old-glsl
    nv-zero-context-profile-mask
    nv-implementation-color-read-format-dsa-broken
    nv-cubemap-inconsistent-compressed-image-size
    nv-cubemap-broken-full-compressed-image-query
    nv-compressed-block-size-in-bits
I0221 11:59:20.526748 21426 ResourceManager.cpp:1054] Importing Basis files as BC7
I0221 11:59:25.743806 21426 Simulator.cpp:112] Loading house from data/scene_datasets/mp3d/Z6MFQCViBuw/Z6MFQCViBuw.house
I0221 11:59:25.743822 21426 Simulator.cpp:118] Loading semantic mesh data/scene_datasets/mp3d/Z6MFQCViBuw/Z6MFQCViBuw_semantic.ply
I0221 11:59:28.846593 21426 Simulator.cpp:130] Loaded.
I0221 11:59:28.908223 21426 simulator.py:142] Loaded navmesh data/scene_datasets/mp3d/Z6MFQCViBuw/Z6MFQCViBuw.navmesh
Traceback (most recent call last):
  File "habitat_baselines/run.py", line 68, in <module>
    main()
  File "habitat_baselines/run.py", line 38, in main
    run_exp(**vars(args))
  File "habitat_baselines/run.py", line 62, in run_exp
    trainer.train()
  File "/local-scratch/habitat-api1/habitat_baselines/rl/ppo/ppo_trainer.py", line 300, in train
    episode_counts,
  File "/local-scratch/habitat-api1/habitat_baselines/rl/ppo/ppo_trainer.py", line 146, in _collect_rollout_step
    outputs = self.envs.step([a[0].item() for a in actions])
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 339, in step
    return self.wait_step()
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 326, in wait_step
    observations.append(read_fn())
  File "/local-scratch/anaconda3/envs/habitat2/lib/python3.6/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/local-scratch/anaconda3/envs/habitat2/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/local-scratch/anaconda3/envs/habitat2/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
Exception ignored in: <bound method VectorEnv.__del__ of <habitat.core.vector_env.VectorEnv object at 0x7efb2bd01668>>
Traceback (most recent call last):
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 468, in __del__
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 347, in close
  File "/local-scratch/anaconda3/envs/habitat2/lib/python3.6/multiprocessing/connection.py", line 250, in recv
  File "/local-scratch/anaconda3/envs/habitat2/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
  File "/local-scratch/anaconda3/envs/habitat2/lib/python3.6/multiprocessing/connection.py", line 375, in _recv
AttributeError: 'NoneType' object has no attribute 'BytesIO'
Traceback (most recent call last):
  File "/local-scratch/anaconda3/envs/habitat2/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/local-scratch/anaconda3/envs/habitat2/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/local-scratch/anaconda3/envs/habitat2/lib/python3.6/site-packages/torch-1.4.0-py3.6-linux-x86_64.egg/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/local-scratch/anaconda3/envs/habitat2/lib/python3.6/site-packages/torch-1.4.0-py3.6-linux-x86_64.egg/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/local-scratch/anaconda3/envs/habitat2/bin/python', '-u', 'habitat_baselines/run.py', '--exp-config', 'habitat_baselines/config/objectnav/ddppo_objectnav.yaml', '--run-type', 'train']' returned non-zero exit status 1.

erikwijmans commented 4 years ago

I am still not seeing anything about EGL in the logs, which is concerning/confusing. How much system memory do you have? The ObjectNav dataset currently takes a lot of memory (we are fixing that in PR #309), so maybe the process is getting killed by the OOM killer.
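
For context on why a killed worker would show up as the EOFError in the traceback above, here is a minimal sketch using only the Python standard library. It is not habitat code: `crashing_worker` and `recv_from_dead_worker` are hypothetical names, and the abrupt exit via `os._exit` merely stands in for a process being killed (by the OOM killer or a native crash). It assumes a Unix-like system where the "fork" start method is available.

```python
import multiprocessing as mp
import os


def crashing_worker(conn):
    # Stand-in for a worker that dies abruptly: exit without sending a reply.
    os._exit(1)


def recv_from_dead_worker():
    # Assumption: a Unix-like system where the "fork" start method exists.
    ctx = mp.get_context("fork")
    # One-way pipe: the parent reads, the worker would normally write.
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=crashing_worker, args=(child_conn,))
    proc.start()
    # The parent closes its copy of the write end so that EOF can be detected
    # once the worker is gone.
    child_conn.close()
    try:
        parent_conn.recv()
        outcome = "received"
    except EOFError:
        # Same exception type as in the traceback: the peer vanished
        # before sending anything.
        outcome = "EOFError"
    proc.join()
    return outcome, proc.exitcode


if __name__ == "__main__":
    print(recv_from_dead_worker())
```

The point of the sketch is that EOFError on the parent side only says that the worker is gone; the actual cause has to be found elsewhere, for example in the worker's own output or in the kernel log.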

shivanshpatel35 commented 4 years ago

I used watch free -g to monitor memory. Before I run the code, free memory is 41GB as shown here:

                 total        used        free      shared  buff/cache   available
Mem:             62           7          41           0          13          54
Swap:             0           0           0

Throughout the entire run, free memory never falls below 36GB.
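
As an alternative to watching `free -g` by hand, the same numbers could be logged from a script. The following is a rough sketch, assuming a Linux system that exposes `/proc/meminfo`; `parse_meminfo` and `available_gib` are hypothetical helper names, not part of habitat.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in GiB."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        # Only convert entries reported in kB; skip plain counters
        # such as HugePages_Total.
        if len(fields) == 2 and fields[1] == "kB":
            values[key.strip()] = int(fields[0]) / (1024 ** 2)
    return values


def available_gib(path="/proc/meminfo"):
    # Assumption: Linux; returns None if MemAvailable is not reported.
    with open(path) as f:
        return parse_meminfo(f.read()).get("MemAvailable")
```

Calling `available_gib()` periodically during training and writing the result to a log would give a record to compare against the time of the crash.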

erikwijmans commented 4 years ago

Does the normal habitat-sim example script work? That is, does running python examples/example.py --scene data/scene_datasets/mp3d/17DRP5sb8fy/17DRP5sb8fy.glb from /path/to/habitat-sim succeed?

shivanshpatel35 commented 4 years ago

Yes, the normal habitat-sim example works. The final performance printed by running python examples/example.py --scene data/scene_datasets/mp3d/17DRP5sb8fy/17DRP5sb8fy.glb is:

 ========================= Performance ======================== 
 640 x 480, total time 1.61 s, frame time 1.610 ms (621.2 FPS)
 ==============================================================

erikwijmans commented 4 years ago

Can you run things directly? Maybe torch.distributed.launch is blocking the output of something important. That is, just run python habitat_baselines/run.py --exp-config habitat_baselines/config/objectnav/ddppo_objectnav.yaml --run-type train.

shivanshpatel35 commented 4 years ago

Running things without torch.distributed.launch results in the following log:

2020-02-21 15:12:22,378 config: BASE_TASK_CONFIG_PATH: configs/tasks/objectnav_mp3d.yaml
CHECKPOINT_FOLDER: new_checkpoints
CHECKPOINT_INTERVAL: 50
CMD_TRAILING_OPTS: []
ENV_NAME: NavRLEnv
EVAL:
  SPLIT: val
  USE_CKPT_CONFIG: True
EVAL_CKPT_PATH_DIR: new_checkpoints
LOG_FILE: train.log
LOG_INTERVAL: 10
NUM_PROCESSES: 1
NUM_UPDATES: 10000
ORBSLAM2:
  ANGLE_TH: 0.2617993877991494
  BETA: 100
  CAMERA_HEIGHT: 1.25
  DEPTH_DENORM: 10.0
  DIST_REACHED_TH: 0.15
  DIST_TO_STOP: 0.05
  D_OBSTACLE_MAX: 4.0
  D_OBSTACLE_MIN: 0.1
  H_OBSTACLE_MAX: 1.25
  H_OBSTACLE_MIN: 0.375
  MAP_CELL_SIZE: 0.1
  MAP_SIZE: 40
  MIN_PTS_IN_OBSTACLE: 320.0
  NEXT_WAYPOINT_TH: 0.5
  NUM_ACTIONS: 3
  PLANNER_MAX_STEPS: 500
  PREPROCESS_MAP: True
  SLAM_SETTINGS_PATH: habitat_baselines/slambased/data/mp3d3_small1k.yaml
  SLAM_VOCAB_PATH: habitat_baselines/slambased/data/ORBvoc.txt
RL:
  DDPPO:
    backbone: resnet50
    distrib_backend: NCCL
    num_recurrent_layers: 2
    pretrained: False
    pretrained_encoder: False
    pretrained_weights: data/ddppo-models/gibson-2plus-resnet50.pth
    reset_critic: True
    rnn_type: LSTM
    sync_frac: 0.6
    train_encoder: True
  PPO:
    clip_param: 0.2
    entropy_coef: 0.01
    eps: 1e-05
    gamma: 0.99
    hidden_size: 512
    lr: 2.5e-06
    max_grad_norm: 0.2
    num_mini_batch: 1
    num_steps: 128
    ppo_epoch: 2
    reward_window_size: 50
    tau: 0.95
    use_gae: True
    use_linear_clip_decay: False
    use_linear_lr_decay: False
    use_normalized_advantage: False
    value_loss_coef: 0.5
  REWARD_MEASURE: distance_to_goal
  SLACK_REWARD: -0.01
  SUCCESS_MEASURE: spl
  SUCCESS_REWARD: 2.5
SENSORS: ['DEPTH_SENSOR', 'RGB_SENSOR']
SIMULATOR_GPU_ID: 0
TASK_CONFIG:
  DATASET:
    CONTENT_SCENES: []
    DATA_PATH: data/datasets/objectnav/mp3d/v0/{split}/{split}.json.gz
    SCENES_DIR: data/scene_datasets/
    SPLIT: val
    TYPE: ObjectNav-v1
  ENVIRONMENT:
    ITERATOR_OPTIONS:
      CYCLE: True
      GROUP_BY_SCENE: True
      MAX_SCENE_REPEAT_EPISODES: -1
      MAX_SCENE_REPEAT_STEPS: 10000
      NUM_EPISODE_SAMPLE: -1
      SHUFFLE: True
      STEP_REPETITION_RANGE: 0.2
    MAX_EPISODE_SECONDS: 10000000
    MAX_EPISODE_STEPS: 500
  PYROBOT:
    BASE_CONTROLLER: proportional
    BASE_PLANNER: none
    BUMP_SENSOR:
      TYPE: PyRobotBumpSensor
    DEPTH_SENSOR:
      CENTER_CROP: False
      HEIGHT: 480
      MAX_DEPTH: 5.0
      MIN_DEPTH: 0.0
      NORMALIZE_DEPTH: True
      TYPE: PyRobotDepthSensor
      WIDTH: 640
    LOCOBOT:
      ACTIONS: ['BASE_ACTIONS', 'CAMERA_ACTIONS']
      BASE_ACTIONS: ['go_to_relative', 'go_to_absolute']
      CAMERA_ACTIONS: ['set_pan', 'set_tilt', 'set_pan_tilt']
    RGB_SENSOR:
      CENTER_CROP: False
      HEIGHT: 480
      TYPE: PyRobotRGBSensor
      WIDTH: 640
    ROBOT: locobot
    ROBOTS: ['locobot']
    SENSORS: ['RGB_SENSOR', 'DEPTH_SENSOR', 'BUMP_SENSOR']
  SEED: 100
  SIMULATOR:
    ACTION_SPACE_CONFIG: v1
    AGENTS: ['AGENT_0']
    AGENT_0:
      ANGULAR_ACCELERATION: 12.56
      ANGULAR_FRICTION: 1.0
      COEFFICIENT_OF_RESTITUTION: 0.0
      HEIGHT: 0.88
      IS_SET_START_STATE: False
      LINEAR_ACCELERATION: 20.0
      LINEAR_FRICTION: 0.5
      MASS: 32.0
      RADIUS: 0.2
      SENSORS: ['RGB_SENSOR', 'DEPTH_SENSOR']
      START_POSITION: [0, 0, 0]
      START_ROTATION: [0, 0, 0, 1]
    DEFAULT_AGENT_ID: 0
    DEPTH_SENSOR:
      HEIGHT: 480
      HFOV: 79
      MAX_DEPTH: 5.0
      MIN_DEPTH: 0.5
      NORMALIZE_DEPTH: True
      POSITION: [0, 0.88, 0]
      TYPE: HabitatSimDepthSensor
      WIDTH: 640
    FORWARD_STEP_SIZE: 0.25
    HABITAT_SIM_V0:
      ALLOW_SLIDING: True
      ENABLE_PHYSICS: False
      GPU_DEVICE_ID: 0
      GPU_GPU: False
      PHYSICS_CONFIG_FILE: ./data/default.phys_scene_config.json
    RGB_SENSOR:
      HEIGHT: 480
      HFOV: 79
      POSITION: [0, 0.88, 0]
      TYPE: HabitatSimRGBSensor
      WIDTH: 640
    SCENE: data/scene_datasets/habitat-test-scenes/van-gogh-room.glb
    SEED: 100
    SEMANTIC_SENSOR:
      HEIGHT: 480
      HFOV: 79
      POSITION: [0, 0.88, 0]
      TYPE: HabitatSimSemanticSensor
      WIDTH: 640
    TILT_ANGLE: 30
    TURN_ANGLE: 30
    TYPE: Sim-v0
  TASK:
    ACTIONS:
      ANSWER:
        TYPE: AnswerAction
      LOOK_DOWN:
        TYPE: LookDownAction
      LOOK_UP:
        TYPE: LookUpAction
      MOVE_FORWARD:
        TYPE: MoveForwardAction
      STOP:
        TYPE: StopAction
      TELEPORT:
        TYPE: TeleportAction
      TURN_LEFT:
        TYPE: TurnLeftAction
      TURN_RIGHT:
        TYPE: TurnRightAction
    ANSWER_ACCURACY:
      TYPE: AnswerAccuracy
    COLLISIONS:
      TYPE: Collisions
    COMPASS_SENSOR:
      TYPE: CompassSensor
    CORRECT_ANSWER:
      TYPE: CorrectAnswer
    DISTANCE_TO_GOAL:
      DISTANCE_TO: VIEW_POINTS
      TYPE: DistanceToGoal
    EPISODE_INFO:
      TYPE: EpisodeInfo
    GOAL_SENSOR_UUID: objectgoal
    GPS_SENSOR:
      DIMENSIONALITY: 2
      TYPE: GPSSensor
    HEADING_SENSOR:
      TYPE: HeadingSensor
    INSTRUCTION_SENSOR:
      TYPE: InstructionSensor
    INSTRUCTION_SENSOR_UUID: instruction
    MEASUREMENTS: ['DISTANCE_TO_GOAL', 'SPL']
    OBJECTGOAL_SENSOR:
      GOAL_SPEC: TASK_CATEGORY_ID
      GOAL_SPEC_MAX_VAL: 50
      TYPE: ObjectGoalSensor
    POINTGOAL_SENSOR:
      DIMENSIONALITY: 2
      GOAL_FORMAT: POLAR
      TYPE: PointGoalSensor
    POINTGOAL_WITH_GPS_COMPASS_SENSOR:
      DIMENSIONALITY: 2
      GOAL_FORMAT: POLAR
      TYPE: PointGoalWithGPSCompassSensor
    POSSIBLE_ACTIONS: ['STOP', 'MOVE_FORWARD', 'TURN_LEFT', 'TURN_RIGHT', 'LOOK_UP', 'LOOK_DOWN']
    PROXIMITY_SENSOR:
      MAX_DETECTION_RADIUS: 2.0
      TYPE: ProximitySensor
    QUESTION_SENSOR:
      TYPE: QuestionSensor
    SENSORS: ['OBJECTGOAL_SENSOR', 'COMPASS_SENSOR', 'GPS_SENSOR']
    SPL:
      DISTANCE_TO: VIEW_POINTS
      SUCCESS_DISTANCE: 0.2
      TYPE: SPL
    SUCCESS_DISTANCE: 0.1
    TOP_DOWN_MAP:
      DRAW_BORDER: True
      DRAW_GOAL_AABBS: True
      DRAW_GOAL_POSITIONS: True
      DRAW_SHORTEST_PATH: True
      DRAW_SOURCE: True
      DRAW_VIEW_POINTS: True
      FOG_OF_WAR:
        DRAW: True
        FOV: 90
        VISIBILITY_DIST: 5.0
      MAP_PADDING: 3
      MAP_RESOLUTION: 1250
      MAX_EPISODE_STEPS: 1000
      NUM_TOPDOWN_MAP_SAMPLE_POINTS: 20000
      TYPE: TopDownMap
    TYPE: ObjectNav-v1
TENSORBOARD_DIR: tb1
TEST_EPISODE_COUNT: 2184
TORCH_GPU_ID: 0
TRAINER_NAME: ppo
VIDEO_DIR: video_dir
VIDEO_OPTION: ['disk', 'tensorboard']
2020-02-21 15:12:22,378 Initializing dataset ObjectNav-v1
2020-02-21 15:12:47,742 Initializing dataset ObjectNav-v1
2020-02-21 15:13:10,882 initializing sim Sim-v0
Renderer: GeForce RTX 2080 Ti/PCIe/SSE2 by NVIDIA Corporation
OpenGL version: 4.6.0 NVIDIA 440.33.01
Using optional features:
    GL_ARB_ES2_compatibility
    GL_ARB_direct_state_access
    GL_ARB_get_texture_sub_image
    GL_ARB_invalidate_subdata
    GL_ARB_multi_bind
    GL_ARB_robustness
    GL_ARB_separate_shader_objects
    GL_ARB_texture_filter_anisotropic
    GL_ARB_texture_storage
    GL_ARB_texture_storage_multisample
    GL_ARB_vertex_array_object
    GL_KHR_debug
Using driver workarounds:
    no-layout-qualifiers-on-old-glsl
    nv-zero-context-profile-mask
    nv-implementation-color-read-format-dsa-broken
    nv-cubemap-inconsistent-compressed-image-size
    nv-cubemap-broken-full-compressed-image-query
    nv-compressed-block-size-in-bits
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0221 15:13:11.062321 12732 ResourceManager.cpp:1054] Importing Basis files as BC7
I0221 15:13:12.485432 12732 Simulator.cpp:112] Loading house from data/scene_datasets/mp3d/pLe4wQe7qrG/pLe4wQe7qrG.house
I0221 15:13:12.485448 12732 Simulator.cpp:118] Loading semantic mesh data/scene_datasets/mp3d/pLe4wQe7qrG/pLe4wQe7qrG_semantic.ply
I0221 15:13:13.312784 12732 Simulator.cpp:130] Loaded.
I0221 15:13:13.327396 12732 simulator.py:142] Loaded navmesh data/scene_datasets/mp3d/pLe4wQe7qrG/pLe4wQe7qrG.navmesh
2020-02-21 15:13:13,329 Initializing task ObjectNav-v1
2020-02-21 15:13:16,349 agent number of parameters: 71371399
/local-scratch/anaconda3/envs/habitat/lib/python3.6/site-packages/tensorflow-1.13.1-py3.6-linux-x86_64.egg/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/local-scratch/anaconda3/envs/habitat/lib/python3.6/site-packages/tensorflow-1.13.1-py3.6-linux-x86_64.egg/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/local-scratch/anaconda3/envs/habitat/lib/python3.6/site-packages/tensorflow-1.13.1-py3.6-linux-x86_64.egg/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/local-scratch/anaconda3/envs/habitat/lib/python3.6/site-packages/tensorflow-1.13.1-py3.6-linux-x86_64.egg/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/local-scratch/anaconda3/envs/habitat/lib/python3.6/site-packages/tensorflow-1.13.1-py3.6-linux-x86_64.egg/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/local-scratch/anaconda3/envs/habitat/lib/python3.6/site-packages/tensorflow-1.13.1-py3.6-linux-x86_64.egg/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
I0221 15:13:39.968309 12732 Simulator.cpp:35] Deconstructing Simulator
I0221 15:13:39.968324 12732 SemanticScene.h:40] Deconstructing SemanticScene
I0221 15:13:39.968732 12732 SceneManager.h:24] Deconstructing SceneManager
I0221 15:13:39.968737 12732 SceneGraph.h:20] Deconstructing SceneGraph
I0221 15:13:39.968749 12732 RenderTarget.h:51] Deconstructing RenderTarget
I0221 15:13:39.969135 12732 Sensor.h:80] Deconstructing Sensor
I0221 15:13:39.969146 12732 RenderTarget.h:51] Deconstructing RenderTarget
I0221 15:13:39.969352 12732 Sensor.h:80] Deconstructing Sensor
I0221 15:13:39.969360 12732 SceneGraph.h:20] Deconstructing SceneGraph
I0221 15:13:39.972826 12732 Renderer.cpp:33] Deconstructing Renderer
I0221 15:13:39.972834 12732 WindowlessContext.h:16] Deconstructing WindowlessContext
I0221 15:13:39.972837 12732 WindowlessContext.cpp:245] Deconstructing GL context
Renderer: GeForce RTX 2080 Ti/PCIe/SSE2 by NVIDIA Corporation
OpenGL version: 4.6.0 NVIDIA 440.33.01
Using optional features:
    GL_ARB_ES2_compatibility
    GL_ARB_direct_state_access
    GL_ARB_get_texture_sub_image
    GL_ARB_invalidate_subdata
    GL_ARB_multi_bind
    GL_ARB_robustness
    GL_ARB_separate_shader_objects
    GL_ARB_texture_filter_anisotropic
    GL_ARB_texture_storage
    GL_ARB_texture_storage_multisample
    GL_ARB_vertex_array_object
    GL_KHR_debug
Using driver workarounds:
    no-layout-qualifiers-on-old-glsl
    nv-zero-context-profile-mask
    nv-implementation-color-read-format-dsa-broken
    nv-cubemap-inconsistent-compressed-image-size
    nv-cubemap-broken-full-compressed-image-query
    nv-compressed-block-size-in-bits
I0221 15:13:39.994935 12732 ResourceManager.cpp:1054] Importing Basis files as BC7
I0221 15:13:45.554143 12732 Simulator.cpp:112] Loading house from data/scene_datasets/mp3d/QUCTc6BB5sX/QUCTc6BB5sX.house
I0221 15:13:45.554159 12732 Simulator.cpp:118] Loading semantic mesh data/scene_datasets/mp3d/QUCTc6BB5sX/QUCTc6BB5sX_semantic.ply
I0221 15:13:57.232524 12732 Simulator.cpp:130] Loaded.
I0221 15:13:57.374905 12732 simulator.py:142] Loaded navmesh data/scene_datasets/mp3d/QUCTc6BB5sX/QUCTc6BB5sX.navmesh
Traceback (most recent call last):
  File "habitat_baselines/run.py", line 68, in <module>
    main()
  File "habitat_baselines/run.py", line 38, in main
    run_exp(**vars(args))
  File "habitat_baselines/run.py", line 62, in run_exp
    trainer.train()
  File "/local-scratch/habitat-api1/habitat_baselines/rl/ppo/ppo_trainer.py", line 300, in train
    episode_counts,
  File "/local-scratch/habitat-api1/habitat_baselines/rl/ppo/ppo_trainer.py", line 146, in _collect_rollout_step
    outputs = self.envs.step([a[0].item() for a in actions])
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 339, in step
    return self.wait_step()
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 326, in wait_step
    observations.append(read_fn())
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
Exception ignored in: <bound method VectorEnv.__del__ of <habitat.core.vector_env.VectorEnv object at 0x7f4aa0e0d7b8>>
Traceback (most recent call last):
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 468, in __del__
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 347, in close
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 250, in recv
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 375, in _recv
AttributeError: 'NoneType' object has no attribute 'BytesIO'

erikwijmans commented 4 years ago

This is odd. One last idea for debugging: change habitat.VectorEnv to habitat.ThreadedVectorEnv here: https://github.com/facebookresearch/habitat-api/blob/master/habitat_baselines/common/env_utils.py#L94
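
As a rough illustration of the kind of swap meant here (this is not the code from the linked file), one could select the class by name behind a debug flag. `select_vector_env_cls` and `use_threads` are hypothetical names; the only thing taken from the thread is that the habitat package exposes both `VectorEnv` and `ThreadedVectorEnv`. The helper takes the module as an argument so the sketch runs without habitat installed.

```python
from types import SimpleNamespace


def select_vector_env_cls(habitat_module, use_threads):
    # Hypothetical helper: pick the thread-based class when debugging,
    # otherwise the default process-based one.
    name = "ThreadedVectorEnv" if use_threads else "VectorEnv"
    return getattr(habitat_module, name)


if __name__ == "__main__":
    # Stand-in namespace so the sketch runs without habitat;
    # with habitat installed one would pass the imported module instead.
    fake_habitat = SimpleNamespace(VectorEnv="process-based", ThreadedVectorEnv="thread-based")
    print(select_vector_env_cls(fake_habitat, use_threads=True))
```

With threads, the environments live in the same process as the trainer, so a failure inside an environment should surface directly instead of as an EOFError on a pipe.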

shivanshpatel35 commented 4 years ago

I think the code ran, but it led to another issue, possibly because the reward is nan:

2020-02-21 15:24:50,497 config: BASE_TASK_CONFIG_PATH: configs/tasks/objectnav_mp3d.yaml
CHECKPOINT_FOLDER: new_checkpoints
CHECKPOINT_INTERVAL: 50
CMD_TRAILING_OPTS: []
ENV_NAME: NavRLEnv
EVAL:
  SPLIT: val
  USE_CKPT_CONFIG: True
EVAL_CKPT_PATH_DIR: new_checkpoints
LOG_FILE: train.log
LOG_INTERVAL: 10
NUM_PROCESSES: 1
NUM_UPDATES: 10000
ORBSLAM2:
  ANGLE_TH: 0.2617993877991494
  BETA: 100
  CAMERA_HEIGHT: 1.25
  DEPTH_DENORM: 10.0
  DIST_REACHED_TH: 0.15
  DIST_TO_STOP: 0.05
  D_OBSTACLE_MAX: 4.0
  D_OBSTACLE_MIN: 0.1
  H_OBSTACLE_MAX: 1.25
  H_OBSTACLE_MIN: 0.375
  MAP_CELL_SIZE: 0.1
  MAP_SIZE: 40
  MIN_PTS_IN_OBSTACLE: 320.0
  NEXT_WAYPOINT_TH: 0.5
  NUM_ACTIONS: 3
  PLANNER_MAX_STEPS: 500
  PREPROCESS_MAP: True
  SLAM_SETTINGS_PATH: habitat_baselines/slambased/data/mp3d3_small1k.yaml
  SLAM_VOCAB_PATH: habitat_baselines/slambased/data/ORBvoc.txt
RL:
  DDPPO:
    backbone: resnet50
    distrib_backend: NCCL
    num_recurrent_layers: 2
    pretrained: False
    pretrained_encoder: False
    pretrained_weights: data/ddppo-models/gibson-2plus-resnet50.pth
    reset_critic: True
    rnn_type: LSTM
    sync_frac: 0.6
    train_encoder: True
  PPO:
    clip_param: 0.2
    entropy_coef: 0.01
    eps: 1e-05
    gamma: 0.99
    hidden_size: 512
    lr: 2.5e-06
    max_grad_norm: 0.2
    num_mini_batch: 1
    num_steps: 128
    ppo_epoch: 2
    reward_window_size: 50
    tau: 0.95
    use_gae: True
    use_linear_clip_decay: False
    use_linear_lr_decay: False
    use_normalized_advantage: False
    value_loss_coef: 0.5
  REWARD_MEASURE: distance_to_goal
  SLACK_REWARD: -0.01
  SUCCESS_MEASURE: spl
  SUCCESS_REWARD: 2.5
SENSORS: ['DEPTH_SENSOR', 'RGB_SENSOR']
SIMULATOR_GPU_ID: 0
TASK_CONFIG:
  DATASET:
    CONTENT_SCENES: []
    DATA_PATH: data/datasets/objectnav/mp3d/v0/{split}/{split}.json.gz
    SCENES_DIR: data/scene_datasets/
    SPLIT: val
    TYPE: ObjectNav-v1
  ENVIRONMENT:
    ITERATOR_OPTIONS:
      CYCLE: True
      GROUP_BY_SCENE: True
      MAX_SCENE_REPEAT_EPISODES: -1
      MAX_SCENE_REPEAT_STEPS: 10000
      NUM_EPISODE_SAMPLE: -1
      SHUFFLE: True
      STEP_REPETITION_RANGE: 0.2
    MAX_EPISODE_SECONDS: 10000000
    MAX_EPISODE_STEPS: 500
  PYROBOT:
    BASE_CONTROLLER: proportional
    BASE_PLANNER: none
    BUMP_SENSOR:
      TYPE: PyRobotBumpSensor
    DEPTH_SENSOR:
      CENTER_CROP: False
      HEIGHT: 480
      MAX_DEPTH: 5.0
      MIN_DEPTH: 0.0
      NORMALIZE_DEPTH: True
      TYPE: PyRobotDepthSensor
      WIDTH: 640
    LOCOBOT:
      ACTIONS: ['BASE_ACTIONS', 'CAMERA_ACTIONS']
      BASE_ACTIONS: ['go_to_relative', 'go_to_absolute']
      CAMERA_ACTIONS: ['set_pan', 'set_tilt', 'set_pan_tilt']
    RGB_SENSOR:
      CENTER_CROP: False
      HEIGHT: 480
      TYPE: PyRobotRGBSensor
      WIDTH: 640
    ROBOT: locobot
    ROBOTS: ['locobot']
    SENSORS: ['RGB_SENSOR', 'DEPTH_SENSOR', 'BUMP_SENSOR']
  SEED: 100
  SIMULATOR:
    ACTION_SPACE_CONFIG: v1
    AGENTS: ['AGENT_0']
    AGENT_0:
      ANGULAR_ACCELERATION: 12.56
      ANGULAR_FRICTION: 1.0
      COEFFICIENT_OF_RESTITUTION: 0.0
      HEIGHT: 0.88
      IS_SET_START_STATE: False
      LINEAR_ACCELERATION: 20.0
      LINEAR_FRICTION: 0.5
      MASS: 32.0
      RADIUS: 0.2
      SENSORS: ['RGB_SENSOR', 'DEPTH_SENSOR']
      START_POSITION: [0, 0, 0]
      START_ROTATION: [0, 0, 0, 1]
    DEFAULT_AGENT_ID: 0
    DEPTH_SENSOR:
      HEIGHT: 480
      HFOV: 79
      MAX_DEPTH: 5.0
      MIN_DEPTH: 0.5
      NORMALIZE_DEPTH: True
      POSITION: [0, 0.88, 0]
      TYPE: HabitatSimDepthSensor
      WIDTH: 640
    FORWARD_STEP_SIZE: 0.25
    HABITAT_SIM_V0:
      ALLOW_SLIDING: True
      ENABLE_PHYSICS: False
      GPU_DEVICE_ID: 0
      GPU_GPU: False
      PHYSICS_CONFIG_FILE: ./data/default.phys_scene_config.json
    RGB_SENSOR:
      HEIGHT: 480
      HFOV: 79
      POSITION: [0, 0.88, 0]
      TYPE: HabitatSimRGBSensor
      WIDTH: 640
    SCENE: data/scene_datasets/habitat-test-scenes/van-gogh-room.glb
    SEED: 100
    SEMANTIC_SENSOR:
      HEIGHT: 480
      HFOV: 79
      POSITION: [0, 0.88, 0]
      TYPE: HabitatSimSemanticSensor
      WIDTH: 640
    TILT_ANGLE: 30
    TURN_ANGLE: 30
    TYPE: Sim-v0
  TASK:
    ACTIONS:
      ANSWER:
        TYPE: AnswerAction
      LOOK_DOWN:
        TYPE: LookDownAction
      LOOK_UP:
        TYPE: LookUpAction
      MOVE_FORWARD:
        TYPE: MoveForwardAction
      STOP:
        TYPE: StopAction
      TELEPORT:
        TYPE: TeleportAction
      TURN_LEFT:
        TYPE: TurnLeftAction
      TURN_RIGHT:
        TYPE: TurnRightAction
    ANSWER_ACCURACY:
      TYPE: AnswerAccuracy
    COLLISIONS:
      TYPE: Collisions
    COMPASS_SENSOR:
      TYPE: CompassSensor
    CORRECT_ANSWER:
      TYPE: CorrectAnswer
    DISTANCE_TO_GOAL:
      DISTANCE_TO: VIEW_POINTS
      TYPE: DistanceToGoal
    EPISODE_INFO:
      TYPE: EpisodeInfo
    GOAL_SENSOR_UUID: objectgoal
    GPS_SENSOR:
      DIMENSIONALITY: 2
      TYPE: GPSSensor
    HEADING_SENSOR:
      TYPE: HeadingSensor
    INSTRUCTION_SENSOR:
      TYPE: InstructionSensor
    INSTRUCTION_SENSOR_UUID: instruction
    MEASUREMENTS: ['DISTANCE_TO_GOAL', 'SPL']
    OBJECTGOAL_SENSOR:
      GOAL_SPEC: TASK_CATEGORY_ID
      GOAL_SPEC_MAX_VAL: 50
      TYPE: ObjectGoalSensor
    POINTGOAL_SENSOR:
      DIMENSIONALITY: 2
      GOAL_FORMAT: POLAR
      TYPE: PointGoalSensor
    POINTGOAL_WITH_GPS_COMPASS_SENSOR:
      DIMENSIONALITY: 2
      GOAL_FORMAT: POLAR
      TYPE: PointGoalWithGPSCompassSensor
    POSSIBLE_ACTIONS: ['STOP', 'MOVE_FORWARD', 'TURN_LEFT', 'TURN_RIGHT', 'LOOK_UP', 'LOOK_DOWN']
    PROXIMITY_SENSOR:
      MAX_DETECTION_RADIUS: 2.0
      TYPE: ProximitySensor
    QUESTION_SENSOR:
      TYPE: QuestionSensor
    SENSORS: ['OBJECTGOAL_SENSOR', 'COMPASS_SENSOR', 'GPS_SENSOR']
    SPL:
      DISTANCE_TO: VIEW_POINTS
      SUCCESS_DISTANCE: 0.2
      TYPE: SPL
    SUCCESS_DISTANCE: 0.1
    TOP_DOWN_MAP:
      DRAW_BORDER: True
      DRAW_GOAL_AABBS: True
      DRAW_GOAL_POSITIONS: True
      DRAW_SHORTEST_PATH: True
      DRAW_SOURCE: True
      DRAW_VIEW_POINTS: True
      FOG_OF_WAR:
        DRAW: True
        FOV: 90
        VISIBILITY_DIST: 5.0
      MAP_PADDING: 3
      MAP_RESOLUTION: 1250
      MAX_EPISODE_STEPS: 1000
      NUM_TOPDOWN_MAP_SAMPLE_POINTS: 20000
      TYPE: TopDownMap
    TYPE: ObjectNav-v1
TENSORBOARD_DIR: tb1
TEST_EPISODE_COUNT: 2184
TORCH_GPU_ID: 0
TRAINER_NAME: ppo
VIDEO_DIR: video_dir
VIDEO_OPTION: ['disk', 'tensorboard']
2020-02-21 15:24:50,497 Initializing dataset ObjectNav-v1
2020-02-21 15:25:13,690 Initializing dataset ObjectNav-v1
2020-02-21 15:25:36,289 initializing sim Sim-v0
Renderer: GeForce RTX 2080 Ti/PCIe/SSE2 by NVIDIA Corporation
OpenGL version: 4.6.0 NVIDIA 440.33.01
Using optional features:
    GL_ARB_ES2_compatibility
    GL_ARB_direct_state_access
    GL_ARB_get_texture_sub_image
    GL_ARB_invalidate_subdata
    GL_ARB_multi_bind
    GL_ARB_robustness
    GL_ARB_separate_shader_objects
    GL_ARB_texture_filter_anisotropic
    GL_ARB_texture_storage
    GL_ARB_texture_storage_multisample
    GL_ARB_vertex_array_object
    GL_KHR_debug
Using driver workarounds:
    no-layout-qualifiers-on-old-glsl
    nv-zero-context-profile-mask
    nv-implementation-color-read-format-dsa-broken
    nv-cubemap-inconsistent-compressed-image-size
    nv-cubemap-broken-full-compressed-image-query
    nv-compressed-block-size-in-bits
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0221 15:25:36.362851 15005 ResourceManager.cpp:1054] Importing Basis files as BC7
I0221 15:25:37.887517 15005 Simulator.cpp:112] Loading house from data/scene_datasets/mp3d/x8F5xyUWy9e/x8F5xyUWy9e.house
I0221 15:25:37.887533 15005 Simulator.cpp:118] Loading semantic mesh data/scene_datasets/mp3d/x8F5xyUWy9e/x8F5xyUWy9e_semantic.ply
I0221 15:25:40.463873 15005 Simulator.cpp:130] Loaded.
I0221 15:25:40.496153 14913 simulator.py:142] Loaded navmesh data/scene_datasets/mp3d/x8F5xyUWy9e/x8F5xyUWy9e.navmesh
2020-02-21 15:25:40,498 Initializing task ObjectNav-v1
2020-02-21 15:25:43,125 agent number of parameters: 71371399
/local-scratch/anaconda3/envs/habitat/lib/python3.6/site-packages/tensorflow-1.13.1-py3.6-linux-x86_64.egg/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/local-scratch/anaconda3/envs/habitat/lib/python3.6/site-packages/tensorflow-1.13.1-py3.6-linux-x86_64.egg/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/local-scratch/anaconda3/envs/habitat/lib/python3.6/site-packages/tensorflow-1.13.1-py3.6-linux-x86_64.egg/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/local-scratch/anaconda3/envs/habitat/lib/python3.6/site-packages/tensorflow-1.13.1-py3.6-linux-x86_64.egg/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/local-scratch/anaconda3/envs/habitat/lib/python3.6/site-packages/tensorflow-1.13.1-py3.6-linux-x86_64.egg/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/local-scratch/anaconda3/envs/habitat/lib/python3.6/site-packages/tensorflow-1.13.1-py3.6-linux-x86_64.egg/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
I0221 15:26:00.255434 15005 Simulator.cpp:35] Deconstructing Simulator
I0221 15:26:00.255450 15005 SemanticScene.h:40] Deconstructing SemanticScene
I0221 15:26:00.256119 15005 SceneManager.h:24] Deconstructing SceneManager
I0221 15:26:00.256124 15005 SceneGraph.h:20] Deconstructing SceneGraph
I0221 15:26:00.256131 15005 RenderTarget.h:51] Deconstructing RenderTarget
I0221 15:26:00.256736 15005 Sensor.h:80] Deconstructing Sensor
I0221 15:26:00.256745 15005 RenderTarget.h:51] Deconstructing RenderTarget
I0221 15:26:00.256933 15005 Sensor.h:80] Deconstructing Sensor
I0221 15:26:00.256942 15005 SceneGraph.h:20] Deconstructing SceneGraph
I0221 15:26:00.260462 15005 Renderer.cpp:33] Deconstructing Renderer
I0221 15:26:00.260471 15005 WindowlessContext.h:16] Deconstructing WindowlessContext
I0221 15:26:00.260475 15005 WindowlessContext.cpp:245] Deconstructing GL context
Renderer: GeForce RTX 2080 Ti/PCIe/SSE2 by NVIDIA Corporation
OpenGL version: 4.6.0 NVIDIA 440.33.01
Using optional features:
    GL_ARB_ES2_compatibility
    GL_ARB_direct_state_access
    GL_ARB_get_texture_sub_image
    GL_ARB_invalidate_subdata
    GL_ARB_multi_bind
    GL_ARB_robustness
    GL_ARB_separate_shader_objects
    GL_ARB_texture_filter_anisotropic
    GL_ARB_texture_storage
    GL_ARB_texture_storage_multisample
    GL_ARB_vertex_array_object
    GL_KHR_debug
Using driver workarounds:
    no-layout-qualifiers-on-old-glsl
    nv-zero-context-profile-mask
    nv-implementation-color-read-format-dsa-broken
    nv-cubemap-inconsistent-compressed-image-size
    nv-cubemap-broken-full-compressed-image-query
    nv-compressed-block-size-in-bits
I0221 15:26:00.282538 15005 ResourceManager.cpp:1054] Importing Basis files as BC7
I0221 15:26:03.892053 15005 Simulator.cpp:112] Loading house from data/scene_datasets/mp3d/oLBMNvg9in8/oLBMNvg9in8.house
I0221 15:26:03.892071 15005 Simulator.cpp:118] Loading semantic mesh data/scene_datasets/mp3d/oLBMNvg9in8/oLBMNvg9in8_semantic.ply
I0221 15:26:12.726356 15005 Simulator.cpp:130] Loaded.
I0221 15:26:12.860735 14913 simulator.py:142] Loaded navmesh data/scene_datasets/mp3d/oLBMNvg9in8/oLBMNvg9in8.navmesh
2020-02-21 15:26:16,574 update: 10  fps: 42.296 
2020-02-21 15:26:16,574 update: 10  env-time: 25.018s   pth-time: 7.225s    frames: 1408
2020-02-21 15:26:16,574 Average window size 11 reward: nan
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [0,0,0], thread: [0,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [0,0,0], thread: [1,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [0,0,0], thread: [2,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [0,0,0], thread: [3,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [0,0,0], thread: [4,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [0,0,0], thread: [5,0,0] Assertion `val >= zero` failed.
Traceback (most recent call last):
  File "habitat_baselines/run.py", line 68, in <module>
    main()
  File "habitat_baselines/run.py", line 38, in main
    run_exp(**vars(args))
  File "habitat_baselines/run.py", line 62, in run_exp
    trainer.train()
  File "/local-scratch/habitat-api1/habitat_baselines/rl/ppo/ppo_trainer.py", line 300, in train
    episode_counts,
  File "/local-scratch/habitat-api1/habitat_baselines/rl/ppo/ppo_trainer.py", line 146, in _collect_rollout_step
    outputs = self.envs.step([a[0].item() for a in actions])
  File "/local-scratch/habitat-api1/habitat_baselines/rl/ppo/ppo_trainer.py", line 146, in <listcomp>
    outputs = self.envs.step([a[0].item() for a in actions])
RuntimeError: CUDA error: device-side assert triggered
Exception ignored in: <bound method VectorEnv.__del__ of <habitat.core.vector_env.ThreadedVectorEnv object at 0x7fad4edfb940>>
Traceback (most recent call last):
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 468, in __del__
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 350, in close
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/queue.py", line 145, in put
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/threading.py", line 347, in notify
TypeError: 'NoneType' object is not callable
shivanshpatel35 commented 4 years ago

Running it with CUDA_LAUNCH_BLOCKING=1 resulted in an elaborate error. I am copying only the last part of the log.


2020-02-21 15:44:22,225 update: 10  env-time: 23.628s   pth-time: 8.889s    frames: 1408
2020-02-21 15:44:22,225 Average window size 11 reward: nan
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [0,0,0], thread: [0,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [0,0,0], thread: [1,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [0,0,0], thread: [2,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [0,0,0], thread: [3,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [0,0,0], thread: [4,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [0,0,0], thread: [5,0,0] Assertion `val >= zero` failed.
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorScatterGather.cu line=67 error=710 : device-side assert triggered
Traceback (most recent call last):
  File "habitat_baselines/run.py", line 68, in <module>
    main()
  File "habitat_baselines/run.py", line 38, in main
    run_exp(**vars(args))
  File "habitat_baselines/run.py", line 62, in run_exp
    trainer.train()
  File "/local-scratch/habitat-api1/habitat_baselines/rl/ppo/ppo_trainer.py", line 300, in train
    episode_counts,
  File "/local-scratch/habitat-api1/habitat_baselines/rl/ppo/ppo_trainer.py", line 139, in _collect_rollout_step
    rollouts.masks[rollouts.step],
  File "/local-scratch/habitat-api1/habitat_baselines/rl/ppo/policy.py", line 50, in act
    action_log_probs = distribution.log_probs(action)
  File "/local-scratch/habitat-api1/habitat_baselines/common/utils.py", line 32, in log_probs
    .log_prob(actions.squeeze(-1))
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/site-packages/torch-1.4.0-py3.6-linux-x86_64.egg/torch/distributions/categorical.py", line 116, in log_prob
    return log_pmf.gather(-1, value).squeeze(-1)
RuntimeError: cuda runtime error (710) : device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorScatterGather.cu:67
Exception ignored in: <bound method VectorEnv.__del__ of <habitat.core.vector_env.ThreadedVectorEnv object at 0x7fd5ef808940>>
Traceback (most recent call last):
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 468, in __del__
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 350, in close
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/queue.py", line 145, in put
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/threading.py", line 347, in notify
TypeError: 'NoneType' object is not callable
erikwijmans commented 4 years ago

@mathfac have you seen nan rewards with the objectnav dataset?

erikwijmans commented 4 years ago

I am not sure why the reward is showing up as nan, but you are currently using just 1 process, which will make RL highly unstable and would explain this error.
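
One way to catch this earlier is to fail fast on non-finite values instead of waiting for the CUDA assert. The sketch below is plain Python with a hypothetical helper name (`check_finite`); it is not part of habitat_baselines. It assumes the `val >= zero` assertion in the log is triggered by NaN values reaching the action sampler, which the `reward: nan` line suggests but does not prove.

```python
import math


def check_finite(name, values):
    """Return values unchanged, or raise if any entry is NaN or infinite.

    Hypothetical helper for illustration; not a habitat_baselines function.
    """
    bad = [i for i, v in enumerate(values) if not math.isfinite(v)]
    if bad:
        raise ValueError(f"{name} has non-finite entries at indices {bad}")
    return values


# Example: a batch of per-step rewards where one entry has gone NaN.
rewards = [0.12, -0.01, float("nan")]
try:
    check_finite("rewards", rewards)
except ValueError as err:
    print(err)
```

Calling a check like this on rewards (and, in a real trainer, on the policy outputs) right after each environment step would point at the first bad value rather than at the later sampling kernel.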

shivanshpatel35 commented 4 years ago

You are completely right. I changed LOG_INTERVAL in habitat_baselines/config/objectnav/ddppo_objectnav.yaml to 1 and it runs for around 7-8 updates. After that, the reward turns nan.

Thanks for all the help!

mathfac commented 4 years ago

There were no NaN rewards before, but that will definitely break training. Debugging on my side.

StOnEGiggity commented 4 years ago

Hi, I find that the NaN rewards bug may occur when targets are on different large flat(ish) surfaces, as reported in https://github.com/facebookresearch/habitat-sim/issues/405. I found one example for objectnav:

scene_id: data/scene_datasets/mp3d/oLBMNvg9in8/oLBMNvg9in8.glb
episode_id: 1336

You can check it.

srama2512 commented 4 years ago

I'm getting the DistanceToGoal metric as NaN in PointNav training on MP3D. The episode ID might differ from the standard datasets since I made some changes, so I've given the scene id, agent position and goal position.

Episode id:  67230
Scene id: data/scene_datasets/mp3d/dhjEzFoUFzH/dhjEzFoUFzH.glb
Agent position: [ -1.4486076   -0.35115004 -34.537006  ]
Goal position: [0.5637829899787903, -0.15369552373886108, -42.55187225341797]

I haven't had the time to investigate why this is happening. Temporarily, I'm replacing geodesic distance with euclidean distance if this happens so that I can continue to train my agents.
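
A minimal sketch of that fallback, in plain Python. The function name is hypothetical, and the geodesic value is assumed to arrive from the simulator as a float that may be NaN or inf.

```python
import math


def distance_with_fallback(geodesic, agent_pos, goal_pos):
    """Use the geodesic distance when finite, else straight-line distance."""
    if math.isfinite(geodesic):
        return geodesic
    return math.sqrt(sum((a - g) ** 2 for a, g in zip(agent_pos, goal_pos)))


# Positions from the failing episode above; the geodesic value is NaN there.
agent = [-1.4486076, -0.35115004, -34.537006]
goal = [0.5637829899787903, -0.15369552373886108, -42.55187225341797]
print(distance_with_fallback(float("nan"), agent, goal))
```

Euclidean distance is a lower bound on the geodesic distance, so any reward computed from the fallback value is only approximate.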

erikwijmans commented 4 years ago

We fixed a bug in the navmeshes a while back that made a few episodes for MP3D pointnav invalid (they shouldn't have ever been valid and I have no idea how they ever were). This is the script I made for finding/removing them: https://gist.github.com/erikwijmans/e4410f0e12facb87890e919aa264e3fe. Fortunately, they are just train episodes.
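
The general idea of such a filter can be sketched as follows. This is not the gist's code: the episode dictionaries, their keys, and the `geodesic_fn` callable are assumptions for illustration, with `geodesic_fn` standing in for whatever distance query the simulator provides.

```python
import math


def split_valid_episodes(episodes, geodesic_fn):
    """Partition episodes by whether start-to-goal distance is finite."""
    valid, invalid = [], []
    for episode in episodes:
        dist = geodesic_fn(episode["start_position"], episode["goal_position"])
        if math.isfinite(dist):
            valid.append(episode)
        else:
            invalid.append(episode)
    return valid, invalid


def fake_geodesic(start, goal):
    """Stand-in distance query: pretends goals below y = -1 are unreachable."""
    if goal[1] < -1.0:
        return float("inf")
    return math.sqrt(sum((s - g) ** 2 for s, g in zip(start, goal)))


episodes = [
    {"episode_id": "a", "start_position": [0, 0, 0], "goal_position": [3, 0, 4]},
    {"episode_id": "b", "start_position": [0, 0, 0], "goal_position": [1, -2, 1]},
]
valid, invalid = split_valid_episodes(episodes, fake_geodesic)
print([e["episode_id"] for e in valid], [e["episode_id"] for e in invalid])
```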

@mathfac did the cleaned up versions ever get re-uploaded?

srama2512 commented 4 years ago

Got it. Thanks!

mathfac commented 4 years ago

@erikwijmans, thank you for bringing it up. The PointNav v1 MP3D dataset was updated with those faulty episodes removed: https://dl.fbaipublicfiles.com/habitat/data/datasets/pointnav/mp3d/v1/pointnav_mp3d_v1.zip. cc @srama2512

srama2512 commented 4 years ago

Did the NaN rewards for ObjectNav ever get fixed? I am training DD-PPO agents for ObjectNav and it keeps detecting NaN in the DistanceToGoal metric for a lot of episodes. Here is a small sample of the episodes where this happens:

Episode id: data/scene_datasets/mp3d/b8cTxDM8gDG/b8cTxDM8gDG.glb 16950
Episode id: data/scene_datasets/mp3d/b8cTxDM8gDG/b8cTxDM8gDG.glb 15529
Episode id: data/scene_datasets/mp3d/PX4nDJXEHrG/PX4nDJXEHrG.glb 19029
Episode id: data/scene_datasets/mp3d/sT4fr6TAbpF/sT4fr6TAbpF.glb 9722
Episode id: data/scene_datasets/mp3d/Uxmj2M2itWa/Uxmj2M2itWa.glb 111
Episode id: data/scene_datasets/mp3d/Uxmj2M2itWa/Uxmj2M2itWa.glb 2174
Episode id: data/scene_datasets/mp3d/Uxmj2M2itWa/Uxmj2M2itWa.glb 813
Episode id: data/scene_datasets/mp3d/Uxmj2M2itWa/Uxmj2M2itWa.glb 2360
Episode id: data/scene_datasets/mp3d/sT4fr6TAbpF/sT4fr6TAbpF.glb 3237
Episode id: data/scene_datasets/mp3d/sT4fr6TAbpF/sT4fr6TAbpF.glb 2914
Episode id: data/scene_datasets/mp3d/sT4fr6TAbpF/sT4fr6TAbpF.glb 11041
erikwijmans commented 4 years ago

I have never been able to reproduce nan/inf distances for ObjectNav. Did you modify the agent size/height or are you doing something like a teleport agent?

I am running a checker on those scenes to see if anything isn't navigable.

srama2512 commented 4 years ago

Oh, right. I'm using the original agent configuration and not the modified one used for the challenge. Could that be the problem?

mathfac commented 4 years ago

Yes, that can be a problem.

mathfac commented 4 years ago

The dataset, spawn points, and view points are created for a specific agent specification.
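
A quick sanity check along these lines is to compare the agent settings in use against the ones the dataset was generated with. The sketch below is illustrative only: the expected values are copied from the config dump earlier in this thread (HEIGHT 0.88, RADIUS 0.2, TURN_ANGLE 30), and treating them as the dataset's generation settings is an assumption. `find_mismatches` is a hypothetical helper, not a habitat function.

```python
# Assumed reference values, taken from the config dump in this thread.
EXPECTED = {"HEIGHT": 0.88, "RADIUS": 0.2, "TURN_ANGLE": 30}


def find_mismatches(actual, expected=EXPECTED, tol=1e-6):
    """Return {key: (actual, expected)} for settings that differ or are missing."""
    mismatches = {}
    for key, want in expected.items():
        have = actual.get(key)
        if have is None or abs(have - want) > tol:
            mismatches[key] = (have, want)
    return mismatches


# Example: an agent configured with 10-degree turns instead of 30.
print(find_mismatches({"HEIGHT": 0.88, "RADIUS": 0.2, "TURN_ANGLE": 10}))
```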

srama2512 commented 4 years ago

Got it. Are there pre-trained DD-PPO agents available for PointNav with the new agent configuration? The mismatch in the observation space and action space (10 -> 30 deg rotations) might be a problem moving forward.