facebookresearch / habitat-lab

A modular high-level library to train embodied AI agents across a variety of tasks and environments.
https://aihabitat.org/
MIT License

Default tensorboard dir for objectnav is sudo protected, causing an error #403

Closed alexcdot closed 4 years ago

alexcdot commented 4 years ago

🐛 Bug

Command

Running bash habitat_baselines/rl/ddppo/single_node.sh fails with a permission-denied error when it tries to create the tensorboard log directory /data/logs/objectnav_mp3d. Because the path is absolute, it points at the filesystem root, which is protected by admin privileges. It would make much more sense for the default save location to be the relative path data/logs/objectnav_mp3d.

Line here: https://github.com/facebookresearch/habitat-api/blob/c3d52b15c83efbdfb0dd3734e2a90d050ba53c84/habitat_baselines/config/objectnav/ddppo_objectnav.yaml#L8
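
To make the difference between the two spellings concrete, here is a minimal Python sketch. It is not Habitat code: the helper name `resolve_log_dir` and its `repo_root` argument are hypothetical, and the checkout path is simply the one that appears in the traceback in this report.

```python
import os


def resolve_log_dir(log_dir: str, repo_root: str) -> str:
    """Return the directory a log writer would create when run from repo_root.

    Hypothetical helper for illustration; it is not part of habitat_baselines.
    """
    if os.path.isabs(log_dir):
        # A leading slash anchors the path at the filesystem root,
        # so the working directory plays no role.
        return os.path.normpath(log_dir)
    # A relative path is resolved against the directory the script runs from.
    return os.path.normpath(os.path.join(repo_root, log_dir))


if __name__ == "__main__":
    root = "/home/ubuntu/cs148/habitat-api"  # checkout path from the traceback
    print(resolve_log_dir("/data/logs/objectnav_mp3d", root))
    print(resolve_log_dir("data/logs/objectnav_mp3d", root))
```

With the leading slash the directory lands under the filesystem root; without it, the directory lands inside the checkout, where the user already has write access.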

To Reproduce

Steps to reproduce the behavior:

  1. bash habitat_baselines/rl/ddppo/single_node.sh

specifically with the --exp-config line set to habitat_baselines/config/objectnav/ddppo_objectnav.yaml


2020-05-18 09:21:39,439 initializing sim Sim-v0
2020-05-18 09:21:43,517 Initializing task ObjectNav-v1
2020-05-18 09:21:44,181 Overwriting CNN input size of depth: (256, 256)
2020-05-18 09:21:44,183 Overwriting CNN input size of rgb: (256, 256)
2020-05-18 09:21:47,403 agent number of trainable parameters: 12592519
Traceback (most recent call last):
  File "habitat_baselines/run.py", line 70, in <module>
    main()
  File "habitat_baselines/run.py", line 39, in main
    run_exp(**vars(args))
  File "habitat_baselines/run.py", line 64, in run_exp
    trainer.train()
  File "/home/ubuntu/cs148/habitat-api/habitat_baselines/rl/ddppo/algo/ddppo_trainer.py", line 278, in train
    if self.world_rank == 0
  File "/home/ubuntu/cs148/habitat-api/habitat_baselines/common/tensorboard_utils.py", line 28, in __init__
    self.writer = SummaryWriter(log_dir, *args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.7/site-packages/torch/utils/tensorboard/writer.py", line 225, in __init__
    self._get_file_writer()
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.7/site-packages/torch/utils/tensorboard/writer.py", line 256, in _get_file_writer
    self.flush_secs, self.filename_suffix)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.7/site-packages/torch/utils/tensorboard/writer.py", line 66, in __init__
    log_dir, max_queue, flush_secs, filename_suffix)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.7/site-packages/tensorboard/summary/writer/event_file_writer.py", line 73, in __init__
    os.makedirs(logdir)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.7/os.py", line 211, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.7/os.py", line 211, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.7/os.py", line 221, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/data'
Exception ignored in: <function VectorEnv.__del__ at 0x7f5e749725f0>
Traceback (most recent call last):
  File "/home/ubuntu/cs148/habitat-api/habitat/core/vector_env.py", line 518, in __del__
    self.close()
  File "/home/ubuntu/cs148/habitat-api/habitat/core/vector_env.py", line 400, in close
    write_fn((CLOSE_COMMAND, None))
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.7/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/ubuntu/anaconda3/envs/pytorch_p36/bin/python', '-u', 'habitat_baselines/run.py', '--exp-config', 'habitat_baselines/config/objectnav/ddppo_objectnav.yaml', '--run-type', 'train', 'TASK_CONFIG.DATASET.SPLIT', 'train']' died with <Signals.SIGSEGV: 11>.
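
The traceback shows os.makedirs recursing upward and failing at mkdir('/data'), which means /data did not exist and the user could not create it under /. The sketch below shows one way to check this before training starts. It is an illustration under assumptions, not part of habitat_baselines: the helper names `nearest_existing_ancestor` and `can_create` are hypothetical, and os.access is only an advisory check.

```python
import os


def nearest_existing_ancestor(path: str) -> str:
    """Walk up from path until a filesystem entry that already exists is found."""
    current = os.path.abspath(path)
    while not os.path.exists(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return current


def can_create(path: str) -> bool:
    """Best-effort check that os.makedirs(path) would be permitted.

    Creating a directory needs write and search (execute) permission
    on the closest ancestor directory that already exists.
    """
    ancestor = nearest_existing_ancestor(path)
    return os.path.isdir(ancestor) and os.access(ancestor, os.W_OK | os.X_OK)


if __name__ == "__main__":
    for candidate in ("/data/logs/objectnav_mp3d", "data/logs/objectnav_mp3d"):
        print(candidate, "->", nearest_existing_ancestor(candidate), can_create(candidate))
```

The result depends on the machine and the user running the check, so no output is shown here.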

Expected behavior

No permission-denied error: the tensorboard log directory should be created without requiring root privileges.

dhruvbatra commented 4 years ago

CC: @mathfac