Closed xuexidi closed 2 years ago
Please take a look at the AllenAct Distributed ObjectNav tutorial
(https://allenact.org/tutorials/distributed-objectnav-tutorial/) for more details about how to enable headless mode.
@xuexidi I have the same issue: I want to run the code on a server without a display. Did you work it out? Sorry, I am very new to the AI2-THOR codebase. I tried to follow the AllenAct Distributed ObjectNav tutorial, but I can't see what exactly I should modify in the code. It says
"Note that this command is included in the configuration script below, so we don't need to run."
-- but I don't know where/how exactly to
"Override ObjectNavRoboThorBaseConfig's THOR_COMMIT_ID to match the installed headless one"
THOR_COMMIT_ID = "91139c909576f3bf95a187c5b02c6fd455d06b48"
and
"Indicate that we're using headless THOR:"
THOR_IS_HEADLESS = True
I tried to change DEFAULT_THOR_IS_HEADLESS to True as you did but then I got another error relating to
Exception: Unity process has exited - check Player.log for errors. Confirm that Vulkan is properly configured on this system using vulkaninfo from the vulkan-utils package. returncode=-11
Could I get some help for this issue? Thanks!
Hi Yicong. I can help you find this variable -- for the regular (not zero-shot) RoboTHOR ObjectNav setting, you can find THOR_COMMIT_ID at L16 of projects/objectnav_baselines/experiments/robothor/objectnav_robothor_base.py. Please try to change this and see if you still encounter the issue!
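A minimal sketch of what the tutorial's "override" wording amounts to, assuming the experiment config exposes THOR_COMMIT_ID and THOR_IS_HEADLESS as class attributes. The base class below is a trimmed-down stand-in so the snippet runs on its own; the subclass name HeadlessObjectNavConfig and the describe() helper are hypothetical and not part of AllenAct. In the real repository you would either edit the attribute in objectnav_robothor_base.py directly or subclass the actual config.

```python
class ObjectNavRoboThorBaseConfig:
    """Stand-in for the real base config (hypothetical, heavily trimmed)."""

    # Placeholder value; the real default lives in objectnav_robothor_base.py.
    THOR_COMMIT_ID = "<default-commit-id>"
    THOR_IS_HEADLESS = False

    @classmethod
    def describe(cls):
        # Hypothetical helper that only reports the effective settings.
        return {"commit_id": cls.THOR_COMMIT_ID, "headless": cls.THOR_IS_HEADLESS}


class HeadlessObjectNavConfig(ObjectNavRoboThorBaseConfig):
    """Overrides only the two attributes named in the tutorial."""

    # Commit id quoted from the tutorial for the headless THOR build.
    THOR_COMMIT_ID = "91139c909576f3bf95a187c5b02c6fd455d06b48"
    # Indicate that we're using headless THOR.
    THOR_IS_HEADLESS = True


if __name__ == "__main__":
    print(HeadlessObjectNavConfig.describe())
```

Because these are plain class attributes, any method on the base class that reads them through cls or self picks up the overridden values without further changes.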
Hello Apoorv, thank you so much for your guidance and your time. But I am still getting the same error when I change the line 😢
OSError: Could not find any open X-displays on which to run AI2-THOR processes.
I'm sorry for the delayed reply. I am very new to the AI2-THOR codebase, too. I did not manage to run the headless mode of AI2-THOR at that time (I tried headless mode with the zero-shot code), so I moved all the experiments to Habitat 2020. If you make new progress with the headless mode of AI2-THOR, please let me know, thank you!
I ran into the same issue...
Update: I made some progress. After changing the THOR_COMMIT_ID in objectnav_robothor_base.py and setting all headless options to True in embclip-allenact/projects/objectnav_baselines/experiments/objectnav_thor_base.py, it downloaded thor-CloudRendering and then got stuck there:
[08/24 01:27:02 INFO:] Starting 0-th SingleProcessVectorSampledTasks generator with args {'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7fccb6bcb820>, 'scenes': ['FloorPlan_Train4_4', 'FloorPlan_Train6_2', 'FloorPlan_Train12_5', 'FloorPlan_Train5_3', 'FloorPlan_Train7_1', 'FloorPlan_Train3_5', 'FloorPlan_Train10_3', 'FloorPlan_Train2_2', 'FloorPlan_Train12_1', 'FloorPlan_Train11_2', 'FloorPlan_Train1_3', 'FloorPlan_Train3_1', 'FloorPlan_Train8_4', 'FloorPlan_Train9_3', 'FloorPlan_Train7_5', 'FloorPlan_Train10_1', 'FloorPlan_Train1_1', 'FloorPlan_Train8_2', 'FloorPlan_Train6_4', 'FloorPlan_Train5_5', 'FloorPlan_Train9_1', 'FloorPlan_Train7_3', 'FloorPlan_Train4_2', 'FloorPlan_Train10_5', 'FloorPlan_Train12_3', 'FloorPlan_Train2_4', 'FloorPlan_Train5_1', 'FloorPlan_Train1_5', 'FloorPlan_Train11_4', 'FloorPlan_Train3_3', 'FloorPlan_Train9_5', 'FloorPlan_Train8_1', 'FloorPlan_Train4_5', 'FloorPlan_Train6_3', 'FloorPlan_Train5_4', 'FloorPlan_Train7_2', 'FloorPlan_Train10_4', 'FloorPlan_Train4_1', 'FloorPlan_Train12_2', 'FloorPlan_Train2_3', 'FloorPlan_Train1_4', 'FloorPlan_Train11_3', 'FloorPlan_Train3_2', 'FloorPlan_Train8_5', 'FloorPlan_Train9_4', 'FloorPlan_Train10_2', 'FloorPlan_Train2_1', 'FloorPlan_Train11_1', 'FloorPlan_Train1_2', 'FloorPlan_Train8_3', 'FloorPlan_Train6_5', 'FloorPlan_Train9_2', 'FloorPlan_Train7_4', 'FloorPlan_Train4_3', 'FloorPlan_Train2_5', 'FloorPlan_Train12_4', 'FloorPlan_Train6_1', 'FloorPlan_Train11_5', 'FloorPlan_Train5_2', 'FloorPlan_Train3_4'], 'object_types': ('AlarmClock', 'Apple', 'BaseballBat', 'BasketBall', 'Bowl', 'GarbageCan', 'HousePlant', 'Laptop', 'Mug', 'SprayBottle', 'Television', 'Vase'), 'max_steps': 500, 'sensors': [<allenact_plugins.ithor_plugin.ithor_sensors.RGBSensorThor object at 0x7fcd620c7b20>, <allenact_plugins.ithor_plugin.ithor_sensors.GoalObjectTypeThorSensor object at 0x7fccb6bcb6d0>], 'action_space': Discrete(6), 'seed': 2108117189, 'deterministic_cudnn': False, 'rewards_config': 
{'step_penalty': -0.01, 'goal_success_reward': 10.0, 'failed_stop_reward': 0.0, 'shaping_weight': 1.0}, 'env_args': {'width': 400, 'height': 300, 'commit_id': 'bdcefe04c17bef073ecfe3d90786e84740f7addf', 'stochastic': True, 'continuousMode': True, 'applyActionNoise': True, 'rotateStepDegrees': 30.0, 'visibilityDistance': 1.0, 'gridSize': 0.25, 'snapToGrid': False, 'agentMode': 'locobot', 'fieldOfView': 63.453048374758716, 'include_private_scenes': False, 'renderDepthImage': False, 'gpu_device': 7, 'platform': <class 'ai2thor.platform.CloudRendering'>}, 'scene_directory': '/trainman-mount/trainman-k8s-storage-c7d81357-8548-4847-8138-bb56b99fe9a5/project/embclip-allenact/datasets/robothor-objectnav/train', 'loop_dataset': True, 'allow_flipping': True, 'randomize_materials_in_training': False} [vector_sampled_tasks.py: 1035] thor-CloudRendering-22d62af5da45708e7bc40bf07a861e7397f9d2f9.zip: [|||||||||||||||||||||||||||||||||||||||||||||||||| 100% 27.8 MiB/s] of 540.MB
Same situation. When it hangs, is the CPU utilization close to 0%? I guess a wrong setting is causing the process to stall.
After some tracing, I found that the code hangs here: https://github.com/allenai/embodied-clip/blob/855fe741ddecaaf2492966d52d3f130e58d3c48f/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py#L213 which I guess is still related to rendering views.
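For anyone trying to locate a similar hang, one generic option is Python's standard faulthandler module, which can dump the stack of every thread if a call has not returned after a timeout. The sketch below is not how the line above was found; run_with_hang_watchdog is a hypothetical helper name, and it only covers the process it is called in, so worker subprocesses would each need their own watchdog.

```python
import faulthandler
import sys


def run_with_hang_watchdog(fn, timeout_s=60.0, log_file=sys.stderr):
    """Call fn(); while it runs, dump all thread tracebacks every timeout_s seconds.

    log_file must be a real file object (it needs a file descriptor).
    The process is not killed; the dump only shows where it is stuck.
    """
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)
    try:
        return fn()
    finally:
        # Always cancel the pending dump, even if fn() raises.
        faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    import time

    # Sleeps longer than the timeout, so one traceback dump is written to stderr.
    run_with_hang_watchdog(lambda: time.sleep(0.5), timeout_s=0.2)
```

Wrapping the suspected call (for example, environment creation) this way prints the blocking frame without having to attach a debugger.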
OSError: Could not find any open X-displays on which to run AI2-THOR processes
You're going to need to also start x-displays. Please see more details here.
sudo python scripts/startx.py &
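A quick way to check whether any X server is actually up after running the command above is to look for its Unix sockets. This is a sketch that assumes the usual Linux convention of sockets named X<n> under /tmp/.X11-unix; it is not necessarily the check AllenAct or AI2-THOR performs internally, and list_x_displays is a hypothetical helper name.

```python
import os
import re
from pathlib import Path


def list_x_displays(socket_dir="/tmp/.X11-unix"):
    """Return display names (e.g. ':0') for X server sockets found in socket_dir."""
    path = Path(socket_dir)
    if not path.is_dir():
        return []
    numbers = []
    for entry in path.iterdir():
        match = re.fullmatch(r"X(\d+)", entry.name)
        if match:
            numbers.append(int(match.group(1)))
    # Sort numerically so ':10' comes after ':2'.
    return [f":{n}" for n in sorted(numbers)]


if __name__ == "__main__":
    print("DISPLAY =", os.environ.get("DISPLAY"))
    print("X sockets:", list_x_displays())
```

An empty list here would be consistent with the "Could not find any open X-displays" error; a non-empty list only shows that a socket exists, not that the server accepts connections.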
Hello Apoorv, thanks for continuing to look into this issue for us. Yesterday I was hoping to run the code without an X-display at all, because enabling X-displays is really a pain on my servers, but it seems very hard to find a way around it. Today I finally enabled the X-display on my machine. When I re-ran the code, it hung at a different place, with the keyboard unresponsive in the terminal and 100% CPU utilization:
(embclip-allenact) yhong@xxxxxx:~/test_projects/embclip-allenact$ PYTHONPATH=. python allenact/main.py -o storage/objectnav-robothor-rgb-clip-rn50 -b projects/objectnav_baselines/experiments/robothor/clip objectnav_robothor_rgb_clipresnet50gru_ddppo [08/24 22:33:52 INFO:] Running with args Namespace(approx_ckpt_step_interval=None, approx_ckpt_steps_count=None, checkpoint=None, collect_valid_results=False, config_kwargs=None, deterministic_agents=False, deterministic_cudnn=False, disable_config_saving=False, disable_tensorboard=False, distributed_ip_and_port='127.0.0.1:0', eval=False, experiment='objectnav_robothor_rgb_clipresnet50gru_ddppo', experiment_base='projects/objectnav_baselines/experiments/robothor/clip', extra_tag='', infer_output_dir=False, log_level='info', machine_id=0, max_sampler_processes_per_worker=None, output_dir='storage/objectnav-robothor-rgb-clip-rn50', restart_pipeline=False, save_dir_fmt=<SaveDirFormat.FLAT: 'FLAT'>, seed=None, skip_checkpoints=0, test_date=None, test_expert=False) [main.py: 420] [08/24 22:33:56 INFO:] Git diff saved to storage/objectnav-robothor-rgb-clip-rn50/used_configs/ObjectNav-RoboTHOR-RGB-ClipResNet50GRU-DDPPO/2022-08-24_22-33-56 [runner.py: 754] [08/24 22:33:56 INFO:] Config files saved to storage/objectnav-robothor-rgb-clip-rn50/used_configs/ObjectNav-RoboTHOR-RGB-ClipResNet50GRU-DDPPO/2022-08-24_22-33-56 [runner.py: 798] [08/24 22:33:56 INFO:] Using 1 train workers on devices (device(type='cuda', index=7),) [runner.py: 239]
Will keep trying to run the code. Cheers.
OK, I finally got it running. The issue was on my side; it should be pretty easy with an X-display enabled. 😅 Thanks again, Apoorv, for sharing this wonderful code and the easy installation. I will start building my work based on your paper. Cheers!
Glad you were able to get training working! I will add details about AllenAct headless mode to my instructions :) Looking forward to seeing your work.
Thank you very much for your open source spirit! When I try to train embclip-zeroshot, each training process displays a Unity window. For example, if I set DEFAULT_NUM_TRAIN_PROCESSES to 4 (in embodied-clip/projects/objectnav_baselines/experiments/robothor/objectnav_robothor_base.py), it renders 4 Unity windows on my screen (Ubuntu 16.04).
I tried to set DEFAULT_THOR_IS_HEADLESS to True (in embodied-clip/projects/objectnav_baselines/experiments/objectnav_thor_base.py), but it didn't work. How can I disable the RoboTHOR Unity rendering windows? I want to migrate the training to a cloud server (with GPUs but no monitor). Looking forward to your reply, thank you!