allenai / embodied-clip

Official codebase for EmbCLIP
https://arxiv.org/abs/2111.09888
Apache License 2.0

How to disable unity rendering windows? #5

Closed xuexidi closed 2 years ago

xuexidi commented 2 years ago

Thank you very much for your open source spirit! When I try to train embclip-zeroshot, each training process displays a Unity window. For example, if I set DEFAULT_NUM_TRAIN_PROCESSES to 4 (in embodied-clip/projects/objectnav_baselines/experiments/robothor/objectnav_robothor_base.py), it renders 4 Unity windows on my screen (Ubuntu 16.04).

I tried setting DEFAULT_THOR_IS_HEADLESS to True (in embodied-clip/projects/objectnav_baselines/experiments/objectnav_thor_base.py), but it didn't work.

How can I disable the RoboTHOR Unity rendering windows? I want to migrate the training to a cloud server (with a GPU but no monitor). Looking forward to your reply, thank you!

apoorvkh commented 2 years ago

Please take a look at the AllenAct Distributed ObjectNav tutorial (https://allenact.org/tutorials/distributed-objectnav-tutorial/) for more details about how to enable headless mode.

YicongHong commented 2 years ago

@xuexidi I have the same issue: I want to run the code without a display on a server. Did you work it out? Sorry, I am very new to the AI2-THOR codebase. I tried to follow the AllenAct Distributed ObjectNav tutorial, but I can't see what exactly I should modify in the code. It says

"Note that this command is included in the configuration script below, so we don't need to run."

-- but I don't know where/how exactly to

"Override ObjectNavRoboThorBaseConfig's THOR_COMMIT_ID to match the installed headless one"

THOR_COMMIT_ID = "91139c909576f3bf95a187c5b02c6fd455d06b48"

and

"Indicate that we're using headless THOR:"

THOR_IS_HEADLESS = True

I tried to change DEFAULT_THOR_IS_HEADLESS to True as you did but then I got another error relating to

Exception: Unity process has exited - check Player.log for errors. Confirm that Vulkan is properly configured on this system using vulkaninfo from the vulkan-utils package. returncode=-11

Could I get some help for this issue? Thanks!

apoorvkh commented 2 years ago

Hi Yicong. I can help you find this variable -- for the regular (not zero-shot) RoboTHOR ObjectNav setting, you can find THOR_COMMIT_ID at L16 of projects/objectnav_baselines/experiments/robothor/objectnav_robothor_base.py. Please try to change this and see if you still encounter the issue!
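For reference, the two overrides quoted from the tutorial are plain class attributes, so one way to apply them without editing the base file is to subclass the config. The following is only a minimal sketch: the base class here is a stand-in so the snippet runs on its own (in the repository you would import the real ObjectNavRoboThorBaseConfig instead), the subclass name is hypothetical, and the base defaults are placeholders rather than the repository's actual values. The commit ID is the one quoted from the tutorial and must match the headless build actually installed on your machine.

```python
class ObjectNavRoboThorBaseConfig:
    """Stand-in for the real base config; the default values are placeholders."""

    THOR_COMMIT_ID = "<default-commit-id>"
    THOR_IS_HEADLESS = False


class HeadlessObjectNavRoboThorConfig(ObjectNavRoboThorBaseConfig):
    """Hypothetical subclass that applies the two overrides from the tutorial."""

    # Commit ID quoted from the tutorial; assumed to match the installed headless build.
    THOR_COMMIT_ID = "91139c909576f3bf95a187c5b02c6fd455d06b48"
    # Indicate that headless THOR is being used.
    THOR_IS_HEADLESS = True


if __name__ == "__main__":
    cfg = HeadlessObjectNavRoboThorConfig
    print(cfg.THOR_COMMIT_ID, cfg.THOR_IS_HEADLESS)
```

Editing the attribute directly in the base file, as described above, has the same effect; the subclass just keeps the change out of the shared config.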

YicongHong commented 2 years ago

Hello Apoorv, thank you so much for your guidance and your time. But I am still getting the same error when I change the line 😢

OSError: Could not find any open X-displays on which to run AI2-THOR processes.

xuexidi commented 2 years ago

I'm sorry for the delayed reply. I am very new to the AI2-THOR codebase, too. I did not manage to run AI2-THOR in headless mode at that time (I tried it with the zero-shot code), so I moved all the experiments to Habitat 2020. If you make progress with AI2-THOR headless mode, please let me know, thank you!

xuexidi commented 2 years ago

I am hitting the same issue (the "Could not find any open X-displays" OSError).

YicongHong commented 2 years ago

Update: I made some progress. After changing THOR_COMMIT_ID in objectnav_robothor_base.py and setting all headless options to True in embclip-allenact/projects/objectnav_baselines/experiments/objectnav_thor_base.py, it downloaded thor-CloudRendering and then got stuck there:

[08/24 01:27:02 INFO:] Starting 0-th SingleProcessVectorSampledTasks generator with args {'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7fccb6bcb820>, 'scenes': ['FloorPlan_Train4_4', 'FloorPlan_Train6_2', 'FloorPlan_Train12_5', 'FloorPlan_Train5_3', 'FloorPlan_Train7_1', 'FloorPlan_Train3_5', 'FloorPlan_Train10_3', 'FloorPlan_Train2_2', 'FloorPlan_Train12_1', 'FloorPlan_Train11_2', 'FloorPlan_Train1_3', 'FloorPlan_Train3_1', 'FloorPlan_Train8_4', 'FloorPlan_Train9_3', 'FloorPlan_Train7_5', 'FloorPlan_Train10_1', 'FloorPlan_Train1_1', 'FloorPlan_Train8_2', 'FloorPlan_Train6_4', 'FloorPlan_Train5_5', 'FloorPlan_Train9_1', 'FloorPlan_Train7_3', 'FloorPlan_Train4_2', 'FloorPlan_Train10_5', 'FloorPlan_Train12_3', 'FloorPlan_Train2_4', 'FloorPlan_Train5_1', 'FloorPlan_Train1_5', 'FloorPlan_Train11_4', 'FloorPlan_Train3_3', 'FloorPlan_Train9_5', 'FloorPlan_Train8_1', 'FloorPlan_Train4_5', 'FloorPlan_Train6_3', 'FloorPlan_Train5_4', 'FloorPlan_Train7_2', 'FloorPlan_Train10_4', 'FloorPlan_Train4_1', 'FloorPlan_Train12_2', 'FloorPlan_Train2_3', 'FloorPlan_Train1_4', 'FloorPlan_Train11_3', 'FloorPlan_Train3_2', 'FloorPlan_Train8_5', 'FloorPlan_Train9_4', 'FloorPlan_Train10_2', 'FloorPlan_Train2_1', 'FloorPlan_Train11_1', 'FloorPlan_Train1_2', 'FloorPlan_Train8_3', 'FloorPlan_Train6_5', 'FloorPlan_Train9_2', 'FloorPlan_Train7_4', 'FloorPlan_Train4_3', 'FloorPlan_Train2_5', 'FloorPlan_Train12_4', 'FloorPlan_Train6_1', 'FloorPlan_Train11_5', 'FloorPlan_Train5_2', 'FloorPlan_Train3_4'], 'object_types': ('AlarmClock', 'Apple', 'BaseballBat', 'BasketBall', 'Bowl', 'GarbageCan', 'HousePlant', 'Laptop', 'Mug', 'SprayBottle', 'Television', 'Vase'), 'max_steps': 500, 'sensors': [<allenact_plugins.ithor_plugin.ithor_sensors.RGBSensorThor object at 0x7fcd620c7b20>, <allenact_plugins.ithor_plugin.ithor_sensors.GoalObjectTypeThorSensor object at 0x7fccb6bcb6d0>], 'action_space': Discrete(6), 'seed': 2108117189, 'deterministic_cudnn': False, 'rewards_config': 
{'step_penalty': -0.01, 'goal_success_reward': 10.0, 'failed_stop_reward': 0.0, 'shaping_weight': 1.0}, 'env_args': {'width': 400, 'height': 300, 'commit_id': 'bdcefe04c17bef073ecfe3d90786e84740f7addf', 'stochastic': True, 'continuousMode': True, 'applyActionNoise': True, 'rotateStepDegrees': 30.0, 'visibilityDistance': 1.0, 'gridSize': 0.25, 'snapToGrid': False, 'agentMode': 'locobot', 'fieldOfView': 63.453048374758716, 'include_private_scenes': False, 'renderDepthImage': False, 'gpu_device': 7, 'platform': <class 'ai2thor.platform.CloudRendering'>}, 'scene_directory': '/trainman-mount/trainman-k8s-storage-c7d81357-8548-4847-8138-bb56b99fe9a5/project/embclip-allenact/datasets/robothor-objectnav/train', 'loop_dataset': True, 'allow_flipping': True, 'randomize_materials_in_training': False} [vector_sampled_tasks.py: 1035]
thor-CloudRendering-22d62af5da45708e7bc40bf07a861e7397f9d2f9.zip: [|||||||||||||||||||||||||||||||||||||||||||||||||| 100% 27.8 MiB/s] of 540.MB
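
Since the log shows the CloudRendering platform being selected, it may help to check whether that platform starts at all outside of AllenAct. This is only a sketch, assuming ai2thor is installed and that Vulkan is configured on the machine (the earlier error message points at vulkaninfo for that). The function name is made up for illustration, and the width and height defaults are copied from the env_args in the log.

```python
def smoke_test_cloud_rendering(width: int = 400, height: int = 300):
    """Start one headless AI2-THOR controller, take a single step, and shut down."""
    # Imported lazily so this module can be loaded on machines without ai2thor.
    from ai2thor.controller import Controller
    from ai2thor.platform import CloudRendering

    controller = Controller(platform=CloudRendering, width=width, height=height)
    try:
        event = controller.step(action="RotateRight")
        # Return the shape of the rendered RGB frame as a basic sanity check.
        return event.frame.shape
    finally:
        controller.stop()


if __name__ == "__main__":
    print(smoke_test_cloud_rendering())
```

If this standalone call also hangs or crashes, the problem is more likely in the AI2-THOR or driver setup than in the training configuration.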

xuexidi commented 2 years ago

Same situation here. When it hangs, is the CPU utilization close to 0%? I suspect a wrong setting is causing the process to stall.

YicongHong commented 2 years ago

After some tracing, I found that the code hangs here: https://github.com/allenai/embodied-clip/blob/855fe741ddecaaf2492966d52d3f130e58d3c48f/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py#L213 I guess it is still related to rendering views.
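
For anyone else trying to locate a hang like this, one generic option is Python's standard faulthandler module, which can dump the stack of every thread if the process is still running after a timeout. The sketch below is not part of AllenAct; the helper names are made up for illustration, and in a multi-process run it would have to be called inside each worker process that you want to inspect.

```python
import faulthandler
import sys


def watch_for_hang(timeout_s: float, log_file=sys.stderr) -> None:
    """Dump all thread stacks to log_file every timeout_s seconds until cancelled.

    log_file must be a real file object (it needs a file descriptor).
    """
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)


def stop_watching() -> None:
    """Cancel the periodic stack dump started by watch_for_hang."""
    faulthandler.cancel_dump_traceback_later()
```

Calling watch_for_hang(60) near the start of the suspect process and stop_watching() once the slow section has passed prints the stack trace of the blocking call whenever the timeout elapses.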

apoorvkh commented 2 years ago

OSError: Could not find any open X-displays on which to run AI2-THOR processes

You're going to need to also start x-displays. Please see more details here.

sudo python scripts/startx.py &
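
To confirm whether an X server is actually up before launching training, a quick check is to look for X sockets on the machine. This is a rough sketch under the assumption that the X server creates its sockets in /tmp/.X11-unix, which is the usual location on Linux; the function name is made up, and a socket file being present does not guarantee that the server accepts connections.

```python
import glob
import os
from typing import List


def open_x_displays(socket_dir: str = "/tmp/.X11-unix") -> List[str]:
    """Return display names such as ':0' for the X sockets found in socket_dir."""
    displays = []
    for path in sorted(glob.glob(os.path.join(socket_dir, "X*"))):
        # Socket files are named X0, X1, ...; the numeric suffix is the display number.
        suffix = os.path.basename(path)[1:]
        if suffix.isdigit():
            displays.append(":" + suffix)
    return displays


if __name__ == "__main__":
    found = open_x_displays()
    print(found if found else "No X sockets found; start an X server first.")
```

An empty result is consistent with the OSError quoted above.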

YicongHong commented 2 years ago

Hello Apoorv, thanks for continuing to track this issue for us. Yesterday I was hoping to run the code without an X display at all, because enabling X displays is really a pain on my servers, but it seems very hard to find a way around it. Today I finally enabled the X display on my machine and re-ran the code, but now it hangs at a different place, with the terminal unresponsive to my keyboard and 100% CPU utilization:

(embclip-allenact) yhong@xxxxxx:~/test_projects/embclip-allenact$ PYTHONPATH=. python allenact/main.py -o storage/objectnav-robothor-rgb-clip-rn50 -b projects/objectnav_baselines/experiments/robothor/clip objectnav_robothor_rgb_clipresnet50gru_ddppo
[08/24 22:33:52 INFO:] Running with args Namespace(approx_ckpt_step_interval=None, approx_ckpt_steps_count=None, checkpoint=None, collect_valid_results=False, config_kwargs=None, deterministic_agents=False, deterministic_cudnn=False, disable_config_saving=False, disable_tensorboard=False, distributed_ip_and_port='127.0.0.1:0', eval=False, experiment='objectnav_robothor_rgb_clipresnet50gru_ddppo', experiment_base='projects/objectnav_baselines/experiments/robothor/clip', extra_tag='', infer_output_dir=False, log_level='info', machine_id=0, max_sampler_processes_per_worker=None, output_dir='storage/objectnav-robothor-rgb-clip-rn50', restart_pipeline=False, save_dir_fmt=<SaveDirFormat.FLAT: 'FLAT'>, seed=None, skip_checkpoints=0, test_date=None, test_expert=False) [main.py: 420]
[08/24 22:33:56 INFO:] Git diff saved to storage/objectnav-robothor-rgb-clip-rn50/used_configs/ObjectNav-RoboTHOR-RGB-ClipResNet50GRU-DDPPO/2022-08-24_22-33-56 [runner.py: 754]
[08/24 22:33:56 INFO:] Config files saved to storage/objectnav-robothor-rgb-clip-rn50/used_configs/ObjectNav-RoboTHOR-RGB-ClipResNet50GRU-DDPPO/2022-08-24_22-33-56 [runner.py: 798]
[08/24 22:33:56 INFO:] Using 1 train workers on devices (device(type='cuda', index=7),) [runner.py: 239]

Will keep trying to run the code. Cheers.

YicongHong commented 2 years ago

OK, I finally got it running. The issue was on my side; it should be pretty easy with the X display enabled. 😅 Thanks again, Apoorv, for sharing this wonderful code and easy installation. I will start building my work based on your paper. Cheers!

apoorvkh commented 2 years ago

Glad you were able to get training working! I will add details about AllenAct headless mode to my instructions :) Looking forward to seeing your work.