decisionforce / HACO

[ICLR 2022] Official implementation of paper: Efficient Learning of Safe Driving Policy via Human-AI Copilot Optimization
Apache License 2.0

An error about cluster resources when beginning the Quick Start #3

Open Damon328 opened 2 years ago

Damon328 commented 2 years ago

Hello, congratulations on the amazing work!

I get the following error when running the Quick Start. I then reinstalled CUDA 10.1 and cuDNN 7.6.5 and confirmed that TensorFlow can use the GPU, but none of this helped.

System information: OS: Windows 10, Python: 3.7

Error:

 E:HACO\haco\run_main_exp>python train_haco_keyboard_easy.py --num-gpus=1
WARNING:tensorflow:From D:\Anaconda\envs\haco\lib\site-packages\tensorflow\python\compat\v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term

Successfully registered the following environments: ['MetaDrive-validation-v0', 'MetaDrive-10env-v0', 'MetaDrive-100envs-v0', 'MetaDrive-1000envs-v0', 'SafeMetaDrive-validation-v0', 'SafeMetaDrive-10env-v0', 'SafeMetaDrive-100envs-v0', 'SafeMetaDrive-1000envs-v0', 'MARLTollgate-v0', 'MARLBottleneck-v0', 'MARLRoundabout-v0', 'MARLIntersection-v0', 'MARLParkingLot-v0', 'MARLMetaDrive-v0'].

Successfully initialize Ray!
Available resources:  {}

Traceback (most recent call last):
  File "train_haco_keyboard_easy.py", line 66, in <module>
    custom_callback=HACOCallbacks,
  File "e:\baidu\haco\haco\utils\train.py", line 103, in train
    **kwargs
  File "D:\Anaconda\envs\haco\lib\site-packages\ray\tune\tune.py", line 405, in run
    runner.step()
  File "D:\Anaconda\envs\haco\lib\site-packages\ray\tune\trial_runner.py", line 377, in step
    self.trial_executor.on_no_available_trials(self)
  File "D:\Anaconda\envs\haco\lib\site-packages\ray\tune\trial_executor.py", line 177, in on_no_available_trials
    "Insufficient cluster resources to launch trial: "

ray.tune.error.TuneError: Insufficient cluster resources to launch trial: trial requested 0.5 CPUs, 0.2 GPUs, but the cluster has only 0 CPUs, 0 GPUs, 0.0 GiB heap, 0.0 GiB objects.

You can adjust the resource requests of RLlib agents by setting `num_workers`, `num_gpus`, and other configs. See the DEFAULT_CONFIG defined by each agent for more info.

The config of this agent is: {'seed': 0, 'log_level': 'INFO', 'callbacks': <class 'haco.utils.callback.HACOCallbacks'>, 'env': <class 'haco.utils.human_in_the_loop_env.HumanInTheLoopEnv'>, 'env_config': {'manual_control': True, 'use_render': True, 'controller': 'keyboard', 'window_size': (1600, 1100), 'cos_similarity': True, 'map': 'COT', 'environment_num': 1}, 'takeover_data_discard': False, 'twin_cost_q': True, 'alpha': 10, 'no_reward': True, 'explore': True, 'optimization': {'actor_learning_rate': 0.0001, 'critic_learning_rate': 0.0001, 'entropy_learning_rate': 0.0001}, 'prioritized_replay': False, 'horizon': 1000, 'target_network_update_freq': 1, 'timesteps_per_iteration': 100, 'metrics_smoothing_episodes': 10, 'learning_starts': 100, 'clip_actions': False, 'train_batch_size': 1024, 'normalize_actions': True, 'num_cpus_for_driver': 0.5, 'num_cpus_per_worker': 0.1, 'num_gpus': 0.2}
pengzhenghao commented 1 year ago

Sorry for the late reply. It seems that CUDA is not properly installed in your Python environment. Is this issue still occurring?
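
For readers hitting the same TuneError, the mismatch it reports can be made concrete with a minimal sketch. This is not Ray's actual scheduling code; the helper name `shortfall` is hypothetical, and the numbers are copied from the log above (the trial requested 0.5 CPUs and 0.2 GPUs, while "Available resources" printed an empty dict).

```python
from typing import Dict


def shortfall(requested: Dict[str, float], available: Dict[str, float]) -> Dict[str, float]:
    """Return, per resource, how much more is requested than is available.

    Hypothetical helper for illustration only; it mimics the comparison
    described in the error message, not Ray's internal logic.
    """
    return {
        name: amount - available.get(name, 0.0)
        for name, amount in requested.items()
        if amount > available.get(name, 0.0)
    }


# Values taken from the log: num_cpus_for_driver=0.5 and num_gpus=0.2.
requested = {"CPU": 0.5, "GPU": 0.2}
# The log printed "Available resources:  {}", i.e. nothing was detected.
available: Dict[str, float] = {}

missing = shortfall(requested, available)
print(missing)  # {'CPU': 0.5, 'GPU': 0.2}
```

Note that the log reports 0 CPUs as well as 0 GPUs, so the shortfall is not limited to the GPU. Comparing the "Available resources" line against the request in this way may help narrow down whether the resources were not detected at all or were simply too small for the trial.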