fuxiAIlab / RL4RS

A Real-World Benchmark for Reinforcement Learning based Recommender System
Creative Commons Attribution Share Alike 4.0 International
220 stars 26 forks

Get error when run bash run_modelfree_rl.sh DQN/PPO/DDPG/PG/PG_conti #9

Open Larry-Liu02 opened 8 months ago

Larry-Liu02 commented 8 months ago

Dear RL4RS Team,

When I run "nohup python -u rl4rs/server/gymHttpServer.py & bash run_modelfree_rl.sh DQN", I always get ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=5000). The last cell of tutorial.ipynb fails the same way. I don't know the cause. Is it related to my local network? I can connect to the Mainland Internet. I also tried a cloud server, which shows the same error. How can I solve this? The complete error output is below. Many thanks!

/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/ray/autoscaler/_private/cli_logger.py:61: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via pip install 'ray[default]'. Please update your install command.
  "update your install command.", FutureWarning)
2024-03-11 15:06:36,349 INFO services.py:1247 -- View the Ray dashboard at http://127.0.0.1:8265
2024-03-11 15:06:37,418 INFO trainer.py:706 -- Tip: set framework=tfe or the --eager flag to enable TensorFlow eager execution

{'epoch': 5, 'maxlen': 64, 'batch_size': 64, 'action_size': 284, 'class_num': 2, 'dense_feature_num': 432, 'category_feature_num': 21, 'category_hash_size': 100000, 'seq_num': 2, 'emb_size': 128, 'is_eval': False, 'hidden_units': 128, 'max_steps': 9, 'action_emb_size': 32, 'sample_file': 'simulator/rl4rs_dataset_a_shuf.csv', 'model_file': 'simulator/finetuned/simulator_a_dien/model', 'iteminfo_file': 'raw_data/item_info.csv', 'remote_base': 'http://127.0.0.1:5000', 'trial_name': 'all', 'support_rllib_mask': True, 'env': 'SlateRecEnv-v0'}
rllib_config {'env': 'rllibEnv-v0', 'gamma': 1, 'explore': True, 'exploration_config': {'type': 'SoftQ'}, 'num_gpus': 1, 'num_workers': 2, 'framework': 'tf', 'rollout_fragment_length': 9, 'batch_mode': 'complete_episodes', 'train_batch_size': 576, 'evaluation_interval': 1, 'evaluation_num_episodes': 8192, 'evaluation_config': {'explore': False}, 'log_level': 'INFO', 'use_critic': True, 'use_gae': True, 'lambda': 1.0, 'kl_coeff': 0.2, 'sgd_minibatch_size': 256, 'shuffle_sequences': True, 'num_sgd_iter': 1, 'lr': 0.0001, 'vf_loss_coeff': 0.5, 'clip_param': 0.3, 'vf_clip_param': 500.0, 'kl_target': 0.01}

(pid=41960) 2024-03-11 15:06:38,687 ERROR worker.py:421 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=41960, ip=192.168.1.4)
(pid=41960)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/util/connection.py", line 96, in create_connection
(pid=41960)     raise err
(pid=41960)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/util/connection.py", line 86, in create_connection
(pid=41960)     sock.connect(sa)
(pid=41960) ConnectionRefusedError: [Errno 111] Connection refused
(pid=41960)
(pid=41960) During handling of the above exception, another exception occurred:
(pid=41960)
(pid=41960) ray::RolloutWorker.__init__() (pid=41960, ip=192.168.1.4)
(pid=41960)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connectionpool.py", line 706, in urlopen
(pid=41960)     chunked=chunked,
(pid=41960)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connectionpool.py", line 394, in _make_request
(pid=41960)     conn.request(method, url, **httplib_request_kw)
(pid=41960)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connection.py", line 234, in request
(pid=41960)     super(HTTPConnection, self).request(method, url, body=body, headers=headers)
(pid=41960)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 1287, in request
(pid=41960)     self._send_request(method, url, body, headers, encode_chunked)
(pid=41960)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 1333, in _send_request
(pid=41960)     self.endheaders(body, encode_chunked=encode_chunked)
(pid=41960)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 1282, in endheaders
(pid=41960)     self._send_output(message_body, encode_chunked=encode_chunked)
(pid=41960)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 1042, in _send_output
(pid=41960)     self.send(msg)
(pid=41960)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 980, in send
(pid=41960)     self.connect()
(pid=41960)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connection.py", line 200, in connect
(pid=41960)     conn = self._new_conn()
(pid=41960)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connection.py", line 182, in _new_conn
(pid=41960)     self, "Failed to establish a new connection: %s" % e
(pid=41960) urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f21554e6518>: Failed to establish a new connection: [Errno 111] Connection refused
(pid=41960)
(pid=41960) During handling of the above exception, another exception occurred:
(pid=41960)
(pid=41960) ray::RolloutWorker.__init__() (pid=41960, ip=192.168.1.4)
(pid=41960)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/adapters.py", line 450, in send
(pid=41960)     timeout=timeout
(pid=41960)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connectionpool.py", line 756, in urlopen
(pid=41960)     method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
(pid=41960)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/util/retry.py", line 574, in increment
(pid=41960)     raise MaxRetryError(_pool, url, error or ResponseError(cause))
(pid=41960) urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded with url: /v1/envs/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f21554e6518>: Failed to establish a new connection: [Errno 111] Connection refused',))
(pid=41960)
(pid=41960) During handling of the above exception, another exception occurred:
(pid=41960)
(pid=41960) ray::RolloutWorker.__init__() (pid=41960, ip=192.168.1.4)
(pid=41960)   File "python/ray/_raylet.pyx", line 523, in ray._raylet.execute_task
(pid=41960)   File "python/ray/_raylet.pyx", line 530, in ray._raylet.execute_task
(pid=41960)   File "python/ray/_raylet.pyx", line 534, in ray._raylet.execute_task
(pid=41960)   File "python/ray/_raylet.pyx", line 484, in ray._raylet.execute_task.function_executor
(pid=41960)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/ray/_private/function_manager.py", line 563, in actor_method_executor
(pid=41960)     return method(__ray_actor, *args, **kwargs)
(pid=41960)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/ray/rllib/evaluation/rollout_worker.py", line 392, in __init__
(pid=41960)     self.env = env_creator(env_context)
(pid=41960)   File "", line 43, in
(pid=41960)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/gym/envs/registration.py", line 184, in make
(pid=41960)     return registry.make(id, **kwargs)
(pid=41960)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/gym/envs/registration.py", line 106, in make
(pid=41960)     env = spec.make(**kwargs)
(pid=41960)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/gym/envs/registration.py", line 76, in make
(pid=41960)     env = cls(**_kwargs)
(pid=41960)   File "/media/kemove/16T/Jupyter/Electronics/rl4rs/server/httpEnv.py", line 12, in __init__
(pid=41960)     self.instance_id = self.client.env_create(env_id, config)
(pid=41960)   File "/media/kemove/16T/Jupyter/Electronics/rl4rs/server/gymHttpClient.py", line 55, in env_create
(pid=41960)     resp = self._post_request(route, data)
(pid=41960)   File "/media/kemove/16T/Jupyter/Electronics/rl4rs/server/gymHttpClient.py", line 43, in _post_request
(pid=41960)     data=json.dumps(data))
(pid=41960)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/sessions.py", line 577, in post
(pid=41960)     return self.request('POST', url, data=data, json=json, **kwargs)
(pid=41960)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/sessions.py", line 529, in request
(pid=41960)     resp = self.send(prep, **send_kwargs)
(pid=41960)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/sessions.py", line 645, in send
(pid=41960)     r = adapter.send(request, **kwargs)
(pid=41960)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/adapters.py", line 519, in send
(pid=41960)     raise ConnectionError(e, request=request)
(pid=41960) requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded with url: /v1/envs/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f21554e6518>: Failed to establish a new connection: [Errno 111] Connection refused',))
(pid=41942) 2024-03-11 15:06:38,686 ERROR worker.py:421 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=41942, ip=192.168.1.4)
(pid=41942)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/util/connection.py", line 96, in create_connection
(pid=41942)     raise err
(pid=41942)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/util/connection.py", line 86, in create_connection
(pid=41942)     sock.connect(sa)
(pid=41942) ConnectionRefusedError: [Errno 111] Connection refused
(pid=41942)
(pid=41942) During handling of the above exception, another exception occurred:
(pid=41942)
(pid=41942) ray::RolloutWorker.__init__() (pid=41942, ip=192.168.1.4)
(pid=41942)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connectionpool.py", line 706, in urlopen
(pid=41942)     chunked=chunked,
(pid=41942)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connectionpool.py", line 394, in _make_request
(pid=41942)     conn.request(method, url, **httplib_request_kw)
(pid=41942)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connection.py", line 234, in request
(pid=41942)     super(HTTPConnection, self).request(method, url, body=body, headers=headers)
(pid=41942)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 1287, in request
(pid=41942)     self._send_request(method, url, body, headers, encode_chunked)
(pid=41942)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 1333, in _send_request
(pid=41942)     self.endheaders(body, encode_chunked=encode_chunked)
(pid=41942)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 1282, in endheaders
(pid=41942)     self._send_output(message_body, encode_chunked=encode_chunked)
(pid=41942)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 1042, in _send_output
(pid=41942)     self.send(msg)
(pid=41942)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 980, in send
(pid=41942)     self.connect()
(pid=41942)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connection.py", line 200, in connect
(pid=41942)     conn = self._new_conn()
(pid=41942)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connection.py", line 182, in _new_conn
(pid=41942)     self, "Failed to establish a new connection: %s" % e
(pid=41942) urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7ef8abef6550>: Failed to establish a new connection: [Errno 111] Connection refused
(pid=41942)
(pid=41942) During handling of the above exception, another exception occurred:
(pid=41942)
(pid=41942) ray::RolloutWorker.__init__() (pid=41942, ip=192.168.1.4)
(pid=41942)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/adapters.py", line 450, in send
(pid=41942)     timeout=timeout
(pid=41942)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connectionpool.py", line 756, in urlopen
(pid=41942)     method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
(pid=41942)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/util/retry.py", line 574, in increment
(pid=41942)     raise MaxRetryError(_pool, url, error or ResponseError(cause))
(pid=41942) urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded with url: /v1/envs/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ef8abef6550>: Failed to establish a new connection: [Errno 111] Connection refused',))
(pid=41942)
(pid=41942) During handling of the above exception, another exception occurred:
(pid=41942)
(pid=41942) ray::RolloutWorker.__init__() (pid=41942, ip=192.168.1.4)
(pid=41942)   File "python/ray/_raylet.pyx", line 523, in ray._raylet.execute_task
(pid=41942)   File "python/ray/_raylet.pyx", line 530, in ray._raylet.execute_task
(pid=41942)   File "python/ray/_raylet.pyx", line 534, in ray._raylet.execute_task
(pid=41942)   File "python/ray/_raylet.pyx", line 484, in ray._raylet.execute_task.function_executor
(pid=41942)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/ray/_private/function_manager.py", line 563, in actor_method_executor
(pid=41942)     return method(__ray_actor, *args, **kwargs)
(pid=41942)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/ray/rllib/evaluation/rollout_worker.py", line 392, in __init__
(pid=41942)     self.env = env_creator(env_context)
(pid=41942)   File "", line 43, in
(pid=41942)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/gym/envs/registration.py", line 184, in make
(pid=41942)     return registry.make(id, **kwargs)
(pid=41942)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/gym/envs/registration.py", line 106, in make
(pid=41942)     env = spec.make(**kwargs)
(pid=41942)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/gym/envs/registration.py", line 76, in make
(pid=41942)     env = cls(**_kwargs)
(pid=41942)   File "/media/kemove/16T/Jupyter/Electronics/rl4rs/server/httpEnv.py", line 12, in __init__
(pid=41942)     self.instance_id = self.client.env_create(env_id, config)
(pid=41942)   File "/media/kemove/16T/Jupyter/Electronics/rl4rs/server/gymHttpClient.py", line 55, in env_create
(pid=41942)     resp = self._post_request(route, data)
(pid=41942)   File "/media/kemove/16T/Jupyter/Electronics/rl4rs/server/gymHttpClient.py", line 43, in _post_request
(pid=41942)     data=json.dumps(data))
(pid=41942)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/sessions.py", line 577, in post
(pid=41942)     return self.request('POST', url, data=data, json=json, **kwargs)
(pid=41942)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/sessions.py", line 529, in request
(pid=41942)     resp = self.send(prep, **send_kwargs)
(pid=41942)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/sessions.py", line 645, in send
(pid=41942)     r = adapter.send(request, **kwargs)
(pid=41942)   File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/adapters.py", line 519, in send
(pid=41942)     raise ConnectionError(e, request=request)
(pid=41942) requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded with url: /v1/envs/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ef8abef6550>: Failed to establish a new connection: [Errno 111] Connection refused',))


RayActorError Traceback (most recent call last)

in
     83 **cfg)
     84 print('rllib_config', rllib_config)
---> 85 trainer = get_rl_model(algo, rllib_config)
     86
     87 # restore_file = ''

/media/kemove/16T/Jupyter/Electronics/script/modelfree_trainer.py in get_rl_model(algo, rllib_config)
     12 trainer = None
     13 if algo == "PPO":
---> 14 trainer = ppo.PPOTrainer(config=rllib_config, env="rllibEnv-v0")
     15 elif algo == "DQN":
     16 trainer = dqn.DQNTrainer(config=rllib_config, env="rllibEnv-v0")

~/anaconda3/envs/rl4rs/lib/python3.6/site-packages/ray/rllib/agents/trainer_template.py in __init__(self, config, env, logger_creator)
    121
    122 def __init__(self, config=None, env=None, logger_creator=None):
--> 123 Trainer.__init__(self, config, env, logger_creator)
    124
    125 def _init(self, config: TrainerConfigDict,

~/anaconda3/envs/rl4rs/lib/python3.6/site-packages/ray/rllib/agents/trainer.py in __init__(self, config, env, logger_creator)
    582 logger_creator = default_logger_creator
    583
--> 584 super().__init__(config, logger_creator)
    585
    586 @classmethod

~/anaconda3/envs/rl4rs/lib/python3.6/site-packages/ray/tune/trainable.py in __init__(self, config, logger_creator)
    101
    102 start_time = time.time()
--> 103 self.setup(copy.deepcopy(self.config))
    104 setup_time = time.time() - start_time
    105 if setup_time > SETUP_TIME_THRESHOLD:

~/anaconda3/envs/rl4rs/lib/python3.6/site-packages/ray/rllib/agents/trainer.py in setup(self, config)
    729
    730 with get_scope():
--> 731 self._init(self.config, self.env_creator)
    732
    733 # Evaluation setup.

~/anaconda3/envs/rl4rs/lib/python3.6/site-packages/ray/rllib/agents/trainer_template.py in _init(self, config, env_creator)
    150 policy_class=self._policy_class,
    151 config=config,
--> 152 num_workers=self.config["num_workers"])
    153 self.execution_plan = execution_plan
    154 self.train_exec_impl = execution_plan(self.workers, config)

~/anaconda3/envs/rl4rs/lib/python3.6/site-packages/ray/rllib/agents/trainer.py in _make_workers(self, env_creator, validate_env, policy_class, config, num_workers)
    817 trainer_config=config,
    818 num_workers=num_workers,
--> 819 logdir=self.logdir)
    820
    821 @DeveloperAPI

~/anaconda3/envs/rl4rs/lib/python3.6/site-packages/ray/rllib/evaluation/worker_set.py in __init__(self, env_creator, validate_env, policy_class, trainer_config, num_workers, logdir, _setup)
     84 remote_spaces = ray.get(self.remote_workers(
     85 )[0].foreach_policy.remote(
---> 86 lambda p, pid: (pid, p.observation_space, p.action_space)))
     87 spaces = {
     88 e[0]: (getattr(e[1], "original_space", e[1]), e[2])

~/anaconda3/envs/rl4rs/lib/python3.6/site-packages/ray/_private/client_mode_hook.py in wrapper(*args, **kwargs)
     80 if client_mode_should_convert():
     81 return getattr(ray, func.__name__)(*args, **kwargs)
---> 82 return func(*args, **kwargs)
     83
     84 return wrapper

~/anaconda3/envs/rl4rs/lib/python3.6/site-packages/ray/worker.py in get(object_refs, timeout)
   1564 raise value.as_instanceof_cause()
   1565 else:
-> 1566 raise value
   1567
   1568 if is_individual_id:

RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=41942, ip=192.168.1.4)
  File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/util/connection.py", line 96, in create_connection
    raise err
  File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/util/connection.py", line 86, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

ray::RolloutWorker.__init__() (pid=41942, ip=192.168.1.4)
  File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connectionpool.py", line 706, in urlopen
    chunked=chunked,
  File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connectionpool.py", line 394, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connection.py", line 234, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 1287, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 1333, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 1282, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 1042, in _send_output
    self.send(msg)
  File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 980, in send
    self.connect()
  File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connection.py", line 200, in connect
    conn = self._new_conn()
  File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connection.py", line 182, in _new_conn
    self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: : Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

ray::RolloutWorker.__init__() (pid=41942, ip=192.168.1.4)
  File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/adapters.py", line 450, in send
    timeout=timeout
  File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connectionpool.py", line 756, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/util/retry.py", line 574, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded with url: /v1/envs/ (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused',))

During handling of the above exception, another exception occurred:

ray::RolloutWorker.__init__() (pid=41942, ip=192.168.1.4)
  File "python/ray/_raylet.pyx", line 523, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 530, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 534, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 484, in ray._raylet.execute_task.function_executor
  File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/ray/_private/function_manager.py", line 563, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/ray/rllib/evaluation/rollout_worker.py", line 392, in __init__
    self.env = env_creator(env_context)
  File "", line 43, in
  File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/gym/envs/registration.py", line 184, in make
    return registry.make(id, **kwargs)
  File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/gym/envs/registration.py", line 106, in make
    env = spec.make(**kwargs)
  File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/gym/envs/registration.py", line 76, in make
    env = cls(**_kwargs)
  File "/media/kemove/16T/Jupyter/Electronics/rl4rs/server/httpEnv.py", line 12, in __init__
    self.instance_id = self.client.env_create(env_id, config)
  File "/media/kemove/16T/Jupyter/Electronics/rl4rs/server/gymHttpClient.py", line 55, in env_create
    resp = self._post_request(route, data)
  File "/media/kemove/16T/Jupyter/Electronics/rl4rs/server/gymHttpClient.py", line 43, in _post_request
    data=json.dumps(data))
  File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/sessions.py", line 577, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/sessions.py", line 529, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/sessions.py", line 645, in send
    r = adapter.send(request, **kwargs)
  File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/adapters.py", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded with url: /v1/envs/ (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused',))

fuxiupresearch commented 7 months ago

We can troubleshoot step by step. After you start the gymHttpServer with "nohup python -u rl4rs/server/gymHttpServer.py &", you can check whether it is working correctly by running "cd rl4rs/server && python gymHttpClient.py" in another terminal. Thank you!
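
As an even simpler first check, a minimal sketch like the one below tests whether anything accepts TCP connections on 127.0.0.1:5000 (the host and port from the error message). The helper name port_is_open is hypothetical and not part of RL4RS; it only uses the Python standard library.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 5000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if port_is_open():
        print("something is listening on 127.0.0.1:5000")
    else:
        print("nothing is listening on 127.0.0.1:5000")
```

If this reports that nothing is listening, the "Connection refused" error is expected, and the server's nohup.out is the next place to look.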

Larry-Liu02 commented 7 months ago

Many thanks for your support! I looked into the error. The path of the rl4rs files may be what is causing the trouble.

I started the server, but the output in nohup.out stops at this line: Server starting at: http://0.0.0.0:5000

After that, it does not proceed any further.

fuxiupresearch commented 7 months ago

The error "Address already in use" typically occurs when another process is already using the port. You can free port 5000 by killing that process: lsof -t -i:5000 | xargs -I {} kill -9 {}
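
Before killing anything, it may help to confirm that the port is really occupied. The sketch below tries to bind the port and treats EADDRINUSE as "in use". The name port_in_use is a hypothetical helper, not an RL4RS or lsof feature, and the default host 0.0.0.0 is taken from the server's startup message.

```python
import errno
import socket


def port_in_use(port: int = 5000, host: str = "0.0.0.0") -> bool:
    """Return True if binding host:port fails with 'Address already in use'."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # some other problem, e.g. insufficient permissions
    return False


if __name__ == "__main__":
    print("port 5000 in use:", port_in_use())
```

This only reports whether the port is taken; it does not say which process holds it, which is what the lsof command above is for.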

Larry-Liu02 commented 7 months ago

Many thanks for your support! I found that the main problem is with python gymHttpClient.py. I cannot run this file; it always shows this error: requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded with url: /v1/envs/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7efe9d2c9828>: Failed to establish a new connection: [Errno 111] Connection refused',))

fuxiupresearch commented 7 months ago

Typically, the error ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded indicates that gymHttpServer.py is not running on the specified host and port. Make sure the server is running and listening on that port before you start the client. Additionally, I'd recommend reaching out to someone with experience in web development (or GPT-4) for further assistance.
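
One way to make sure the port is ready before launching the client or the training script is to poll it with a deadline. The sketch below is only an illustration: wait_for_port is a hypothetical helper, and the idea that the server may need some time before it starts listening is an assumption that this thread does not confirm.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 5000,
                  timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # the server accepted a connection
        except OSError:
            time.sleep(interval)  # not listening yet; retry after a short pause
    return False


if __name__ == "__main__":
    if wait_for_port():
        print("server is reachable; it should be safe to start the client")
    else:
        print("server did not start listening in time; check nohup.out")
```

If the wait times out, the server most likely never started listening, so the remaining work is on the server side rather than in the client.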