Open Larry-Liu02 opened 8 months ago
We can troubleshoot step by step. After you start the gymHttpServer with "nohup python -u rl4rs/server/gymHttpServer.py &", you can check whether it is working correctly by running "cd rl4rs/server && python gymHttpClient.py" in another terminal. Thank you!
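As a quicker first check before running the client script, you can test whether anything is accepting TCP connections on the server port. The following is a minimal sketch, not part of RL4RS: the helper name `is_port_open` is hypothetical, and the host and port are taken from the log lines in this thread (127.0.0.1 and 5000).

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper for illustration; it only proves that some
    process is listening, not that it is the gymHttpServer.
    """
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening or the attempt times out.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Host and port assumed from this thread's logs.
    print("port 5000 reachable:", is_port_open("127.0.0.1", 5000))
```

If this prints False, the "Connection refused" error from the client is expected, because no process is listening on that port yet.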
Many thanks for your support! I checked the error. The cause may be the path of the rl4rs files.
I started the server, but nohup.out stops at this line: Server starting at: http://0.0.0.0:5000
After that, it does not progress any further.
The error "Address already in use" typically occurs when the port is already held by another process. You can free port 5000 by killing that process: lsof -t -i:5000 | xargs -I {} kill -9 {}
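If lsof is not available, the same condition can be checked from Python by trying to bind the port yourself. This is a minimal sketch under the assumption that the server binds 0.0.0.0:5000, as the nohup.out line in this thread suggests; the helper name `port_in_use` is hypothetical.

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding (host, port) fails with EADDRINUSE.

    Hypothetical helper for illustration. Other bind errors (for
    example a permission error on a privileged port) are re-raised.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise
    # The bind succeeded, so no other process holds the port; the
    # socket is closed again when the with-block exits.
    return False


if __name__ == "__main__":
    # Port assumed from this thread's logs.
    print("port 5000 in use:", port_in_use(5000))
```

A True result means another process still holds the port and should be stopped before the server is restarted.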
Many thanks for your support! I found that the main problem is python gymHttpClient.py: I cannot run this file, and it always shows this error: requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded with url: /v1/envs/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7efe9d2c9828>: Failed to establish a new connection: [Errno 111] Connection refused',))
Typically, the error ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded indicates that the gymHttpServer.py program is not running with the specified host and port. Ensure that the intended service is running and listening on the specified port. Additionally, I'd recommend reaching out to someone with experience in web development (or GPT-4) for further assistance.
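One way to make sure the service is listening before a client or training script connects is to poll it until it answers. The sketch below uses only the standard library and is an assumption-laden illustration, not RL4RS code: the function name `wait_for_server` and its timeout parameters are hypothetical, while the host, port, and the /v1/envs/ route are taken from the error message. Any HTTP status counts as "up", since the goal is only to confirm that something is answering on the port.

```python
import http.client
import time


def wait_for_server(host="127.0.0.1", port=5000, path="/v1/envs/",
                    timeout_s=30.0, interval_s=0.5):
    """Poll host:port until an HTTP response arrives or timeout_s elapses.

    Hypothetical helper for illustration. Returns True on any HTTP
    response (even an error status), False if the deadline passes.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        conn = http.client.HTTPConnection(host, port, timeout=2.0)
        try:
            conn.request("GET", path)
            conn.getresponse().read()
            return True
        except (OSError, http.client.HTTPException):
            # Connection refused, reset, or timed out: not ready yet.
            pass
        finally:
            conn.close()
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    print("server answered:", wait_for_server(timeout_s=5.0))
```

If this returns False even after a generous timeout, the server process has probably exited or is bound to a different host or port, and nohup.out is the next place to look.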
Dear RL4RS Team,
When I run nohup python -u rl4rs/server/gymHttpServer.py & bash run_modelfree_rl.sh DQN, I always get ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=5000), the same error as when I run the last cell of tutorial.ipynb. I don't know the reason. Is it related to my local network? I can connect to the Mainland Internet. I also tried a cloud server, which shows the same error. I would like to know how to solve it. I'm sharing the complete error output here. Many thanks!!!
/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/ray/autoscaler/_private/cli_logger.py:61: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via `pip install 'ray[default]'`. Please update your install command.
"update your install command.", FutureWarning)
2024-03-11 15:06:36,349 INFO services.py:1247 -- View the Ray dashboard at http://127.0.0.1:8265
2024-03-11 15:06:37,418 INFO trainer.py:706 -- Tip: set framework=tfe or the --eager flag to enable TensorFlow eager execution
{'epoch': 5, 'maxlen': 64, 'batch_size': 64, 'action_size': 284, 'class_num': 2, 'dense_feature_num': 432, 'category_feature_num': 21, 'category_hash_size': 100000, 'seq_num': 2, 'emb_size': 128, 'is_eval': False, 'hidden_units': 128, 'max_steps': 9, 'action_emb_size': 32, 'sample_file': 'simulator/rl4rs_dataset_a_shuf.csv', 'model_file': 'simulator/finetuned/simulator_a_dien/model', 'iteminfo_file': 'raw_data/item_info.csv', 'remote_base': 'http://127.0.0.1:5000', 'trial_name': 'all', 'support_rllib_mask': True, 'env': 'SlateRecEnv-v0'}
rllib_config {'env': 'rllibEnv-v0', 'gamma': 1, 'explore': True, 'exploration_config': {'type': 'SoftQ'}, 'num_gpus': 1, 'num_workers': 2, 'framework': 'tf', 'rollout_fragment_length': 9, 'batch_mode': 'complete_episodes', 'train_batch_size': 576, 'evaluation_interval': 1, 'evaluation_num_episodes': 8192, 'evaluation_config': {'explore': False}, 'log_level': 'INFO', 'use_critic': True, 'use_gae': True, 'lambda': 1.0, 'kl_coeff': 0.2, 'sgd_minibatch_size': 256, 'shuffle_sequences': True, 'num_sgd_iter': 1, 'lr': 0.0001, 'vf_loss_coeff': 0.5, 'clip_param': 0.3, 'vf_clip_param': 500.0, 'kl_target': 0.01}
(pid=41960) 2024-03-11 15:06:38,687 ERROR worker.py:421 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=41960, ip=192.168.1.4)
(pid=41960) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/util/connection.py", line 96, in create_connection
(pid=41960) raise err
(pid=41960) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/util/connection.py", line 86, in create_connection
(pid=41960) sock.connect(sa)
(pid=41960) ConnectionRefusedError: [Errno 111] Connection refused
(pid=41960)
(pid=41960) During handling of the above exception, another exception occurred:
(pid=41960)
(pid=41960) ray::RolloutWorker.__init__() (pid=41960, ip=192.168.1.4)
(pid=41960) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connectionpool.py", line 706, in urlopen
(pid=41960) chunked=chunked,
(pid=41960) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connectionpool.py", line 394, in _make_request
(pid=41960) conn.request(method, url, **httplib_request_kw)
(pid=41960) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connection.py", line 234, in request
(pid=41960) super(HTTPConnection, self).request(method, url, body=body, headers=headers)
(pid=41960) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 1287, in request
(pid=41960) self._send_request(method, url, body, headers, encode_chunked)
(pid=41960) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 1333, in _send_request
(pid=41960) self.endheaders(body, encode_chunked=encode_chunked)
(pid=41960) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 1282, in endheaders
(pid=41960) self._send_output(message_body, encode_chunked=encode_chunked)
(pid=41960) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 1042, in _send_output
(pid=41960) self.send(msg)
(pid=41960) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 980, in send
(pid=41960) self.connect()
(pid=41960) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connection.py", line 200, in connect
(pid=41960) conn = self._new_conn()
(pid=41960) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connection.py", line 182, in _new_conn
(pid=41960) self, "Failed to establish a new connection: %s" % e
(pid=41960) urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f21554e6518>: Failed to establish a new connection: [Errno 111] Connection refused
(pid=41960)
(pid=41960) During handling of the above exception, another exception occurred:
(pid=41960)
(pid=41960) ray::RolloutWorker.__init__() (pid=41960, ip=192.168.1.4)
(pid=41960) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/adapters.py", line 450, in send
(pid=41960) timeout=timeout
(pid=41960) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connectionpool.py", line 756, in urlopen
(pid=41960) method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
(pid=41960) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/util/retry.py", line 574, in increment
(pid=41960) raise MaxRetryError(_pool, url, error or ResponseError(cause))
(pid=41960) urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded with url: /v1/envs/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f21554e6518>: Failed to establish a new connection: [Errno 111] Connection refused',))
(pid=41960)
(pid=41960) During handling of the above exception, another exception occurred:
(pid=41960)
(pid=41960) ray::RolloutWorker.__init__() (pid=41960, ip=192.168.1.4)
(pid=41960) File "python/ray/_raylet.pyx", line 523, in ray._raylet.execute_task
(pid=41960) File "python/ray/_raylet.pyx", line 530, in ray._raylet.execute_task
(pid=41960) File "python/ray/_raylet.pyx", line 534, in ray._raylet.execute_task
(pid=41960) File "python/ray/_raylet.pyx", line 484, in ray._raylet.execute_task.function_executor
(pid=41960) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/ray/_private/function_manager.py", line 563, in actor_method_executor
(pid=41960) return method(__ray_actor, *args, **kwargs)
(pid=41960) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/ray/rllib/evaluation/rollout_worker.py", line 392, in __init__
(pid=41960) self.env = env_creator(env_context)
(pid=41960) File "", line 43, in
(pid=41960) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/gym/envs/registration.py", line 184, in make
(pid=41960) return registry.make(id, kwargs)
(pid=41960) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/gym/envs/registration.py", line 106, in make
(pid=41960) env = spec.make(kwargs)
(pid=41960) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/gym/envs/registration.py", line 76, in make
(pid=41960) env = cls(_kwargs)
(pid=41960) File "/media/kemove/16T/Jupyter/Electronics/rl4rs/server/httpEnv.py", line 12, in init
(pid=41960) self.instance_id = self.client.env_create(env_id, config)
(pid=41960) File "/media/kemove/16T/Jupyter/Electronics/rl4rs/server/gymHttpClient.py", line 55, in env_create
(pid=41960) resp = self._post_request(route, data)
(pid=41960) File "/media/kemove/16T/Jupyter/Electronics/rl4rs/server/gymHttpClient.py", line 43, in _post_request
(pid=41960) data=json.dumps(data))
(pid=41960) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/sessions.py", line 577, in post
(pid=41960) return self.request('POST', url, data=data, json=json, kwargs)
(pid=41960) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/sessions.py", line 529, in request
(pid=41960) resp = self.send(prep, send_kwargs)
(pid=41960) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/sessions.py", line 645, in send
(pid=41960) r = adapter.send(request, kwargs)
(pid=41960) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/adapters.py", line 519, in send
(pid=41960) raise ConnectionError(e, request=request)
(pid=41960) requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded with url: /v1/envs/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f21554e6518>: Failed to establish a new connection: [Errno 111] Connection refused',))
(pid=41942) 2024-03-11 15:06:38,686 ERROR worker.py:421 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=41942, ip=192.168.1.4)
(pid=41942) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/util/connection.py", line 96, in create_connection
(pid=41942) raise err
(pid=41942) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/util/connection.py", line 86, in create_connection
(pid=41942) sock.connect(sa)
(pid=41942) ConnectionRefusedError: [Errno 111] Connection refused
(pid=41942)
(pid=41942) During handling of the above exception, another exception occurred:
(pid=41942)
(pid=41942) ray::RolloutWorker.init() (pid=41942, ip=192.168.1.4)
(pid=41942) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connectionpool.py", line 706, in urlopen
(pid=41942) chunked=chunked,
(pid=41942) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connectionpool.py", line 394, in _make_request
(pid=41942) conn.request(method, url, **httplib_request_kw)
(pid=41942) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connection.py", line 234, in request
(pid=41942) super(HTTPConnection, self).request(method, url, body=body, headers=headers)
(pid=41942) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 1287, in request
(pid=41942) self._send_request(method, url, body, headers, encode_chunked)
(pid=41942) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 1333, in _send_request
(pid=41942) self.endheaders(body, encode_chunked=encode_chunked)
(pid=41942) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 1282, in endheaders
(pid=41942) self._send_output(message_body, encode_chunked=encode_chunked)
(pid=41942) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 1042, in _send_output
(pid=41942) self.send(msg)
(pid=41942) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/http/client.py", line 980, in send
(pid=41942) self.connect()
(pid=41942) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connection.py", line 200, in connect
(pid=41942) conn = self._new_conn()
(pid=41942) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connection.py", line 182, in _new_conn
(pid=41942) self, "Failed to establish a new connection: %s" % e
(pid=41942) urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7ef8abef6550>: Failed to establish a new connection: [Errno 111] Connection refused
(pid=41942)
(pid=41942) During handling of the above exception, another exception occurred:
(pid=41942)
(pid=41942) ray::RolloutWorker.init() (pid=41942, ip=192.168.1.4)
(pid=41942) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/adapters.py", line 450, in send
(pid=41942) timeout=timeout
(pid=41942) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/connectionpool.py", line 756, in urlopen
(pid=41942) method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
(pid=41942) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/urllib3/util/retry.py", line 574, in increment
(pid=41942) raise MaxRetryError(_pool, url, error or ResponseError(cause))
(pid=41942) urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded with url: /v1/envs/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ef8abef6550>: Failed to establish a new connection: [Errno 111] Connection refused',))
(pid=41942)
(pid=41942) During handling of the above exception, another exception occurred:
(pid=41942)
(pid=41942) ray::RolloutWorker.__init__() (pid=41942, ip=192.168.1.4)
(pid=41942) File "python/ray/_raylet.pyx", line 523, in ray._raylet.execute_task
(pid=41942) File "python/ray/_raylet.pyx", line 530, in ray._raylet.execute_task
(pid=41942) File "python/ray/_raylet.pyx", line 534, in ray._raylet.execute_task
(pid=41942) File "python/ray/_raylet.pyx", line 484, in ray._raylet.execute_task.function_executor
(pid=41942) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/ray/_private/function_manager.py", line 563, in actor_method_executor
(pid=41942) return method(__ray_actor, *args, **kwargs)
(pid=41942) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/ray/rllib/evaluation/rollout_worker.py", line 392, in init
(pid=41942) self.env = env_creator(env_context)
(pid=41942) File "", line 43, in
(pid=41942) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/gym/envs/registration.py", line 184, in make
(pid=41942) return registry.make(id, kwargs)
(pid=41942) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/gym/envs/registration.py", line 106, in make
(pid=41942) env = spec.make(kwargs)
(pid=41942) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/gym/envs/registration.py", line 76, in make
(pid=41942) env = cls(_kwargs)
(pid=41942) File "/media/kemove/16T/Jupyter/Electronics/rl4rs/server/httpEnv.py", line 12, in init
(pid=41942) self.instance_id = self.client.env_create(env_id, config)
(pid=41942) File "/media/kemove/16T/Jupyter/Electronics/rl4rs/server/gymHttpClient.py", line 55, in env_create
(pid=41942) resp = self._post_request(route, data)
(pid=41942) File "/media/kemove/16T/Jupyter/Electronics/rl4rs/server/gymHttpClient.py", line 43, in _post_request
(pid=41942) data=json.dumps(data))
(pid=41942) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/sessions.py", line 577, in post
(pid=41942) return self.request('POST', url, data=data, json=json, kwargs)
(pid=41942) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/sessions.py", line 529, in request
(pid=41942) resp = self.send(prep, send_kwargs)
(pid=41942) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/sessions.py", line 645, in send
(pid=41942) r = adapter.send(request, kwargs)
(pid=41942) File "/home/kemove/anaconda3/envs/rl4rs/lib/python3.6/site-packages/requests/adapters.py", line 519, in send
(pid=41942) raise ConnectionError(e, request=request)
(pid=41942) requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded with url: /v1/envs/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ef8abef6550>: Failed to establish a new connection: [Errno 111] Connection refused',))
RayActorError Traceback (most recent call last)