IntelLabs / coach

Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms
https://intellabs.github.io/coach/
Apache License 2.0

Carla preset multi-thread TCP error #419

Open jfalfaro opened 4 years ago

jfalfaro commented 4 years ago

Greetings, I downloaded version 1.0.0 and tried running the presets CARLA_DDPG and CARLA_DOUBLE_DDQN in multi-thread mode, but I'm getting TCP connection errors between the client and the server. Here are some details:

Command that I used: coach -p CARLA_DDPG -n 2

Output and stack trace:

/home/juan/Documents/coach_env/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/juan/Documents/coach_env/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/juan/Documents/coach_env/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/juan/Documents/coach_env/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/juan/Documents/coach_env/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/juan/Documents/coach_env/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/juan/Documents/coach_env/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/juan/Documents/coach_env/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/juan/Documents/coach_env/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/juan/Documents/coach_env/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/juan/Documents/coach_env/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/juan/Documents/coach_env/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Please enter an experiment name: 

Creating graph - name: BasicRLGraphManager task id: 0 type: ps
WARNING:tensorflow:From /home/juan/Documents/coach_env/lib/python3.6/site-packages/rl_coach/graph_managers/graph_manager.py:176: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /home/juan/Documents/coach_env/lib/python3.6/site-packages/rl_coach/architectures/tensorflow_components/distributed_tf_utils.py:78: The name tf.train.Server is deprecated. Please use tf.distribute.Server instead.

WARNING:tensorflow:From /home/juan/Documents/coach_env/lib/python3.6/site-packages/rl_coach/architectures/tensorflow_components/distributed_tf_utils.py:78: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

Creating graph - name: BasicRLGraphManager task id: 0 type: worker
WARNING:tensorflow:From /home/juan/Documents/coach_env/lib/python3.6/site-packages/rl_coach/graph_managers/graph_manager.py:176: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /home/juan/Documents/coach_env/lib/python3.6/site-packages/rl_coach/architectures/tensorflow_components/distributed_tf_utils.py:78: The name tf.train.Server is deprecated. Please use tf.distribute.Server instead.

WARNING:tensorflow:From /home/juan/Documents/coach_env/lib/python3.6/site-packages/rl_coach/architectures/tensorflow_components/distributed_tf_utils.py:78: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From /home/juan/Documents/coach_env/lib/python3.6/site-packages/rl_coach/architectures/tensorflow_components/distributed_tf_utils.py:72: The name tf.train.replica_device_setter is deprecated. Please use tf.compat.v1.train.replica_device_setter instead.

Creating graph - name: BasicRLGraphManager task id: 1 type: worker
WARNING:tensorflow:From /home/juan/Documents/coach_env/lib/python3.6/site-packages/rl_coach/graph_managers/graph_manager.py:176: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /home/juan/Documents/coach_env/lib/python3.6/site-packages/rl_coach/architectures/tensorflow_components/distributed_tf_utils.py:78: The name tf.train.Server is deprecated. Please use tf.distribute.Server instead.

WARNING:tensorflow:From /home/juan/Documents/coach_env/lib/python3.6/site-packages/rl_coach/architectures/tensorflow_components/distributed_tf_utils.py:78: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From /home/juan/Documents/coach_env/lib/python3.6/site-packages/rl_coach/architectures/tensorflow_components/distributed_tf_utils.py:72: The name tf.train.replica_device_setter is deprecated. Please use tf.compat.v1.train.replica_device_setter instead.

Creating agent - name: agent task id: 0 (may take up to 30 seconds due to tensorflow wake up time)
Creating agent - name: agent task id: 1 (may take up to 30 seconds due to tensorflow wake up time)
simple_rl_graph: Starting heatup
simple_rl_graph: Starting heatup
2019-10-22-00:59:35.707989 Heatup - Name: main_level/agent Worker: 1 Episode: 1 Total reward: -9872.13 Steps: 999 Training iteration: 0 
2019-10-22-00:59:44.377403 Heatup - Name: main_level/agent Worker: 0 Episode: 1 Total reward: -9034.87 Steps: 999 Training iteration: 0 
2019-10-22-01:00:12.025546 Heatup - Name: main_level/agent Worker: 1 Episode: 2 Total reward: -6171.6 Steps: 1998 Training iteration: 0 
Starting to improve simple_rl_graph task index 1
2019-10-22-01:00:20.547063 Heatup - Name: main_level/agent Worker: 0 Episode: 2 Total reward: -7569.77 Steps: 1998 Training iteration: 0 
Starting to improve simple_rl_graph task index 0
Process Process-4:
Traceback (most recent call last):
  File "/home/juan/Documents/coach_env/lib/python3.6/site-packages/carla/tcp.py", line 67, in write
    self._socket.sendall(header + message)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/juan/Documents/coach_env/lib/python3.6/site-packages/rl_coach/coach.py", line 88, in start_graph
    graph_manager.improve()
  File "/home/juan/Documents/coach_env/lib/python3.6/site-packages/rl_coach/graph_managers/graph_manager.py", line 547, in improve
    self.train_and_act(self.steps_between_evaluation_periods)
  File "/home/juan/Documents/coach_env/lib/python3.6/site-packages/rl_coach/graph_managers/graph_manager.py", line 481, in train_and_act
    self.act(EnvironmentSteps(1))
  File "/home/juan/Documents/coach_env/lib/python3.6/site-packages/rl_coach/graph_managers/graph_manager.py", line 447, in act
    result = self.top_level_manager.step(None)
  File "/home/juan/Documents/coach_env/lib/python3.6/site-packages/rl_coach/level_manager.py", line 250, in step
    env_response = self.environment.step(action_info.action)
  File "/home/juan/Documents/coach_env/lib/python3.6/site-packages/rl_coach/environments/environment.py", line 299, in step
    self._take_action(action)
  File "/home/juan/Documents/coach_env/lib/python3.6/site-packages/rl_coach/environments/carla_environment.py", line 417, in _take_action
    self.game.send_control(self.control)
  File "/home/juan/Documents/coach_env/lib/python3.6/site-packages/carla/client.py", line 145, in send_control
    self._control_client.write(pb_message.SerializeToString())
  File "/home/juan/Documents/coach_env/lib/python3.6/site-packages/carla/tcp.py", line 69, in write
    self._reraise_exception_as_tcp_error('failed to write data', exception)
  File "/home/juan/Documents/coach_env/lib/python3.6/site-packages/carla/tcp.py", line 97, in _reraise_exception_as_tcp_error
    raise TCPConnectionError('%s%s: %s' % (self._logprefix, message, exception))
carla.tcp.TCPConnectionError: (localhost:59539) failed to write data: [Errno 32] Broken pipe

Let me know if there is any other information I can provide you with that would help.
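
For readers following the traceback: the chained pair of errors (a low-level BrokenPipeError followed by a library-level connection error) is the usual shape of a write to a socket whose peer has already gone away. The sketch below reproduces that shape with the standard library only. It is not CARLA's code; `ConnectionLostError`, `write_message` and `demo` are hypothetical names, and the immediate failure on the first write assumes a Unix-like system such as Linux.

```python
import socket


class ConnectionLostError(Exception):
    """Hypothetical high-level error, standing in for a client library's own error type."""


def write_message(sock: socket.socket, payload: bytes) -> None:
    """Send payload, re-raising any socket failure as ConnectionLostError."""
    try:
        sock.sendall(payload)
    except OSError as exc:
        # Raising inside the except block chains the two exceptions. That
        # chaining is what prints "During handling of the above exception,
        # another exception occurred" in a traceback.
        raise ConnectionLostError("failed to write data: %s" % exc)


def demo() -> ConnectionLostError:
    """Write to a socket whose peer has already closed and return the error."""
    ours, theirs = socket.socketpair()
    theirs.close()  # simulate the server side going away
    try:
        write_message(ours, b"control message")
    except ConnectionLostError as err:
        return err
    finally:
        ours.close()
    raise AssertionError("expected the write to fail")


if __name__ == "__main__":
    error = demo()
    print(type(error).__name__, "caused by", type(error.__context__).__name__)
```

The point of the sketch is only that the error is raised on the client side when the other end of the connection is no longer there; it says nothing about why the server side stopped.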

varunjammula commented 4 years ago

I have the same issue too. At some point during training, when it tries to connect to CARLA, the port is unavailable!
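
One way to check whether the port is reachable at a given moment is a plain TCP connection attempt. This is a minimal sketch using only the Python standard library; `port_is_open` is a hypothetical helper, and the host and port in the usage example are simply the values that appear in the traceback above, which will differ from run to run.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection raises an OSError subclass on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Example values taken from the error message in this thread.
    print(port_is_open("localhost", 59539))
```

A False result only shows that nothing accepted the connection at that instant; it does not distinguish a crashed server from one that has not started yet.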

hildebrandt-carl commented 4 years ago

Same error here. Any solutions yet?

Mohamedsabry109 commented 4 years ago

This broken pipe error occurs when the buffer size is too large: your RAM fills up with images as the number of simulation steps increases. Try decreasing the buffer size and increasing the swap space on your device; this will solve your problem.
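
To get a rough sense of how much RAM a buffer of image observations can take, the arithmetic can be sketched as below. All numbers are hypothetical placeholders, not the values used by the CARLA presets: the buffer length, the image shape, one byte per pixel value, and the assumption that each stored transition keeps two frames (the observation and the next observation).

```python
def buffer_bytes(num_transitions, obs_shape, bytes_per_value=1, frames_per_transition=2):
    """Estimate the bytes needed to hold image observations for a full buffer.

    Ignores actions, rewards and Python object overhead, so the real
    footprint will be larger than this estimate.
    """
    values_per_frame = 1
    for dim in obs_shape:
        values_per_frame *= dim
    return num_transitions * frames_per_transition * values_per_frame * bytes_per_value


if __name__ == "__main__":
    # Hypothetical example: one million transitions of 84x84 RGB frames.
    total = buffer_bytes(1_000_000, (84, 84, 3))
    print("approx. %.1f GiB" % (total / 2**30))

    # Shrinking the buffer by 10x shrinks the estimate by the same factor.
    smaller = buffer_bytes(100_000, (84, 84, 3))
    print("approx. %.1f GiB" % (smaller / 2**30))
```

Because the estimate scales linearly with the number of stored transitions, comparing it against the machine's free RAM gives a quick upper bound on a buffer size that fits without swapping.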