Kismuz / btgym

Scalable, event-driven, deep-learning-friendly backtesting library
https://kismuz.github.io/btgym/
GNU Lesser General Public License v3.0

zmq.error.Again: Resource temporarily unavailable #24

Closed huminpurin closed 6 years ago

huminpurin commented 6 years ago

While running the examples, I'm getting an error like the one below:

INFO:tensorflow:Starting queue runners.
WARNING:worker_1:worker_1: started training at step: 120
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/humin/btgym/btgym/algorithms/runner.py", line 75, in run
    self._run()
  File "/home/humin/btgym/btgym/algorithms/runner.py", line 96, in _run
    self.queue.put(next(rollout_provider), timeout=600.0)
  File "/home/humin/btgym/btgym/algorithms/runner.py", line 238, in env_runner
    episode_stat = env.get_stat() # get episode statistic
  File "/home/humin/btgym/btgym/envs/backtrader.py", line 680, in get_stat
    if self._force_control_mode():
  File "/home/humin/btgym/btgym/envs/backtrader.py", line 508, in _force_control_mode
    self.server_response = self.socket.recv_pyobj()
  File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/zmq/sugar/socket.py", line 491, in recv_pyobj
    msg = self.recv(flags)
  File "zmq/backend/cython/socket.pyx", line 693, in zmq.backend.cython.socket.Socket.recv
  File "zmq/backend/cython/socket.pyx", line 727, in zmq.backend.cython.socket.Socket.recv
  File "zmq/backend/cython/socket.pyx", line 150, in zmq.backend.cython.socket._recv_copy
  File "zmq/backend/cython/socket.pyx", line 145, in zmq.backend.cython.socket._recv_copy
  File "zmq/backend/cython/checkrc.pxd", line 19, in zmq.backend.cython.checkrc._check_rc
zmq.error.Again: Resource temporarily unavailable

I tried to reduce the number of workers but it looks irrelevant. This error is not interrupting the training process. Is this normal or should I be concerned about it?

My environment: Ubuntu 16.04. Python 3.5

Kismuz commented 6 years ago

@huminpurin, no, it's not normal at all :) Which example gives you this error? Does it happen regularly or only from time to time?
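For context on the exception itself: pyzmq raises `zmq.error.Again` when a receive that is non-blocking, or that has a receive timeout set, finds no message waiting. The sketch below reproduces that with a local REQ/REP pair; the helper name `recv_with_timeout`, the timeout values, and the message contents are illustrative assumptions and are not taken from btgym.

```python
import zmq


def recv_with_timeout(socket, timeout_ms):
    """Return the next pickled object, or None if nothing arrives in time.

    Hypothetical helper for illustration only; not part of btgym.
    """
    socket.setsockopt(zmq.RCVTIMEO, timeout_ms)
    try:
        return socket.recv_pyobj()
    except zmq.error.Again:
        # Timeout expired with no message: this is the error from the traceback.
        return None


context = zmq.Context()

# "Server" side: REP socket bound to a free local port.
rep = context.socket(zmq.REP)
port = rep.bind_to_random_port("tcp://127.0.0.1")

# "Client" side: REQ socket connected to that port.
req = context.socket(zmq.REQ)
req.setsockopt(zmq.LINGER, 0)
req.connect("tcp://127.0.0.1:%d" % port)

# Nothing has been sent yet, so the server-side receive times out.
silent = recv_with_timeout(rep, 100)

# Once the client actually sends a request, the same call succeeds.
req.send_pyobj({"ctrl": "ping"})
request = recv_with_timeout(rep, 1000)
rep.send_pyobj("pong")
reply = recv_with_timeout(req, 1000)

req.close()
rep.close()
context.term()
```

In other words, the exception usually means that the peer never sent anything within the allowed time, which points to a peer that was not started or not reachable rather than to a fault in the socket itself.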

huminpurin commented 6 years ago

@Kismuz First I tried the example "async_btgym_workers". I changed num_workers to 4 (I don't think this number is related to the error; I mention it because it is the only part of the example I changed) and launched the workers as in the example:

    worker.daemon = False
    worker.start()
    workers.append(worker)

Then I got:

Env.step: server unreachable with status: .
Env.step: server unreachable with status: .
Env.step: server unreachable with status: .
Env.step: server unreachable with status: .
Process BTgymServer-2:
Traceback (most recent call last):
  File "/home/humin/anaconda3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "../btgym/server.py", line 334, in run
    service_input = socket.recv_pyobj()
  File "/home/humin/anaconda3/lib/python3.6/site-packages/zmq/sugar/socket.py", line 491, in recv_pyobj
    msg = self.recv(flags)
  File "zmq/backend/cython/socket.pyx", line 693, in zmq.backend.cython.socket.Socket.recv (zmq/backend/cython/socket.c:7683)
  File "zmq/backend/cython/socket.pyx", line 727, in zmq.backend.cython.socket.Socket.recv (zmq/backend/cython/socket.c:7460)
  File "zmq/backend/cython/socket.pyx", line 150, in zmq.backend.cython.socket._recv_copy (zmq/backend/cython/socket.c:2437)
  File "zmq/backend/cython/socket.pyx", line 145, in zmq.backend.cython.socket._recv_copy (zmq/backend/cython/socket.c:2344)
  File "zmq/backend/cython/checkrc.pxd", line 19, in zmq.backend.cython.checkrc._check_rc (zmq/backend/cython/socket.c:9823)
zmq.error.Again: Resource temporarily unavailable
Process BTgymServer-3:1:
Traceback (most recent call last):
  File "/home/humin/anaconda3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "../btgym/server.py", line 433, in run
    raise RuntimeError('Failed to assert Dataset is ready. Exiting.')
RuntimeError: Failed to assert Dataset is ready. Exiting.
Process Worker-3:
Traceback (most recent call last):
  File "/home/humin/anaconda3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "", line 17, in run
    obs = self.env.reset()
  File "/home/humin/anaconda3/lib/python3.6/site-packages/gym/core.py", line 104, in reset
    return self._reset()
  File "../btgym/envs/backtrader.py", line 561, in _reset
    self.env_response = self._step(0)
  File "../btgym/envs/backtrader.py", line 658, in _step
    raise ConnectionError(msg)
ConnectionError: Env.step: server unreachable with status: .

...worker_0 has joined.

BtgymServer: data_server unreachable with status: .
Process BTgymServer-6:1:
Traceback (most recent call last):
  File "/home/humin/anaconda3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "../btgym/server.py", line 414, in run
    raise ConnectionError(msg)
ConnectionError: BtgymServer: data_server unreachable with status: .
BtgymServer: data_server unreachable with status: .
Process BTgymServer-4:1:
Traceback (most recent call last):
  File "/home/humin/anaconda3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "../btgym/server.py", line 414, in run
    raise ConnectionError(msg)
ConnectionError: BtgymServer: data_server unreachable with status: .
BtgymServer: data_server unreachable with status: .
Process BTgymServer-5:1:
Traceback (most recent call last):
  File "/home/humin/anaconda3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "../btgym/server.py", line 414, in run
    raise ConnectionError(msg)
ConnectionError: BtgymServer: data_server unreachable with status: .

...worker_1 has joined.
...worker_2 has joined.
...worker_3 has joined.
data_master: environment closed.

In another example, "a3c_random_on_synth_or_real_data_4_6", I got a similar error after launcher.run(), as shown below:

WARNING:worker_1:AAC_1: learn_rate: 0.000100, entropy_beta: 0.010317
Process BTgymServer-2:2:
Traceback (most recent call last):
  File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/multiprocessing/process.py", line 252, in _bootstrap
    self.run()
  File "/home/humin/btgym/btgym/server.py", line 334, in run
    service_input = socket.recv_pyobj()
  File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/zmq/sugar/socket.py", line 491, in recv_pyobj
    msg = self.recv(flags)
  File "zmq/backend/cython/socket.pyx", line 693, in zmq.backend.cython.socket.Socket.recv
  File "zmq/backend/cython/socket.pyx", line 727, in zmq.backend.cython.socket.Socket.recv
  File "zmq/backend/cython/socket.pyx", line 150, in zmq.backend.cython.socket._recv_copy
  File "zmq/backend/cython/socket.pyx", line 145, in zmq.backend.cython.socket._recv_copy
  File "zmq/backend/cython/checkrc.pxd", line 19, in zmq.backend.cython.checkrc._check_rc
zmq.error.Again: Resource temporarily unavailable
Process BTgymServer-3:1:
Traceback (most recent call last):
  File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/multiprocessing/process.py", line 252, in _bootstrap
    self.run()
  File "/home/humin/btgym/btgym/server.py", line 334, in run
    service_input = socket.recv_pyobj()
  File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/zmq/sugar/socket.py", line 491, in recv_pyobj
    msg = self.recv(flags)
  File "zmq/backend/cython/socket.pyx", line 693, in zmq.backend.cython.socket.Socket.recv
  File "zmq/backend/cython/socket.pyx", line 727, in zmq.backend.cython.socket.Socket.recv
  File "zmq/backend/cython/socket.pyx", line 150, in zmq.backend.cython.socket._recv_copy
  File "zmq/backend/cython/socket.pyx", line 145, in zmq.backend.cython.socket._recv_copy
  File "zmq/backend/cython/checkrc.pxd", line 19, in zmq.backend.cython.checkrc._check_rc
zmq.error.Again: Resource temporarily unavailable
Process Worker-3:
Traceback (most recent call last):
  File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 468, in make_tensor_proto
    str_values = [compat.as_bytes(x) for x in proto_values]
  File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 468, in
    str_values = [compat.as_bytes(x) for x in proto_values]
  File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/util/compat.py", line 65, in as_bytes
    (bytes_or_text,))
TypeError: Expected binary or unicode string, got {'sample_num': <tf.Tensor 'local/on_policy_state_in_metadata_sample_num_pl:0' shape=(?,) dtype=float32>, 'first_row': <tf.Tensor 'local/on_policy_state_in_metadata_first_row_pl:0' shape=(?,) dtype=float32>, 'trial_num': <tf.Tensor 'local/on_policy_state_in_metadata_trial_num_pl:0' shape=(?,) dtype=float32>, 'type': <tf.Tensor 'local/on_policy_state_in_metadata_type_pl:0' shape=(?,) dtype=float32>}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/multiprocessing/process.py", line 252, in _bootstrap
    self.run()
  File "/home/humin/btgym/btgym/algorithms/worker.py", line 189, in run
    self.trainer_kwargs,
  File "/home/humin/btgym/btgym/algorithms/aac.py", line 972, in init
    kwargs
  File "/home/humin/btgym/btgym/algorithms/aac.py", line 423, in init
    self.inc_step = self.global_step.assign_add(tf.shape(pi.on_state_in[list(pi.on_state_in.keys())[0]])[0])
  File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 271, in shape
    return shape_internal(input, name, optimize=True, out_type=out_type)
  File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 295, in shape_internal
    input_tensor = ops.convert_to_tensor(input)
  File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 836, in convert_to_tensor
    as_ref=False)
  File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 926, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 229, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 208, in constant
    value, dtype=dtype, shape=shape, verify_shape=verify_shape))
  File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 472, in make_tensor_proto
    "supported type." % (type(values), values))
TypeError: Failed to convert object of type <class 'dict'> to Tensor. Contents: {'sample_num': <tf.Tensor 'local/on_policy_state_in_metadata_sample_num_pl:0' shape=(?,) dtype=float32>, 'first_row': <tf.Tensor 'local/on_policy_state_in_metadata_first_row_pl:0' shape=(?,) dtype=float32>, 'trial_num': <tf.Tensor 'local/on_policy_state_in_metadata_trial_num_pl:0' shape=(?,) dtype=float32>, 'type': <tf.Tensor 'local/on_policy_state_in_metadata_type_pl:0' shape=(?,) dtype=float32>}. Consider casting elements to a supported type.
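The second traceback shows `tf.shape()` being handed a dict: the expression `pi.on_state_in[list(pi.on_state_in.keys())[0]]` takes whichever key happens to come first, and here that entry is itself a dict of placeholders rather than a tensor. On Python 3.5, dict ordering is not guaranteed, so the "first" key can differ between runs. A minimal sketch of one way to avoid this, assuming the state is a nested dict: descend until a non-dict leaf is found. The helper `first_leaf` and the key names below are hypothetical, and this is not necessarily how btgym resolved it.

```python
def first_leaf(nested):
    """Return the first non-dict value, visiting keys in sorted order.

    Hypothetical helper for illustration; plain strings stand in for
    TensorFlow placeholders so the sketch runs without TensorFlow.
    """
    if not isinstance(nested, dict):
        return nested
    for key in sorted(nested):
        try:
            return first_leaf(nested[key])
        except ValueError:
            continue  # empty sub-dict, try the next key
    raise ValueError("no leaf values found")


# Assumed layout: one nested entry and one flat entry.
state_in = {
    "metadata": {"type": "type_pl", "trial_num": "trial_num_pl"},
    "external": "external_pl",
}

# The pattern from the traceback can land on the nested dict.
naive = state_in[list(state_in.keys())[0]]

# Descending to a leaf always yields something tensor-like.
leaf = first_leaf(state_in)
```

Sorting the keys makes the choice deterministic regardless of Python version, which is the property the original expression lacked.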

Kismuz commented 6 years ago

@huminpurin, you have spotted my mistake! Please update btgym, it's corrected now, here:

data_master = BTgymEnv(
    dataset=MyDataset,  # It is the only environment here for which dataset is required:
    port=5050,
    data_port=data_port,
    data_master=True,
    connect_timeout=10,  # set server connection timeout to 10 seconds (default is 60).
    verbose=0,
)

o = data_master.reset() # <=== CORRECTED HERE: fake reset() tells data_master to start data_server_process

# Make and launch workers in separate processes:
for i in range(num_workers):
    # Worker environment configuration:
    env_config=dict(

huminpurin commented 6 years ago

@Kismuz Thanks! It works great. By the way, another error came up:

AttributeError: 'FigureCanvasGTKAgg' object has no attribute 'renderer'

As far as I could find, it's related to matplotlib, even though I have installed version 2.0.2 as recommended. I uncommented the fig.canvas.draw() line in the plotting.py file under the rendering folder. After that, everything works perfectly!

Kismuz commented 6 years ago

thanks, updated.