Closed: @knn940506 closed this issue 6 years ago.
@knn940506, an empty queue usually means the thread runner process either didn't start or died quietly. Since some updates have been made since your fork (2018-01-08), I recommend updating the btgym package first. If the error persists, please provide some details:
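For what it's worth, this failure mode can be reproduced in isolation; a minimal sketch (plain stdlib, not btgym code) of a consumer timing out because its producer thread died without producing anything:

```python
import queue
import threading

q = queue.Queue()

def dead_runner():
    # A runner that exits before putting anything on the queue,
    # mimicking a thread runner that never started or died quietly.
    return

t = threading.Thread(target=dead_runner)
t.start()
t.join()

got_empty = False
try:
    # Trainer side: wait briefly for a rollout that will never arrive.
    q.get(timeout=0.1)
except queue.Empty:
    got_empty = True
    print("queue.Empty: runner put nothing on the queue")
```

The trainer only sees `queue.Empty` on its side; the real cause is whatever silenced the producer.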
I updated btgym using the commands below and ran it again, but the error still occurs:
cd btgym
git pull
pip install --upgrade -e .
The error pattern is: many reset warnings -> global_step info -> error
[2018-01-12 02:01:25.982257] WARNING: BTgymServer_0: _reset
Thanks so much !
I tested other examples; it looks like my workers lose the Backtrader server connection.
If the program runs longer, the message below always appears:
~/바탕화면/git/btgym/btgym/envs/backtrader.py in _step(self, action)
    748     msg = '.step(): server unreachable with status: <{}>.'.format(env_response['status'])
    749     self.log.error(msg)
--> 750     raise ConnectionError(msg)
    751
    752     self.env_response = env_response['message']
ConnectionError: .step(): server unreachable with status:
@knn940506, well, that is a different error. Do the following:
At line 47 of the notebook, set: connect_timeout=120,
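For reference, a sketch of where that kwarg sits (key names follow the example notebooks; everything else is omitted for brevity, so this is only a fragment, not a runnable environment config):

```python
# Sketch only: the relevant keys of the notebook's environment config.
env_config = dict(
    kwargs=dict(
        connect_timeout=120,  # seconds the API shell waits for the server
        verbose=2,
    )
)
```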
Pay attention to how you interrupt/restart the notebook kernel (taken from #17): every BTgym instance launches at least two separate processes, not counting the jupyter kernel itself. Check for leftovers with:
lsof -i:5000
lsof -i:4999
...and kill them manually.
Note that when running the A3C examples, there are also ports 12230 and 12231 to watch for.
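The port cleanup above can be scripted. A sketch, assuming lsof is available on the machine; kill_port is a hypothetical helper, not part of btgym:

```python
import os
import signal
import subprocess

def kill_port(port):
    """Send SIGTERM to every PID that lsof reports bound to `port`.

    Returns the list of PIDs signalled; empty if nothing was listening
    or lsof is unavailable.
    """
    try:
        # lsof -t prints bare PIDs; -i:PORT filters by TCP/UDP port.
        out = subprocess.check_output(['lsof', '-ti:{}'.format(port)])
    except (subprocess.CalledProcessError, OSError):
        return []
    pids = [int(p) for p in out.split()]
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    return pids

# Usage (commented out here so nothing unrelated gets killed):
# for port in (5000, 4999, 12230, 12231):
#     kill_port(port)
```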
Usually it throws errors like:
Decrease the number of workers to 6. It still gives full load to the CPU and can eliminate inter-thread contention slowdowns.
If nothing helps, set the Launcher kwarg verbose=3
and paste the last ~50 lines of log output.
The error hasn't changed... Here is some terminal log:
2018-01-15 11:19:33.222086: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-01-15 11:19:33.224407: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
E0115 11:19:33.225525382 2448 ev_epoll1_linux.c:1051] grpc epoll fd: 52
E0115 11:19:33.225550836 2439 ev_epoll1_linux.c:1051] grpc epoll fd: 51
2018-01-15 11:19:33.230664: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> 127.0.0.1:12230}
2018-01-15 11:19:33.230663: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> localhost:12230}
2018-01-15 11:19:33.230714: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> localhost:12231, 1 -> 127.0.0.1:12232, 2 -> 127.0.0.1:12233, 3 -> 127.0.0.1:12234}
2018-01-15 11:19:33.230717: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> 127.0.0.1:12231, 1 -> 127.0.0.1:12232, 2 -> 127.0.0.1:12233, 3 -> 127.0.0.1:12234}
2018-01-15 11:19:33.231020: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:324] Started server with target: grpc://localhost:12231
2018-01-15 11:19:33.231497: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:324] Started server with target: grpc://localhost:12230
2018-01-15 11:19:37.685478: I tensorflow/core/distributed_runtime/master_session.cc:1004] Start master session b6839cbeeb119750 with config: intra_op_parallelism_threads: 1 device_filters: "/job:ps" device_filters: "/job:worker/task:0/cpu:0" inter_op_parallelism_threads: 2
2018-01-15 11:19:38.231957: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
E0115 11:19:38.232226783 2503 ev_epoll1_linux.c:1051] grpc epoll fd: 53
2018-01-15 11:19:38.236208: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-01-15 11:19:38.236407: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> 127.0.0.1:12230}
2018-01-15 11:19:38.236446: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> 127.0.0.1:12231, 1 -> localhost:12232, 2 -> 127.0.0.1:12233, 3 -> 127.0.0.1:12234}
E0115 11:19:38.236568040 2507 ev_epoll1_linux.c:1051] grpc epoll fd: 54
2018-01-15 11:19:38.236800: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:324] Started server with target: grpc://localhost:12232
2018-01-15 11:19:38.240948: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> 127.0.0.1:12230}
2018-01-15 11:19:38.240997: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> 127.0.0.1:12231, 1 -> 127.0.0.1:12232, 2 -> localhost:12233, 3 -> 127.0.0.1:12234}
2018-01-15 11:19:38.241403: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:324] Started server with target: grpc://localhost:12233
2018-01-15 11:19:38.242178: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
E0115 11:19:38.242392991 2516 ev_epoll1_linux.c:1051] grpc epoll fd: 55
2018-01-15 11:19:38.247020: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> 127.0.0.1:12230}
2018-01-15 11:19:38.247056: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> 127.0.0.1:12231, 1 -> 127.0.0.1:12232, 2 -> 127.0.0.1:12233, 3 -> localhost:12234}
2018-01-15 11:19:38.247372: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:324] Started server with target: grpc://localhost:12234
2018-01-15 11:19:41.789174: I tensorflow/core/distributed_runtime/master_session.cc:1004] Start master session 1e5bfb978931a13a with config: intra_op_parallelism_threads: 1 device_filters: "/job:ps" device_filters: "/job:worker/task:3/cpu:0" inter_op_parallelism_threads: 2
2018-01-15 11:19:41.807742: I tensorflow/core/distributed_runtime/master_session.cc:1004] Start master session 94c6cd7bd0b0fa12 with config: intra_op_parallelism_threads: 1 device_filters: "/job:ps" device_filters: "/job:worker/task:1/cpu:0" inter_op_parallelism_threads: 2
2018-01-15 11:19:42.002243: I tensorflow/core/distributed_runtime/master_session.cc:1004] Start master session 23214af6a52fc7cf with config: intra_op_parallelism_threads: 1 device_filters: "/job:ps" device_filters: "/job:worker/task:2/cpu:0" inter_op_parallelism_threads: 2
There is one weird thing: I set num_workers=4, but it looks like Worker-5 is running.
INFO:tensorflow:Saving checkpoint to path /home/joowonkim/tmp/test_gym_a3c/train/model.ckpt
INFO:tensorflow:global/global_step/sec: 100.832
INFO:tensorflow:Error reported to Coordinator: <class 'queue.Empty'>,
Process Worker-5:
Traceback (most recent call last):
File "/home/joowonkim/anaconda3/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/worker.py", line 241, in run
trainer.process(sess)
File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/aac.py", line 747, in process
data = self.get_data()
File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/aac.py", line 594, in get_data
data_streams = [get_it() for get_it in self.data_getter]
File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/aac.py", line 594, in
Is that expected?
@knn940506, the terminal log you provided is ok, no errors there; refer to #23 for details.
No, it is not expected; I see that sub-process error reporting should be improved somehow. I'll take time to see how it should be fixed.
@knn940506, I have updated error reporting for child processes. It does not solve the error, but it can give a hint about what's going wrong. Please update the package, run the example, and post the traceback here.
@knn940506 - forgot to remove an exception test case, sorry about that. Corrected; please update.
Traceback (most recent call last):
File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/runner.py", line 90, in run
self._run()
File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/runner.py", line 117, in _run
self.queue.put(next(rollout_provider), timeout=600.0)
File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/runner.py", line 222, in env_runner
state, reward, terminal, info = env.step(action.argmax())
File "/home/joowonkim/anaconda3/lib/python3.5/site-packages/gym/core.py", line 96, in step
return self._step(action)
File "/home/joowonkim/바탕화면/git/btgym/btgym/envs/backtrader.py", line 750, in _step
raise ConnectionError(msg)
ConnectionError: .step(): server unreachable with status:
Exception in thread Thread-4:
Traceback (most recent call last):
File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/runner.py", line 90, in run
self._run()
File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/runner.py", line 117, in _run
self.queue.put(next(rollout_provider), timeout=600.0)
File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/runner.py", line 222, in env_runner
state, reward, terminal, info = env.step(action.argmax())
File "/home/joowonkim/anaconda3/lib/python3.5/site-packages/gym/core.py", line 96, in step
return self._step(action)
File "/home/joowonkim/바탕화면/git/btgym/btgym/envs/backtrader.py", line 750, in _step
raise ConnectionError(msg)
ConnectionError: .step(): server unreachable with status:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/joowonkim/anaconda3/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/runner.py", line 95, in run
    raise RuntimeError
RuntimeError
INFO:tensorflow:global/global_step/sec: 40.9994
INFO:tensorflow:global/global_step/sec: 0
INFO:tensorflow:Saving checkpoint to path /home/joowonkim/tmp/test_gym_a3c/train/model.ckpt
INFO:tensorflow:global/global_step/sec: 0
INFO:tensorflow:global/global_step/sec: 0
INFO:tensorflow:Saving checkpoint to path /home/joowonkim/tmp/test_gym_a3c/train/model.ckpt
INFO:tensorflow:global/global_step/sec: 0
[2018-01-17 04:18:22.827845] ERROR: A3C_1: process() exception occurred
Press Ctrl-C
or jupyter:[Kernel]->[Interrupt] for clean exit.
Traceback (most recent call last):
File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/aac.py", line 1076, in process
data = self._get_data()
File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/aac.py", line 634, in _get_data
data_streams = [get_it() for get_it in self.data_getter]
File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/aac.py", line 634, in
Press Ctrl-C
or jupyter:[Kernel]->[Interrupt] for clean exit.
Process Worker-17:
Traceback (most recent call last):
File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/aac.py", line 1076, in process
data = self._get_data()
File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/aac.py", line 634, in _get_data
data_streams = [get_it() for get_it in self.data_getter]
File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/aac.py", line 634, in
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/joowonkim/anaconda3/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/worker.py", line 257, in run
    sv.stop()
  File "/home/joowonkim/anaconda3/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/joowonkim/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/supervisor.py", line 964, in managed_session
    self.stop(close_summary_writer=close_summary_writer)
  File "/home/joowonkim/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/supervisor.py", line 792, in stop
    stop_grace_period_secs=self._stop_grace_secs)
  File "/home/joowonkim/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/home/joowonkim/anaconda3/lib/python3.5/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/joowonkim/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/supervisor.py", line 954, in managed_session
    yield sess
  File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/worker.py", line 257, in run
    sv.stop()
  File "/home/joowonkim/anaconda3/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/joowonkim/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 4339, in get_controller
    yield default
  File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/worker.py", line 250, in run
    trainer.process(sess)
  File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/aac.py", line 1145, in process
    raise RuntimeError(msg)
RuntimeError: process() exception occurred
Press Ctrl-C
or jupyter:[Kernel]->[Interrupt] for clean exit.
[2018-01-17 04:18:32.567306] ERROR: A3C_2: process() exception occurred
[2018-01-17 04:18:53.225776] ERROR: A3C_0: process() exception occurred
Press Ctrl-C
or jupyter:[Kernel]->[Interrupt] for clean exit.
Traceback (most recent call last):
File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/aac.py", line 1076, in process
data = self._get_data()
File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/aac.py", line 634, in _get_data
data_streams = [get_it() for get_it in self.data_getter]
File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/aac.py", line 634, in
Press Ctrl-C
or jupyter:[Kernel]->[Interrupt] for clean exit.
Similar errors keep occurring. Do you need more logs? I set env.verbose=1 and num_workers=4. Thanks!!
Ok, the base exception occurred here:
File "/home/joowonkim/바탕화면/git/btgym/btgym/envs/backtrader.py", line 750, in _step
    raise ConnectionError(msg)
ConnectionError: .step(): server unreachable with status:
...for some reason the BTGym server did not respond to the API shell in time; everything else is consecutive errors. This is rather strange, but we can track it. In a3c_random_on_synth_or_real_data..., set:
env_config = dict(
    ...
    kwargs=dict(
        ...
        connect_timeout=180,
        verbose=2,
    )
)
...
cluster_config = dict(
    ...
    num_workers=1,
    num_ps=1,
    num_envs=1,
    ...
)
...
launcher = Launcher(
    ...
    verbose=2,
)
and paste the log output up to the error mentioned above.
It works well at step 1.
In the Jupyter notebook:
[2018-01-18 08:22:00.041878] DEBUG: BTgymServer_0: Episode countdown started at: 1393, END OF DATA, r:-0.2578244975861855
[2018-01-18 08:22:00.044134] DEBUG: BTgymServer_0: Episode countdown contd. at: 1394, CLOSE, END OF DATA, r:-0.2578244975861855
[2018-01-18 08:22:00.045461] DEBUG: BTgymServer_0: Episode countdown contd. at: 1395, CLOSE, END OF DATA, r:-0.2578244975861855
[2018-01-18 08:22:00.046319] DEBUG: BTgymServer_0: COMM recieved: {'action': 'hold'}
[2018-01-18 08:22:00.046877] DEBUG: BTgymServer_0: RunStop() invoked with CLOSE, END OF DATA
[2018-01-18 08:22:00.975725] DEBUG: BTgymServer_0: Episode elapsed time: 0:00:01.763553.
[2018-01-18 08:23:00.106587] ERROR: ThreadRunner_0: RunTime exception occurred.
Press Ctrl-C
or jupyter:[Kernel]->[Interrupt] for clean exit.
Traceback (most recent call last):
  File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/runner.py", line 90, in run
    self._run()
  File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/runner.py", line 117, in _run
    self.queue.put(next(rollout_provider), timeout=600.0)
  File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/runner.py", line 263, in env_runner
    episode_stat = env.get_stat()  # get episode statistic
  File "/home/joowonkim/바탕화면/git/btgym/btgym/envs/backtrader.py", line 772, in get_stat
    if self._force_control_mode():
  File "/home/joowonkim/바탕화면/git/btgym/btgym/envs/backtrader.py", line 545, in _force_control_mode
    self.server_response = self.socket.recv_pyobj()
  File "/home/joowonkim/anaconda3/lib/python3.5/site-packages/zmq/sugar/socket.py", line 491, in recv_pyobj
    msg = self.recv(flags)
  File "zmq/backend/cython/socket.pyx", line 693, in zmq.backend.cython.socket.Socket.recv
  File "zmq/backend/cython/socket.pyx", line 727, in zmq.backend.cython.socket.Socket.recv
  File "zmq/backend/cython/socket.pyx", line 150, in zmq.backend.cython.socket._recv_copy
  File "zmq/backend/cython/socket.pyx", line 145, in zmq.backend.cython.socket._recv_copy
  File "zmq/backend/cython/checkrc.pxd", line 19, in zmq.backend.cython.checkrc._check_rc
zmq.error.Again: Resource temporarily unavailable
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/runner.py", line 90, in run
    self._run()
  File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/runner.py", line 117, in _run
    self.queue.put(next(rollout_provider), timeout=600.0)
  File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/runner.py", line 263, in env_runner
    episode_stat = env.get_stat()  # get episode statistic
  File "/home/joowonkim/바탕화면/git/btgym/btgym/envs/backtrader.py", line 772, in get_stat
    if self._force_control_mode():
  File "/home/joowonkim/바탕화면/git/btgym/btgym/envs/backtrader.py", line 545, in _force_control_mode
    self.server_response = self.socket.recv_pyobj()
  File "/home/joowonkim/anaconda3/lib/python3.5/site-packages/zmq/sugar/socket.py", line 491, in recv_pyobj
    msg = self.recv(flags)
  File "zmq/backend/cython/socket.pyx", line 693, in zmq.backend.cython.socket.Socket.recv
  File "zmq/backend/cython/socket.pyx", line 727, in zmq.backend.cython.socket.Socket.recv
  File "zmq/backend/cython/socket.pyx", line 150, in zmq.backend.cython.socket._recv_copy
  File "zmq/backend/cython/socket.pyx", line 145, in zmq.backend.cython.socket._recv_copy
  File "zmq/backend/cython/checkrc.pxd", line 19, in zmq.backend.cython.checkrc._check_rc
zmq.error.Again: Resource temporarily unavailable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/joowonkim/anaconda3/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/runner.py", line 95, in run
    raise RuntimeError
RuntimeError
INFO:tensorflow:global/global_step/sec: 5.66658
INFO:tensorflow:global/global_step/sec: 0
INFO:tensorflow:Saving checkpoint to path /home/joowonkim/tmp/test_gym_a3c/train/model.ckpt
INFO:tensorflow:global/global_step/sec: 0
INFO:tensorflow:global/global_step/sec: 0
INFO:tensorflow:Saving checkpoint to path /home/joowonkim/tmp/test_gym_a3c/train/model.ckpt
INFO:tensorflow:global/global_step/sec: 0
[2018-01-18 08:31:59.980364] ERROR: A3C_0: process() exception occurred
Press Ctrl-C
or jupyter:[Kernel]->[Interrupt] for clean exit.
Traceback (most recent call last):
File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/aac.py", line 1076, in process
data = self._get_data()
File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/aac.py", line 634, in _get_data
data_streams = [get_it() for get_it in self.data_getter]
File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/aac.py", line 634, in
At the terminal:
E0118 17:21:45.308309060 19328 ev_epoll1_linux.c:1051] grpc epoll fd: 52
2018-01-18 17:21:45.312629: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> localhost:12230}
2018-01-18 17:21:45.312629: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> 127.0.0.1:12230}
2018-01-18 17:21:45.312664: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> localhost:12231}
2018-01-18 17:21:45.312664: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> 127.0.0.1:12231}
2018-01-18 17:21:45.312991: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:324] Started server with target: grpc://localhost:12231
2018-01-18 17:21:45.313294: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:324] Started server with target: grpc://localhost:12230
2018-01-18 17:21:49.566307: I tensorflow/core/distributed_runtime/master_session.cc:1004] Start master session ad2b7177ea7201bf with config: intra_op_parallelism_threads: 1 device_filters: "/job:ps" device_filters: "/job:worker/task:0/cpu:0" inter_op_parallelism_threads: 2
Thanks for your works :+1: :+1:
@knn940506, I have corrected some unsafe code which could potentially lead to such an exception. The problem is that I can't verify it locally, as no such error appears with my work setup (macOS).
Update btgym and run again. If the error remains, create a notebook in the /examples directory and run the following code in it:
import os
import backtrader as bt
from btgym import BTgymEnv, BTgymDataset
from btgym.strategy.observers import Reward, Position, NormPnL
from btgym.research import DevStrat_4_6
MyCerebro = bt.Cerebro()
MyCerebro.addstrategy(
    DevStrat_4_6,
    drawdown_call=5,  # max % to lose, in percent of initial cash
    target_call=10,  # max % to win, same
    skip_frame=10,
)
# Set leveraged account:
MyCerebro.broker.setcash(2000)
MyCerebro.broker.setcommission(commission=0.0001, leverage=10.0)  # commission to imitate spread
MyCerebro.addsizer(bt.sizers.SizerFix, stake=5000,)
# Visualisations for reward, position and PnL dynamics:
MyCerebro.addobserver(Reward)
MyCerebro.addobserver(Position)
MyCerebro.addobserver(NormPnL)
MyDataset = BTgymDataset(
    #filename='./data/DAT_ASCII_EURUSD_M1_201703.csv',
    #filename='./data/DAT_ASCII_EURUSD_M1_201704.csv',
    filename='./data/test_sine_1min_period256_delta0002.csv',
    start_weekdays={0, 1, 2, 3},
    episode_duration={'days': 0, 'hours': 23, 'minutes': 55},
    start_00=False,
    time_gap={'hours': 6},
)
env_config = dict(
    class_ref=BTgymEnv,
    kwargs=dict(
        dataset=MyDataset,
        engine=MyCerebro,
        render_modes=['episode', 'human', 'external'],
        render_state_as_image=True,
        render_ylabel='OHL_diff.',
        render_size_episode=(12, 8),
        render_size_human=(9, 4),
        render_size_state=(11, 3),
        render_dpi=75,
        port=5000,
        data_port=4999,
        verbose=1,
    )
)
# Make environment:
env = env_config['class_ref'](**env_config['kwargs'])
# Run several episodes with statistic fetches:
for episode in range(4):
    o = env.reset()
    done = False
    while not done:
        obs, reward, done, info = env.step(env.action_space.sample())
    episode_stat = env.get_stat()
    for k, v in episode_stat.items():
        print('{}: {}'.format(k, v))
env.close()
Is any exception raised? If yes, please provide feedback.
Updated btgym, but aac.py has an error!
Traceback (most recent call last):
File "/home/joowonkim/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 468, in make_tensor_proto
str_values = [compat.as_bytes(x) for x in proto_values]
File "/home/joowonkim/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 468, in
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/aac.py", line 492, in __init__
    self.inc_step = self.global_step.assign_add(tf.shape(pi.on_state_in[list(pi.on_state_in.keys())[0]])[0])
  File "/home/joowonkim/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 271, in shape
    return shape_internal(input, name, optimize=True, out_type=out_type)
  File "/home/joowonkim/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 295, in shape_internal
    input_tensor = ops.convert_to_tensor(input)
  File "/home/joowonkim/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 836, in convert_to_tensor
    as_ref=False)
  File "/home/joowonkim/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 926, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/joowonkim/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 229, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/home/joowonkim/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 208, in constant
    value, dtype=dtype, shape=shape, verify_shape=verify_shape))
  File "/home/joowonkim/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 472, in make_tensor_proto
    "supported type." % (type(values), values))
TypeError: Failed to convert object of type <class 'dict'> to Tensor. Contents: {'trial_num': <tf.Tensor 'local/on_policy_state_in_metadata_trial_num_pl:0' shape=(?,) dtype=float32>, 'type': <tf.Tensor 'local/on_policy_state_in_metadata_type_pl:0' shape=(?,) dtype=float32>, 'first_row': <tf.Tensor 'local/on_policy_state_in_metadata_first_row_pl:0' shape=(?,) dtype=float32>, 'sample_num': <tf.Tensor 'local/on_policy_state_in_metadata_sample_num_pl:0' shape=(?,) dtype=float32>}. Consider casting elements to a supported type.
No exception raised in your new example code. Here's the result:
[2018-01-19 01:55:44.229338] INFO: BTgymAPIshell_0: ...done.
[2018-01-19 01:55:44.230378] INFO: BTgymAPIshell_0: Custom Cerebro class used.
[2018-01-19 01:55:44.318731] INFO: BTgymServer_0: PID: 28047
[2018-01-19 01:55:45.318373] INFO: BTgymAPIshell_0: Server started, pinging tcp://127.0.0.1:5000 ...
[2018-01-19 01:55:45.321071] INFO: BTgymAPIshell_0: Server seems ready with response: <{'ctrl': 'send control keys: <_reset>, <_getstat>, <_render>, <_stop>.'}>
[2018-01-19 01:55:45.322550] INFO: BTgymAPIshell_0: Environment is ready.
[2018-01-19 01:55:45.327601] INFO: BTgymAPIshell_0: Data domain reset() called prior to reset_data() with [possibly inconsistent] defaults.
[2018-01-19 01:55:45.332980] INFO: SimpleDataSet_0: New sample id: <train_trial_w_0_num_0_at_2017-01-03 12:47:00>.
[2018-01-19 01:55:45.337404] INFO: SimpleDataSet_0: New sample id: <train_trial_w_0_num_1_at_2017-01-05 02:48:00>.
[2018-01-19 01:55:45.357896] INFO: Trial_0: New sample id: <train_episode_w_0_num_0_at_2017-01-03 12:47:00>.
[2018-01-19 01:55:47.013175] INFO: SimpleDataSet_0: New sample id: <train_trial_w_0_num_2_at_2017-01-03 09:38:00>.
[2018-01-19 01:55:47.025657] INFO: Trial_0: New sample id: <train_episode_w_0_num_0_at_2017-01-05 02:48:00>.
episode: 0
length: 1380
runtime: 0:00:01.593744
[2018-01-19 01:55:48.638609] INFO: SimpleDataSet_0: New sample id: <train_trial_w_0_num_3_at_2017-01-03 21:30:00>.
[2018-01-19 01:55:48.653948] INFO: Trial_0: New sample id: <train_episode_w_0_num_0_at_2017-01-03 09:38:00>.
episode: 1
length: 1424
runtime: 0:00:01.553601
[2018-01-19 01:55:50.253536] INFO: SimpleDataSet_0: New sample id: <train_trial_w_0_num_4_at_2017-01-04 11:51:00>.
[2018-01-19 01:55:50.264350] INFO: Trial_0: New sample id: <train_episode_w_0_num_0_at_2017-01-03 21:30:00>.
episode: 2
length: 1424
runtime: 0:00:01.539417
[2018-01-19 01:55:51.793564] INFO: BTgymServer_0: Exiting.
episode: 3
length: 1424
runtime: 0:00:01.394918
[2018-01-19 01:55:51.795087] INFO: BTgymAPIshell_0: Exiting. Exit code: None
[2018-01-19 01:55:51.796303] INFO: BTgymDataServer_0: {'ctrl': 'Exiting.'}
[2018-01-19 01:55:51.797510] INFO: BTgymAPIshell_0: {'ctrl': 'Exiting.'} Exit code: None
[2018-01-19 01:55:51.798299] INFO: BTgymAPIshell_0: Environment closed.
That one was tricky, but it's good it popped out. Corrected; please update and try again. I also installed Python 3.5 (same as yours, in case the error is version dependent) and ran the tests, but it still works on my machine.
Sadly, it doesn't work. Maybe the error comes from something else. I'll give you feedback soon. Thanks a lot :)
@knn940506, I have recently implemented another type of runner that doesn't rely on a queue;
it can be found at btgym.algorithms.runner.synchro.BaseSynchroRunner.
Usage can be found in the MLDG implementation: https://github.com/Kismuz/btgym/tree/develop_meta_learning_gradient
Thanks for the great work :)
I have an issue while running the example a3c_random_on_synth_or_real_data...
I get several <INFO:tensorflow:Error reported to Coordinator: <class 'queue.Empty'>> messages and then it stops.
Is there any way I can fix it? Thank you so much. Kim.
[2018-01-11 20:50:20,439] Error reported to Coordinator: <class 'queue.Empty'>,
Process Worker-6:
Traceback (most recent call last):
  File "/home/joowonkim/anaconda3/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/worker.py", line 241, in run
    trainer.process(sess)
  File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/aac.py", line 747, in process
    data = self.get_data()
  File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/aac.py", line 594, in get_data
    data_streams = [get_it() for get_it in self.data_getter]
  File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/aac.py", line 594, in
data_streams = [get_it() for get_it in self.data_getter]
File "/home/joowonkim/바탕화면/git/btgym/btgym/algorithms/rollout.py", line 33, in pull_rollout_from_queue
return queue.get(timeout=600.0)
File "/home/joowonkim/anaconda3/lib/python3.5/queue.py", line 172, in get
raise Empty
queue.Empty
INFO:tensorflow:global/global_step/sec: 0
[2018-01-11 20:51:38,860] global/global_step/sec: 0
INFO:tensorflow:Error reported to Coordinator: <class 'queue.Empty'>,
[2018-01-11 20:51:48,678] Error reported to Coordinator: <class 'queue.Empty'>,
and then it stopped.