Closed: JaCoderX closed this issue 4 years ago
@JacobHanouna, I have actually neither used nor tested any of backtrader's filters with btgym;
the error you encounter means the environment has not been properly reset after a strategy run has finished, which is usually caused by an incorrect done flag check (i.e. the btgym server does not get the done flag from the strategy _analyzer upon episode termination);
One output-heavy but effective way to isolate such a problem is: a) make a very short episode with a compact observation-space state; b) run it manually (maybe in a handmade loop) with verbose=2 (debug) and examine the full dump to see whether the server-environment talk correctly terminates the episode;
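A hand-made loop of the kind suggested might look like the sketch below. The `DummyEnv` stub is purely illustrative so the snippet runs anywhere; with btgym you would construct a real `BTgymEnv(..., verbose=2)` instead and keep the loop body unchanged:

```python
class DummyEnv:
    """Illustrative stand-in for a gym-style env; terminates after 5 steps."""
    def __init__(self, episode_len=5):
        self.episode_len = episode_len
        self.step_count = 0

    def reset(self):
        self.step_count = 0
        return {'external': [0.0]}  # toy observation

    def step(self, action):
        self.step_count += 1
        done = self.step_count >= self.episode_len
        return {'external': [0.0]}, 0.0, done, {'step': self.step_count}

env = DummyEnv()
obs = env.reset()
done = False
while not done:
    # every response must be a 4-tuple (o, r, d, i); if the server has
    # already fallen back to control mode, a dict comes back here instead
    # and this unpacking is exactly what fails
    obs, reward, done, info = env.step({'base': 'hold'})
    print('step:', info['step'], 'done:', done)
# a correctly terminating episode ends with done=True before any further step()
assert done
```

The point of the manual loop is that the full verbose=2 dump shows whether `done` ever arrives at the shell side before the server switches to control mode.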
@JacobHanouna, to be more specific:
episode termination status is handled by the self._get_done() method, which itself relies, among other things, on the self.data.numrecords attribute. Make sure its value matches the resampled backtest length.
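As a toy illustration of that dependency (the names `bar_index` and `num_records` are hypothetical; the real logic lives in btgym's strategy `_get_done()` and uses `self.data.numrecords`):

```python
def is_episode_done(bar_index, num_records, early_stop=False):
    """Hypothetical numrecords-based termination check: the episode ends
    when the last available bar is reached, or when an early-stop
    condition (e.g. max drawdown) fires."""
    return early_stop or bar_index >= num_records - 1

# If a filter silently shortens the feed, the data iterator runs out of
# bars before bar_index ever reaches num_records - 1, so done is never
# raised and the server is left waiting:
assert is_episode_done(bar_index=9999, num_records=10000) is True
assert is_episode_done(bar_index=8619, num_records=10000) is False  # stuck
```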
I've had no success so far in identifying the problem.
I've turned on verbose=2 and tried different directions:
self.data.numrecords shows correct values (its length is unaffected by the filtering, and all datas are the same size). If there is a problem with _get_done(), maybe a different condition is needed? But I'm not convinced the underlying problem is even related to the filtering operation.
The ValueError: too many values to unpack (expected 4) is just the result of receiving this string: No <ctrl> key received: Control mode: received <{'action': {'base': 'hold'}}> from the first error.
The hints from the errors I get are as follows: Hint: Wrong Strategy.get_state() parameters? and Hint: forgot to call reset()?
I'll post all the errors I see; maybe some clues can be found there.
Error 1 - most common error:
2019-02-01 14:40:28.121738: I tensorflow/core/distributed_runtime/master_session.cc:1161] Start master session 077332762949eefb with config: intra_op_parallelism_threads: 1 device_filters: "/job:ps" device_filters: "/job:worker/task:0/cpu:0" inter_op_parallelism_threads: 2
[2019-02-01 12:40:29.160463] NOTICE: Worker_0: initializing all parameters...
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
[2019-02-01 12:40:30.764469] NOTICE: Worker_0: no saved model parameters found in:
/home/jack/tmp/pre_trained_model4
INFO:tensorflow:Restoring parameters from /home/jack/tmp/test_4_11_multi8/current_train_checkpoint/model_parameters-876573
[2019-02-01 12:40:35.900909] NOTICE: BTgymDataServer_0: Initial global_time set to: 2017-01-01 02:00:00 / stamp: 1483228800.0
[2019-02-01 12:40:36.074832] NOTICE: Worker_0: started training at step: 876573
Exception in thread Thread-11:
[2019-02-01 12:41:21.985463] ERROR: BTgymMultiDataShell_0: Unexpected environment response: No <ctrl> key received:Control mode: received <{'action': {'base': 'hold'}}>
Hint: forgot to call reset()?
Traceback (most recent call last):
Hint: Forgot to call reset() or reset_data()?
File "/home/jack/btgym/btgym/envs/base.py", line 612, in _assert_response
Traceback (most recent call last):
assert type(response) == tuple and len(response) == 4
AssertionError
Error 2 - second common error:
2019-01-31 18:45:01.905468: I tensorflow/core/distributed_runtime/master_session.cc:1161] Start master session 5c5e9b06ec0d8ef8 with config: intra_op_parallelism_threads: 1 device_filters: "/job:ps" device_filters: "/job:worker/task:0/cpu:0" inter_op_parallelism_threads: 2
[2019-01-31 16:45:02.987891] NOTICE: Worker_0: initializing all parameters...
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
[2019-01-31 16:45:04.735516] NOTICE: Worker_0: no saved model parameters found in:
/home/jack/tmp/pre_trained_model4
[2019-01-31 16:45:04.896729] NOTICE: Worker_0: no saved model parameters found in:
/home/jack/tmp/test_4_11_multi8/current_train_checkpoint
[2019-01-31 16:45:04.896830] NOTICE: Worker_0: training from scratch...
[2019-01-31 16:45:04.896879] INFO: Worker_0: connecting to the parameter server...
[2019-01-31 16:45:08.085864] DEBUG: PPO_0: c_1, c_train=1, c_test=0, type=0
[2019-01-31 16:45:08.086511] DEBUG: BTgymDataServer_0: Received <{'ctrl': '_get_info'}>
[2019-01-31 16:45:08.086587] DEBUG: BTgymDataServer_0: Sending info for #0.
[2019-01-31 16:45:08.087875] INFO: BTgymMultiDataShell_0: Data domain `reset()` called prior to `reset_data()` with [possibly inconsistent] defaults.
[2019-01-31 16:45:08.088121] DEBUG: BTgymServer_0: Control mode: received <{'ctrl': '_done'}>
[2019-01-31 16:45:08.088191] DEBUG: BTgymServer_0: Control mode: sent: {'ctrl': 'send control keys: <_reset>, <_getstat>, <_render>, <_stop>.'}
[2019-01-31 16:45:08.088371] DEBUG: BTgymMultiDataShell_0: FORCE CONTROL MODE attempt: 1.
Response: {'ctrl': 'send control keys: <_reset>, <_getstat>, <_render>, <_stop>.'}
[2019-01-31 16:45:08.088533] DEBUG: BTgymDataServer_0: Received <{'ctrl': '_reset_data', 'kwargs': {}}>
[2019-01-31 16:45:08.088964] DEBUG: multi_data_resample_1_0: data has been already loaded. Use `force_reload=True` to reload
[2019-01-31 16:45:08.089064] DEBUG: multi_data_resample_1_0: time stamps start: 1483228800.0, current: 1483228800.0 final: 1483828740.0
[2019-01-31 16:45:08.089372] DEBUG: multi_data_resample_2_0: data has been already loaded. Use `force_reload=True` to reload
[2019-01-31 16:45:08.089470] DEBUG: multi_data_resample_2_0: time stamps start: 1483228800.0, current: 1483228800.0 final: 1483828740.0
[2019-01-31 16:45:08.089767] DEBUG: multi_data_base_0: data has been already loaded. Use `force_reload=True` to reload
[2019-01-31 16:45:08.089845] DEBUG: multi_data_base_0: time stamps start: 1483228800.0, current: 1483228800.0 final: 1483828740.0
[2019-01-31 16:45:08.090118] DEBUG: multi_data_resample_3_0: data has been already loaded. Use `force_reload=True` to reload
[2019-01-31 16:45:08.090196] DEBUG: multi_data_resample_3_0: time stamps start: 1483228800.0, current: 1483228800.0 final: 1483828740.0
[2019-01-31 16:45:08.091304] INFO: multi_data_0: shared num. records: 10000
[2019-01-31 16:45:08.093564] NOTICE: BTgymDataServer_0: Initial global_time set to: 2017-01-01 02:00:00 / stamp: 1483228800.0
[2019-01-31 16:45:08.093635] DEBUG: BTgymDataServer_0: Data_is_ready: True
[2019-01-31 16:45:08.093889] DEBUG: BTgymMultiDataShell_0: Dataset seems ready with response: <{'ctrl': 'Reset with kwargs: {}'}>
[2019-01-31 16:45:08.094219] DEBUG: BTgymServer_0: Control mode: received <{'ctrl': '_done'}>
[2019-01-31 16:45:08.094292] DEBUG: BTgymServer_0: Control mode: sent: {'ctrl': 'send control keys: <_reset>, <_getstat>, <_render>, <_stop>.'}
[2019-01-31 16:45:08.094530] DEBUG: BTgymMultiDataShell_0: FORCE CONTROL MODE attempt: 1.
Response: {'ctrl': 'send control keys: <_reset>, <_getstat>, <_render>, <_stop>.'}
[2019-01-31 16:45:08.094787] DEBUG: BTgymServer_0: Control mode: received <{'ctrl': '_reset', 'kwargs': {'trial_config': {'b_beta': 1.0, 'get_new': True, 'sample_type': 0, 'b_alpha': 1.0}, 'episode_config': {'b_beta': 1.0, 'get_new': True, 'sample_type': 0, 'b_alpha': 1.0}}}>
[2019-01-31 16:45:08.094849] DEBUG: BTgymServer_0: Preparing new episode with kwargs: {'trial_config': {'b_beta': 1.0, 'get_new': True, 'sample_type': 0, 'b_alpha': 1.0}, 'episode_config': {'b_beta': 1.0, 'get_new': True, 'sample_type': 0, 'b_alpha': 1.0}}
[2019-01-31 16:45:08.096263] DEBUG: BTgymDataServer_0: Received <{'ctrl': '_get_broadcast_message'}>
[2019-01-31 16:45:08.096502] INFO: BTgymServer_0: Requesting new Trial sample with args: {'broadcast_message': None, 'sample_type': 0, 'timestamp': None, 'b_beta': 1.0, 'get_new': True, 'b_alpha': 1.0}
[2019-01-31 16:45:08.096677] DEBUG: BTgymDataServer_0: Received <{'ctrl': '_get_data', 'kwargs': {'broadcast_message': None, 'sample_type': 0, 'b_beta': 1.0, 'get_new': True, 'timestamp': None, 'b_alpha': 1.0}}>
[2019-01-31 16:45:08.096753] DEBUG: BTgymDataServer_0: Sampling with params: {'broadcast_message': None, 'sample_type': 0, 'b_beta': 1.0, 'get_new': True, 'timestamp': 1483228800.0, 'b_alpha': 1.0}
[2019-01-31 16:45:08.096796] DEBUG: multi_data_0: Making master sample...
[2019-01-31 16:45:08.096860] DEBUG: multi_data_resample_1_0: Sample interval: [0, 10000]
[2019-01-31 16:45:08.096899] DEBUG: multi_data_resample_1_0: Maximum sample time duration set to: 5 days, 23:40:00.
[2019-01-31 16:45:08.096934] DEBUG: multi_data_resample_1_0: Sample number of steps (adjusted to interval): 8620.
[2019-01-31 16:45:08.096967] DEBUG: multi_data_resample_1_0: Maximum allowed data time gap set to: 10:00:00.
[2019-01-31 16:45:08.097241] DEBUG: multi_data_resample_1_0: Sample start row: 552, day: 2017-01-01 09:12:00, weekday: 6.
[2019-01-31 16:45:08.097391] DEBUG: multi_data_resample_1_0: first_row: 552, last_row: 9172, data_shape: (8620, 5)
[2019-01-31 16:45:08.097507] DEBUG: multi_data_resample_1_0: Actual sample duration: 5 days, 23:39:00.
[2019-01-31 16:45:08.097553] DEBUG: multi_data_resample_1_0: Total sample time gap: 0:01:00.
[2019-01-31 16:45:08.097591] DEBUG: multi_data_resample_1_0: Sample accepted.
[2019-01-31 16:45:08.097768] INFO: multi_data_resample_1_0: New sample id: <train_trial_w_0_num_0_at_2017-01-01 09:12:00>.
[2019-01-31 16:45:08.097817] DEBUG: multi_data_0: Making master ok.
[2019-01-31 16:45:08.097918] DEBUG: multi_data_0: Sampling <resample_1> with interval: [552, 9172], kwargs: {'broadcast_message': None, 'sample_type': 0, 'b_beta': 1.0, 'get_new': True, 'timestamp': 1483228800.0, 'b_alpha': 1.0}
[2019-01-31 16:45:08.098272] INFO: multi_data_resample_1_0: New sample id: <train_trial_w_0_num_1_at_2017-01-01 09:12:00>.
[2019-01-31 16:45:08.098332] DEBUG: multi_data_0: Sampling <resample_2> with interval: [552, 9172], kwargs: {'broadcast_message': None, 'sample_type': 0, 'b_beta': 1.0, 'get_new': True, 'timestamp': 1483228800.0, 'b_alpha': 1.0}
[2019-01-31 16:45:08.098647] INFO: multi_data_resample_2_0: New sample id: <train_trial_w_0_num_0_at_2017-01-01 09:12:00>.
[2019-01-31 16:45:08.098706] DEBUG: multi_data_0: Sampling <base> with interval: [552, 9172], kwargs: {'broadcast_message': None, 'sample_type': 0, 'b_beta': 1.0, 'get_new': True, 'timestamp': 1483228800.0, 'b_alpha': 1.0}
[2019-01-31 16:45:08.099022] INFO: multi_data_base_0: New sample id: <train_trial_w_0_num_0_at_2017-01-01 09:12:00>.
[2019-01-31 16:45:08.099101] DEBUG: multi_data_0: Sampling <resample_3> with interval: [552, 9172], kwargs: {'broadcast_message': None, 'sample_type': 0, 'b_beta': 1.0, 'get_new': True, 'timestamp': 1483228800.0, 'b_alpha': 1.0}
[2019-01-31 16:45:08.099426] INFO: multi_data_resample_3_0: New sample id: <train_trial_w_0_num_0_at_2017-01-01 09:12:00>.
[2019-01-31 16:45:08.099488] DEBUG: BTgymDataServer_0: Sending sample_#1.
[2019-01-31 16:45:08.105490] DEBUG: BTgymServer_0: Data_server @tcp://127.0.0.1:4999 responded in ~0.008882 seconds.
[2019-01-31 16:45:08.168215] DEBUG: BTgymServer_0: Got new Trial: <{'resample_1': ['/home/jack/btgym/examples/data/test_sine_1min_period256_delta0002.csv'], 'resample_2': ['/home/jack/btgym/examples/data/test_sine_1min_period256_delta0002.csv'], 'base': ['/home/jack/btgym/examples/data/test_sine_1min_period256_delta0002.csv'], 'resample_3': ['/home/jack/btgym/examples/data/test_sine_1min_period256_delta0002.csv']}>
[2019-01-31 16:45:08.168331] DEBUG: BTgymServer_0: current global_time: 2017-01-01 02:00:00
[2019-01-31 16:45:08.168387] INFO: BTgymServer_0: Requesting episode from <{'resample_1': ['/home/jack/btgym/examples/data/test_sine_1min_period256_delta0002.csv'], 'resample_2': ['/home/jack/btgym/examples/data/test_sine_1min_period256_delta0002.csv'], 'base': ['/home/jack/btgym/examples/data/test_sine_1min_period256_delta0002.csv'], 'resample_3': ['/home/jack/btgym/examples/data/test_sine_1min_period256_delta0002.csv']}> with args: {'broadcast_message': None, 'sample_type': 0, 'timestamp': 1483228800.0, 'b_beta': 1.0, 'get_new': True, 'b_alpha': 1.0}
[2019-01-31 16:45:08.168427] DEBUG: sub_multi_data_0: Making master sample...
[2019-01-31 16:45:08.168489] DEBUG: Trial_0: Sample interval: [0, 8620]
[2019-01-31 16:45:08.168525] DEBUG: Trial_0: Maximum sample time duration set to: 5 days, 23:40:00.
[2019-01-31 16:45:08.168559] DEBUG: Trial_0: Sample number of steps (adjusted to interval): 8620.
[2019-01-31 16:45:08.168591] DEBUG: Trial_0: Maximum allowed data time gap set to: 10:00:00.
[2019-01-31 16:45:08.168840] DEBUG: Trial_0: Sample start row: 0, day: 2017-01-01 09:12:00, weekday: 6.
[2019-01-31 16:45:08.168995] DEBUG: Trial_0: first_row: 0, last_row: 8620, data_shape: (8620, 5)
[2019-01-31 16:45:08.169100] DEBUG: Trial_0: Actual sample duration: 5 days, 23:39:00.
[2019-01-31 16:45:08.169140] DEBUG: Trial_0: Total sample time gap: 0:01:00.
[2019-01-31 16:45:08.169171] DEBUG: Trial_0: Sample accepted.
[2019-01-31 16:45:08.169306] INFO: Trial_0: New sample id: <train_episode_w_0_num_0_at_2017-01-01 09:12:00>.
[2019-01-31 16:45:08.169353] DEBUG: sub_multi_data_0: Making master ok.
[2019-01-31 16:45:08.169428] DEBUG: sub_multi_data_0: Sampling <resample_1> with interval: [0, 8620], kwargs: {'broadcast_message': None, 'sample_type': 0, 'b_beta': 1.0, 'get_new': True, 'timestamp': 1483228800.0, 'b_alpha': 1.0}
[2019-01-31 16:45:08.169729] INFO: Trial_0: New sample id: <train_episode_w_0_num_1_at_2017-01-01 09:12:00>.
[2019-01-31 16:45:08.169806] DEBUG: sub_multi_data_0: Sampling <resample_2> with interval: [0, 8620], kwargs: {'broadcast_message': None, 'sample_type': 0, 'b_beta': 1.0, 'get_new': True, 'timestamp': 1483228800.0, 'b_alpha': 1.0}
[2019-01-31 16:45:08.170106] INFO: Trial_0: New sample id: <train_episode_w_0_num_0_at_2017-01-01 09:12:00>.
[2019-01-31 16:45:08.170165] DEBUG: sub_multi_data_0: Sampling <base> with interval: [0, 8620], kwargs: {'broadcast_message': None, 'sample_type': 0, 'b_beta': 1.0, 'get_new': True, 'timestamp': 1483228800.0, 'b_alpha': 1.0}
[2019-01-31 16:45:08.170451] INFO: Trial_0: New sample id: <train_episode_w_0_num_0_at_2017-01-01 09:12:00>.
[2019-01-31 16:45:08.170510] DEBUG: sub_multi_data_0: Sampling <resample_3> with interval: [0, 8620], kwargs: {'broadcast_message': None, 'sample_type': 0, 'b_beta': 1.0, 'get_new': True, 'timestamp': 1483228800.0, 'b_alpha': 1.0}
[2019-01-31 16:45:08.170794] INFO: Trial_0: New sample id: <train_episode_w_0_num_0_at_2017-01-01 09:12:00>.
[2019-01-31 16:45:08.170878] DEBUG: BTgymServer_0: Got new Episode: <{'resample_1': 'train_trial_w_0_num_1_at_2017-01-01 09:12:00', 'resample_2': 'train_trial_w_0_num_0_at_2017-01-01 09:12:00', 'base': 'train_trial_w_0_num_0_at_2017-01-01 09:12:00', 'resample_3': 'train_trial_w_0_num_0_at_2017-01-01 09:12:00'}>
[2019-01-31 16:45:08.233773] DEBUG: BTgymServer_0: strategy.metadata: {'sample_num': array(0), 'trial_type': array(0), 'type': array(0), 'trial_num': array(1), 'timestamp': array(0.), 'first_row': array(0)}
[2019-01-31 16:45:08.233870] DEBUG: BTgymServer_0: is_test: 0
[2019-01-31 16:45:08.240457] DEBUG: BTgymServer_0: data_name: resample_1
[2019-01-31 16:45:08.240543] DEBUG: BTgymServer_0: data_name: resample_2
[2019-01-31 16:45:08.240581] DEBUG: BTgymServer_0: data_name: base
[2019-01-31 16:45:08.240614] DEBUG: BTgymServer_0: data_name: resample_3
[2019-01-31 16:45:08.240648] DEBUG: BTgymServer_0: stake size: {'base': 2000}
[2019-01-31 16:45:08.240720] DEBUG: BTgymServer_0: startegy got broadcast_msg: <<None>>
[2019-01-31 16:45:08.244116] NOTICE: Worker_0: started training at step: 0
[2019-01-31 16:45:17.966108] DEBUG: BTgymServer_0: COMM received: {'action': {'base': 'hold'}}
[2019-01-31 16:45:17.971910] DEBUG: BTgymMultiDataShell_0: Response checker received:
({'stat': array([[1.09997529e+00],
[4.89585598e-07]]), 'expert': array([0., 0., 0., 0.]), 'external': array([[[-10. , -10. , -0.13760065, ..., 0.05112412,
0.0664598 , 0.09049345]],
[[-10. , -10. , -0.07114086, ..., 0.04835828,
0.05550078, 0.07352513]],
[[-10. , -10. , -0.0265991 , ..., 0.04234791,
0.03252218, 0.03676027]],
...,
[[-10. , -10. , -0.42818706, ..., 0.04366827,
0.11064923, 0.15553795]],
[[-10. , -10. , -0.32001243, ..., 0.03984178,
0.10357895, 0.14776131]],
[[-10. , -10. , -0.22102916, ..., 0.03772047,
0.09898327, 0.14139836]]]), 'internal': array([[[0. , 0. , 0. , 0.36643772, 0. ]],
[[0. , 0. , 0. , 0.36932305, 0. ]],
[[0. , 0. , 0. , 0.37223111, 0. ]],
[[0. , 0. , 0. , 0.37516206, 0. ]],
[[0. , 0. , 0. , 0.37811609, 0. ]],
[[0. , 0. , 0. , 0.38109339, 0. ]],
...
[[0. , 0. , 0. , 0.93184518, 0. ]],
[[0. , 0. , 0. , 0.93918254, 0. ]],
[[0. , 0. , 0. , 0.94657768, 0. ]],
[[0. , 0. , 0. , 0.95403105, 0. ]],
[[0. , 0. , 0. , 0.9615431 , 0. ]],
[[0. , 0. , 0. , 0.96911431, 0. ]],
[[0. , 0. , 0. , 0.97674513, 0. ]],
[[0. , 0. , 0. , 0.98443604, 0. ]],
[[0. , 0. , 0. , 0.9921875 , 0. ]]]), 'metadata': {'sample_num': array(0), 'trial_type': array(0), 'generator': {'sigma': array(0), 'x0': array(0), 'mu': array(0), 'l': array(0)}, 'type': array(0), 'trial_num': array(1), 'timestamp': array(1.48374612e+09), 'first_row': array(0)}}, 0.0, False, [{'broker_value': 2000.0, 'max_drawdown': 0.0, 'step': 0, 'action': {'_skip_this': True, 'base': 'hold'}, 'broker_cash': 2000.0, 'broker_message': '_', 'time': datetime.datetime(2017, 1, 7, 1, 42), 'drawdown': 0.0}])
as type: <class 'tuple'>
[2019-01-31 16:45:18.509696] DEBUG: BTgymServer_0: COMM received: {'action': {'base': 'sell'}}
[2019-01-31 16:45:18.530297] DEBUG: BTgymServer_0: COMM received: {'action': {'base': 'sell'}}
[2019-01-31 16:45:18.549330] DEBUG: BTgymServer_0: COMM received: {'action': {'base': 'buy'}}
[2019-01-31 16:45:18.568778] DEBUG: BTgymServer_0: COMM received: {'action': {'base': 'sell'}}
[2019-01-31 16:45:18.587871] DEBUG: BTgymServer_0: COMM received: {'action': {'base': 'close'}}
[2019-01-31 16:45:18.607886] DEBUG: BTgymServer_0: COMM received: {'action': {'base': 'close'}}
...
[2019-01-31 16:45:19.637716] DEBUG: BTgymServer_0: COMM received: {'action': {'base': 'close'}}
[2019-01-31 16:45:19.653120] DEBUG: BTgymServer_0: COMM received: {'action': {'base': 'sell'}}
[2019-01-31 16:45:19.669474] DEBUG: BTgymServer_0: COMM received: {'action': {'base': 'close'}}
[2019-01-31 16:45:19.686234] DEBUG: BTgymServer_0: COMM received: {'action': {'base': 'sell'}}
[2019-01-31 16:45:20.684779] DEBUG: BTgymServer_0: Episode elapsed time: 0:00:12.589767.
[2019-01-31 16:45:20.754318] DEBUG: BTgymServer_0: Control mode: received <{'action': {'base': 'close'}}>
[2019-01-31 16:45:20.754419] DEBUG: BTgymServer_0: No <ctrl> key received:Control mode: received <{'action': {'base': 'close'}}>
Hint: forgot to call reset()?
[2019-01-31 16:45:20.754751] ERROR: ThreadRunner_0: too many values to unpack (expected 4)
Traceback (most recent call last):
File "/home/jack/btgym/btgym/algorithms/runner/base.py", line 121, in BaseEnvRunnerFn
state, reward, terminal, info = env.step(action['environment'])
ValueError: too many values to unpack (expected 4)
[2019-01-31 16:45:20.754971] ERROR: ThreadRunner_0: RunTime exception occurred.
Press `Ctrl-C` or jupyter:[Kernel]->[Interrupt] for clean exit.
Traceback (most recent call last):
File "/home/jack/btgym/btgym/algorithms/runner/threadrunner.py", line 92, in run
self._run()
File "/home/jack/btgym/btgym/algorithms/runner/threadrunner.py", line 119, in _run
self.queue.put(next(rollout_provider), timeout=600.0)
File "/home/jack/btgym/btgym/algorithms/runner/base.py", line 315, in BaseEnvRunnerFn
raise e
File "/home/jack/btgym/btgym/algorithms/runner/base.py", line 121, in BaseEnvRunnerFn
state, reward, terminal, info = env.step(action['environment'])
ValueError: too many values to unpack (expected 4)
Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/jack/btgym/btgym/algorithms/runner/threadrunner.py", line 92, in run
self._run()
File "/home/jack/btgym/btgym/algorithms/runner/threadrunner.py", line 119, in _run
self.queue.put(next(rollout_provider), timeout=600.0)
File "/home/jack/btgym/btgym/algorithms/runner/base.py", line 315, in BaseEnvRunnerFn
raise e
File "/home/jack/btgym/btgym/algorithms/runner/base.py", line 121, in BaseEnvRunnerFn
state, reward, terminal, info = env.step(action['environment'])
ValueError: too many values to unpack (expected 4)
Error 3 - third common error:
/usr/bin/python3.5 /home/jack/btgym/PlayGround.py
[2019-02-01 20:28:04.294105] NOTICE: LauncherShell: writing to: /home/jack/tmp/test_4_11_multi8.
2019-02-01 22:28:05.573047: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-02-01 22:28:05.653369: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-02-01 22:28:05.653710: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:01:00.0
totalMemory: 5.94GiB freeMemory: 5.38GiB
2019-02-01 22:28:05.653722: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-02-01 22:28:05.864747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-01 22:28:05.864775: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-02-01 22:28:05.864780: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-02-01 22:28:05.864914: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:worker/replica:0/task:0/device:GPU:0 with 5149 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-02-01 22:28:05.866293: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:222] Initialize GrpcChannelCache for job ps -> {0 -> 127.0.0.1:12230}
2019-02-01 22:28:05.866303: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:222] Initialize GrpcChannelCache for job worker -> {0 -> localhost:12231}
2019-02-01 22:28:05.866981: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:381] Started server with target: grpc://localhost:12231
[2019-02-01 20:28:08.451981] NOTICE: PPO_0: learn_rate: 0.000100, entropy_beta: 0.050000
[2019-02-01 20:28:08.452368] NOTICE: PPO_0: vr_lambda: 1.000000, pc_lambda: 1.000000, rp_lambda: 1.000000
2019-02-01 22:28:10.581667: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
********************************************************************************************
** Press `Ctrl-C` or jupyter:[Kernel]->[Interrupt] to stop training and close launcher. **
********************************************************************************************
2019-02-01 22:28:10.645448: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-02-01 22:28:10.645802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:01:00.0
totalMemory: 5.94GiB freeMemory: 5.24GiB
2019-02-01 22:28:10.645814: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-02-01 22:28:10.884159: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-01 22:28:10.884202: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-02-01 22:28:10.884207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-02-01 22:28:10.884342: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:ps/replica:0/task:0/device:GPU:0 with 5002 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-02-01 22:28:10.885682: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:222] Initialize GrpcChannelCache for job ps -> {0 -> localhost:12230}
2019-02-01 22:28:10.885699: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:222] Initialize GrpcChannelCache for job worker -> {0 -> 127.0.0.1:12231}
2019-02-01 22:28:10.886452: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:381] Started server with target: grpc://localhost:12230
2019-02-01 22:28:24.018754: I tensorflow/core/distributed_runtime/master_session.cc:1161] Start master session 7e1d41dc19b9fb94 with config: intra_op_parallelism_threads: 1 device_filters: "/job:ps" device_filters: "/job:worker/task:0/cpu:0" inter_op_parallelism_threads: 2
[2019-02-01 20:28:25.058591] NOTICE: Worker_0: initializing all parameters...
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
[2019-02-01 20:28:26.905563] NOTICE: Worker_0: no saved model parameters found in:
/home/jack/tmp/pre_trained_model4
INFO:tensorflow:Restoring parameters from /home/jack/tmp/test_4_11_multi8/current_train_checkpoint/model_parameters-876573
[2019-02-01 20:28:30.476084] NOTICE: BTgymDataServer_0: Initial global_time set to: 2017-01-01 02:00:00 / stamp: 1483228800.0
[2019-02-01 20:28:30.647008] NOTICE: Worker_0: started training at step: 876573
[2019-02-01 20:28:40.037251] ERROR: BTgymMultiDataShell_0:
State observation shape/range mismatch!
Space set by env:
expert:
Box(4,), low: 0.0, high: 10.0
external:
Box(128, 1, 8), low: -10.0, high: 10.0
internal:
Box(128, 1, 5), low: -100.0, high: 100.0
metadata:
first_row,
generator,
sample_num,
timestamp,
trial_num,
trial_type,
type,
stat:
Box(2, 1), low: -100.0, high: 100.0
Space returned by server:
metadata:
timestamp:
array of shape: (), low: 1483758480.0, high: 1483758480.0
first_row:
array of shape: (), low: 0, high: 0
sample_num:
array of shape: (), low: 0, high: 0
trial_type:
array of shape: (), low: 0, high: 0
generator:
mu:
array of shape: (), low: 0, high: 0
x0:
array of shape: (), low: 0, high: 0
sigma:
array of shape: (), low: 0, high: 0
l:
array of shape: (), low: 0, high: 0
trial_num:
array of shape: (), low: 1, high: 1
type:
array of shape: (), low: 0, high: 0
external:
array of shape: (128, 1, 8), low: -10.0, high: 2.264685579154275
stat:
array of shape: (2, 1), low: nan, high: nan
expert:
array of shape: (4,), low: 0.0, high: 0.0
internal:
array of shape: (128, 1, 5), low: nan, high: nan
Full response:
{'metadata': {'timestamp': array(1.48375848e+09), 'first_row': array(0), 'sample_num': array(0), 'trial_type': array(0), 'generator': {'mu': array(0), 'x0': array(0), 'sigma': array(0), 'l': array(0)}, 'trial_num': array(1), 'type': array(0)}, 'external': array([[[-10. , -10. , -0.1368235 , ..., 0.10166982,
0.09968921, 0.1314966 ]],
[[-10. , -10. , -0.03713429, ..., 0.10202337,
0.10216384, 0.13573876]],
[[-10. , -10. , 0.06750418, ..., 0.10308403,
0.10357788, 0.13715264]],
...,
[[-10. , -10. , -0.36306664, ..., 0.02595577,
0.04736862, 0.04170456]],
[[-10. , -10. , -0.30438542, ..., 0.02878736,
0.06787268, 0.07634876]],
[[-10. , -10. , -0.22732128, ..., 0.0298512 ,
0.07706413, 0.09190327]]]), 'stat': array([[nan],
[nan]]), 'expert': array([0., 0., 0., 0.]), 'internal': array([[[0. , nan, 0. , 0.36643772, nan]],
[[0. , nan, 0. , 0.36932305, nan]],
[[0. , nan, 0. , 0.37223111, nan]],
[[0. , nan, 0. , 0.37516206, nan]],
[[0. , nan, 0. , 0.37811609, nan]],
[[0. , nan, 0. , 0.38109339, nan]],
[[0. , nan, 0. , 0.38409412, nan]],
[[0. , nan, 0. , 0.38711848, nan]],
[[0. , nan, 0. , 0.39016666, nan]],
...
[[0. , nan, 0. , 0.91734197, nan]],
[[0. , nan, 0. , 0.92456514, nan]],
[[0. , nan, 0. , 0.93184518, nan]],
[[0. , nan, 0. , 0.93918254, nan]],
[[0. , nan, 0. , 0.94657768, nan]],
[[0. , nan, 0. , 0.95403105, nan]],
[[0. , nan, 0. , 0.9615431 , nan]],
[[0. , nan, 0. , 0.96911431, nan]],
[[0. , nan, 0. , 0.97674513, nan]],
[[0. , nan, 0. , 0.98443604, nan]],
[[0. , nan, 0. , 0.9921875 , nan]]])}
Reward: 0.0
Done: False
Info:
{'drawdown': 0.0, 'action': {'base': 'hold', '_skip_this': True}, 'time': datetime.datetime(2017, 1, 7, 5, 8), 'broker_value': 2000.0, 'broker_cash': 2000.0, 'max_drawdown': 0.0, 'step': 10, 'broker_message': '-'}
Hint: Wrong Strategy.get_state() parameters?
Traceback (most recent call last):
File "/home/jack/btgym/btgym/envs/base.py", line 731, in reset
assert self.observation_space.contains(self.env_response[0])
AssertionError
Error 4 - uncommon error:
2019-01-31 19:24:42.828709: I tensorflow/core/distributed_runtime/master_session.cc:1161] Start master session 140ffc7acc711462 with config: intra_op_parallelism_threads: 1 device_filters: "/job:ps" device_filters: "/job:worker/task:0/cpu:0" inter_op_parallelism_threads: 2
[2019-01-31 17:24:44.084367] NOTICE: Worker_0: initializing all parameters...
INFO:tensorflow:Running local_init_op.
[2019-01-31 17:24:46.143749] NOTICE: Worker_0: no saved model parameters found in:
/home/jack/tmp/pre_trained_model4
INFO:tensorflow:Done running local_init_op.
[2019-01-31 17:24:46.306394] NOTICE: Worker_0: no saved model parameters found in:
/home/jack/tmp/test_4_11_multi8/current_train_checkpoint
[2019-01-31 17:24:46.306498] NOTICE: Worker_0: training from scratch...
[2019-01-31 17:24:49.954584] NOTICE: BTgymDataServer_0: Initial global_time set to: 2017-01-01 02:00:00 / stamp: 1483228800.0
[2019-01-31 17:24:50.132435] NOTICE: Worker_0: started training at step: 0
[2019-01-31 17:34:50.132983] ERROR: PPO_0: process() exception occurred
Press `Ctrl-C` or jupyter:[Kernel]->[Interrupt] for clean exit.
Traceback (most recent call last):
File "/home/jack/btgym/btgym/algorithms/aac.py", line 1367, in _process
data = self.get_data()
File "/home/jack/btgym/btgym/algorithms/aac.py", line 917, in get_data
data_streams = [get_it(**kwargs) for get_it in self.data_getter]
File "/home/jack/btgym/btgym/algorithms/aac.py", line 917, in <listcomp>
data_streams = [get_it(**kwargs) for get_it in self.data_getter]
File "/home/jack/btgym/btgym/algorithms/rollout.py", line 33, in pull_rollout_from_queue
return queue.get(timeout=600.0)
File "/usr/lib/python3.5/queue.py", line 172, in get
raise Empty
queue.Empty
Error 5 - this error only started happening after the latest commits (not common):
[2019-02-01 20:25:03.814984] NOTICE: BTgymDataServer_0: Initial global_time set to: 2017-01-01 02:00:00 / stamp: 1483228800.0
[2019-02-01 20:25:03.965311] NOTICE: Worker_0: started training at step: 876573
Process BTgymServer-1:2:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/home/jack/btgym/btgym/server.py", line 548, in run
service_input = self.socket.recv_pyobj()
File "/usr/local/lib/python3.5/dist-packages/zmq/sugar/socket.py", line 623, in recv_pyobj
msg = self.recv(flags)
File "zmq/backend/cython/socket.pyx", line 788, in zmq.backend.cython.socket.Socket.recv
File "zmq/backend/cython/socket.pyx", line 824, in zmq.backend.cython.socket.Socket.recv
File "zmq/backend/cython/socket.pyx", line 191, in zmq.backend.cython.socket._recv_copy
File "zmq/backend/cython/socket.pyx", line 186, in zmq.backend.cython.socket._recv_copy
File "zmq/backend/cython/checkrc.pxd", line 25, in zmq.backend.cython.checkrc._check_rc
zmq.error.ZMQError: Operation cannot be accomplished in current state
@JacobHanouna, just a quick response: error 3 is caused by NaNs in the environment response. From the observation state shape I can see that you have used a strategy from the model_based module. Those are under heavy development and are constantly changing. This time it can be a zero division in a normalisation constant. I removed it yesterday but am not 100% sure the entire normalisation approach is correct;
Those are under heavy development and are constantly changing
I'm aware of that; this is why I'm retesting things after new commits are made.
So maybe the errors have changed slightly, but the underlying problem is still there, as I observed them prior to those commits (they also occur in CasualConvStrategyMulti, which is not based on model_based).
As a poor workaround, I just put the whole training routine in an infinite try-catch loop; once it starts working properly I usually don't get errors (they all happen with the same code and the same data).
Error 1 Here:
[2019-01-31 16:45:19.686234] DEBUG: BTgymServer_0: COMM received: {'action': {'base': 'sell'}}
[2019-01-31 16:45:20.684779] DEBUG: BTgymServer_0: Episode elapsed time: 0:00:12.589767.
[2019-01-31 16:45:20.754318] DEBUG: BTgymServer_0: Control mode: received <{'action': {'base': 'close'}}>
[2019-01-31 16:45:20.754419] DEBUG: BTgymServer_0: No <ctrl> key received:Control mode: received <{'action': {'base': 'close'}}>
Hint: forgot to call reset()?
[2019-01-31 16:45:20.754751] ERROR: ThreadRunner_0: too many values to unpack (expected 4)
So: the episode ends, the BTgym env server falls back to control mode and expects requests containing control keys, but the thread runner somehow misses the 'done' (terminal) flag in the environment response and keeps calling env.step(), which passes actions (via the API shell) to the server instead of calling env.reset() to initiate a new episode; at the next moment the thread runner gets a control response while expecting the full (o, r, d, i) tuple, is unable to unpack it, and everybody is unhappy. What a mess: 'done' is missed. Would you insert log.warning
outputs at critical points to dump the env response? You can set verbose=0 to eliminate excess output. Look for the done key at episode termination.
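To make that check concrete, here is a hypothetical probe (not btgym's actual runner code) that could sit next to the runner's env.step() call and surface either a malformed response or a terminal flag that the runner is about to miss:

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger('runner_probe')

def probe_env_response(response):
    """Warn when the response is not the expected (o, r, done, info) 4-tuple;
    otherwise return the 'done' flag, warning when the episode terminates."""
    if not (isinstance(response, tuple) and len(response) == 4):
        # a control-mode string leaking through instead of a step tuple
        log.warning('malformed env response: %r', response)
        return None
    o, r, done, info = response
    if done:
        log.warning('terminal flag received, env.reset() should follow')
    return done

# a well-formed terminal step:
assert probe_env_response(({'internal': []}, 0.0, True, {})) is True
# the control-mode message seen in error 1:
assert probe_env_response('No <ctrl> key received') is None
```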
https://github.com/Kismuz/btgym/blob/master/btgym/algorithms/runner/base.py#L52 https://github.com/Kismuz/btgym/blob/master/btgym/algorithms/runner/base.py#L86 https://github.com/Kismuz/btgym/blob/master/btgym/algorithms/runner/base.py#L229
@Kismuz I have placed a log.warning
at the places you asked for (they fire on a non-error run),
but errors 1 and 3 happen before the log is ever reached
ok, let's monitor it from shell side: https://github.com/Kismuz/btgym/blob/master/btgym/envs/base.py#L723
for case error 1:
[2019-02-02 09:09:34.708799] NOTICE: BTgymDataServer_0: Initial global_time set to: 2017-01-01 02:00:00 / stamp: 1483228800.0
[2019-02-02 09:09:34.875497] NOTICE: Worker_0: started training at step: 937098
[2019-02-02 09:09:45.905509] WARNING: BTgymMultiDataShell_0: line 723 env_response: No <ctrl> key received:Control mode: received <{'action': {'base': 'hold'}}>
Hint: forgot to call reset()?
[2019-02-02 09:09:45.905610] ERROR: BTgymMultiDataShell_0: Unexpected environment response: No <ctrl> key received:Control mode: received <{'action': {'base': 'hold'}}>
Hint: forgot to call reset()?
Hint: Forgot to call reset() or reset_data()?
Traceback (most recent call last):
File "/home/jack/btgym/btgym/envs/base.py", line 612, in _assert_response
assert type(response) == tuple and len(response) == 4
AssertionError
for case error 3:
[2019-02-02 08:58:41.676624] NOTICE: BTgymDataServer_0: Initial global_time set to: 2017-01-01 02:00:00 / stamp: 1483228800.0
[2019-02-02 08:58:41.830278] NOTICE: Worker_0: started training at step: 937098
[2019-02-02 08:58:51.271837] WARNING: BTgymMultiDataShell_0: line 723 env_response: ({'expert': array([0., 0., 0., 0.]), 'internal': array([[[0. , nan, 0. , 0.36643772, nan]],
[[0. , nan, 0. , 0.36932305, nan]],
[[0. , nan, 0. , 0.37223111, nan]],
[[0. , nan, 0. , 0.37516206, nan]],
[[0. , nan, 0. , 0.37811609, nan]],
...
...
#####same as original error 3####
[2019-02-02 08:58:51.284912] ERROR: BTgymMultiDataShell_0:
State observation shape/range mismatch!
Space set by env:
...
...
@JacobHanouna, no, that does not bring any clarity. Have you made any progress with the issue?
@JacobHanouna,
to get a clear picture of how the episode countdown is processed, temporarily enable log output by changing debug
to warning
here:
https://github.com/Kismuz/btgym/blob/master/btgym/research/strategy_gen_6/base.py#L914
https://github.com/Kismuz/btgym/blob/master/btgym/research/strategy_gen_6/base.py#L924
...if no proper countdown is shown, add the following output to the method body:
self.log.warning(
    'iteration: {}, condition: {}'.format(
        self.iteration,
        self.data.numrecords - self.inner_embedding - self.p.skip_frame - self.steps_till_is_done
    )
)
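For reference, the countdown that snippet prints is simple arithmetic. A sketch (not btgym code), using numrecords=8620 and inner_embedding=4096 from the logs in this thread, plus the assumptions skip_frame=10 (from the strategy parameters posted below) and steps_till_is_done=2:

```python
def done_countdown(numrecords, inner_embedding, skip_frame, steps_till_is_done):
    # steps remaining before the strategy's done condition should fire;
    # skip_frame=10 and steps_till_is_done=2 are assumed values here
    return numrecords - inner_embedding - skip_frame - steps_till_is_done

# matches 'condition: 4512' at iteration 0 in the debug output:
assert done_countdown(8620, 4096, 10, 2) == 4512
# matches the value the error run gets stuck at once inner_embedding is 4097:
assert done_countdown(8620, 4097, 10, 2) == 4511
```

The point of logging it is to see whether the countdown actually decreases each iteration; in the faulty run below it freezes.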
I placed the log.warning at the same location as the following (because the locations above never got triggered): https://github.com/Kismuz/btgym/blob/master/btgym/research/strategy_gen_6/base.py#L905
and uncommented: https://github.com/Kismuz/btgym/blob/master/btgym/research/strategy_gen_6/base.py#L393
For error 1: no new information.
For error 3:
[2019-02-03 17:13:05.686381] NOTICE: BTgymDataServer_0: Initial global_time set to: 2017-01-01 02:00:00 / stamp: 1483228800.0
[2019-02-03 17:13:05.836108] NOTICE: Worker_0: started training at step: 0
[2019-02-03 17:13:17.715686] WARNING: BTgymServer_0: Inner time embedding: 4096
[2019-02-03 17:13:17.715775] WARNING: BTgymServer_0: exposure: 128
[2019-02-03 17:13:17.715815] WARNING: BTgymServer_0: cash: 128
[2019-02-03 17:13:17.715849] WARNING: BTgymServer_0: drawdown: 128
[2019-02-03 17:13:17.715880] WARNING: BTgymServer_0: min_unrealized_pnl: 128
[2019-02-03 17:13:17.715910] WARNING: BTgymServer_0: unrealized_pnl: 128
[2019-02-03 17:13:17.715938] WARNING: BTgymServer_0: value: 128
[2019-02-03 17:13:17.715966] WARNING: BTgymServer_0: total_unrealized_pnl: 128
[2019-02-03 17:13:17.715994] WARNING: BTgymServer_0: realized_pnl: 128
[2019-02-03 17:13:17.716022] WARNING: BTgymServer_0: pos_duration: 128
[2019-02-03 17:13:17.716055] WARNING: BTgymServer_0: max_unrealized_pnl: 128
[2019-02-03 17:13:17.716114] WARNING: BTgymServer_0: iteration: 0, condition: 4512, numrecords 8620, inner_embedding 4096
[2019-02-03 17:13:17.717174] WARNING: BTgymServer_0: Inner time embedding: 4097
... (same per-metric lines as for iteration 0: exposure, cash, drawdown, etc. all report 128) ...
[2019-02-03 17:13:17.717539] WARNING: BTgymServer_0: iteration: 1, condition: 4511, numrecords 8620, inner_embedding 4097
[2019-02-03 17:13:17.718862] WARNING: BTgymServer_0: iteration: 2, condition: 4511, numrecords 8620, inner_embedding 4097
[2019-02-03 17:13:17.720571] WARNING: BTgymServer_0: iteration: 3, condition: 4511, numrecords 8620, inner_embedding 4097
[2019-02-03 17:13:17.721927] WARNING: BTgymServer_0: iteration: 4, condition: 4511, numrecords 8620, inner_embedding 4097
[2019-02-03 17:13:17.723433] WARNING: BTgymServer_0: iteration: 5, condition: 4511, numrecords 8620, inner_embedding 4097
[2019-02-03 17:13:17.724787] WARNING: BTgymServer_0: iteration: 6, condition: 4511, numrecords 8620, inner_embedding 4097
[2019-02-03 17:13:17.727053] WARNING: BTgymServer_0: iteration: 7, condition: 4511, numrecords 8620, inner_embedding 4097
[2019-02-03 17:13:17.728387] WARNING: BTgymServer_0: iteration: 8, condition: 4511, numrecords 8620, inner_embedding 4097
[2019-02-03 17:13:17.729863] WARNING: BTgymServer_0: iteration: 9, condition: 4511, numrecords 8620, inner_embedding 4097
[2019-02-03 17:13:17.731209] WARNING: BTgymServer_0: iteration: 10, condition: 4511, numrecords 8620, inner_embedding 4097
[2019-02-03 17:13:17.738648] WARNING: BTgymMultiDataShell_0: line 724 env_response: ({'internal': array([[[0. , nan, 0. , 0.36643772, nan]],
[[0. , nan, 0. , 0.36932305, nan]],
[[0. , nan, 0. , 0.37223111, nan]],
For comparison, this is a normal run:
[2019-02-03 17:31:18.200646] NOTICE: BTgymDataServer_0: Initial global_time set to: 2017-01-01 02:00:00 / stamp: 1483228800.0
[2019-02-03 17:31:18.371415] NOTICE: Worker_0: started training at step: 0
[2019-02-03 17:31:24.143993] WARNING: BTgymServer_0: Inner time embedding: 4097
[2019-02-03 17:31:24.144112] WARNING: BTgymServer_0: realized_pnl: 128
[2019-02-03 17:31:24.144170] WARNING: BTgymServer_0: min_unrealized_pnl: 128
[2019-02-03 17:31:24.144217] WARNING: BTgymServer_0: max_unrealized_pnl: 128
[2019-02-03 17:31:24.144259] WARNING: BTgymServer_0: unrealized_pnl: 128
[2019-02-03 17:31:24.144300] WARNING: BTgymServer_0: value: 128
[2019-02-03 17:31:24.144340] WARNING: BTgymServer_0: exposure: 128
[2019-02-03 17:31:24.144379] WARNING: BTgymServer_0: total_unrealized_pnl: 128
[2019-02-03 17:31:24.144418] WARNING: BTgymServer_0: cash: 128
[2019-02-03 17:31:24.144457] WARNING: BTgymServer_0: drawdown: 128
[2019-02-03 17:31:24.144496] WARNING: BTgymServer_0: pos_duration: 128
[2019-02-03 17:31:24.144574] WARNING: BTgymServer_0: iteration: 0, condition: 4511, numrecords 8620, inner_embedding 4097
... (iterations 1-9 repeat the same per-metric block; inner_embedding increments by one each step, 4098 through 4106, while condition decrements by one, 4510 through 4502) ...
[2019-02-03 17:31:24.166598] WARNING: BTgymServer_0: iteration: 10, condition: 4501, numrecords 8620, inner_embedding 4107
[2019-02-03 17:31:24.174761] WARNING: BTgymMultiDataShell_0: line 724 env_response: ({'expert': array([0., 0., 0., 0.]), 'metadata': {'timestamp': array(1.48350114e+09), 'trial_num': array(1), 'trial_type': array(0), 'type': array(0), 'sample_num': array(0), 'first_row': array(0), 'generator': {'l': array(0), 'mu': array(0), 'x0': array(0), 'sigma': array(0)}}, 'internal': array([[[0. , 0. , 0. , 0.36643772, 0. ]],
[[0. , 0. , 0. , 0.36932305, 0. ]],
[[0. , 0. , 0. , 0.37223111, 0. ]],
[[0. , 0. , 0. , 0.37516206, 0. ]],
[[0. , 0. , 0. , 0.37811609, 0. ]],
Could the problem be connected with inner_embedding?
do the errors persist if you use filters with gen5 or gen4 strategies?
yes; I don't think it is a regression problem
@Kismuz I updated the code to the latest version, and now the only error I keep getting is error 4.
Traceback (most recent call last):
File "/home/jack/btgym/btgym/algorithms/aac.py", line 1367, in _process
data = self.get_data()
File "/home/jack/btgym/btgym/algorithms/aac.py", line 917, in get_data
data_streams = [get_it(**kwargs) for get_it in self.data_getter]
File "/home/jack/btgym/btgym/algorithms/aac.py", line 917, in <listcomp>
data_streams = [get_it(**kwargs) for get_it in self.data_getter]
File "/home/jack/btgym/btgym/algorithms/rollout.py", line 33, in pull_rollout_from_queue
return queue.get(timeout=600.0)
File "/usr/lib/python3.5/queue.py", line 172, in get
raise Empty
queue.Empty
So: the episode ends, the BTgym env server falls back to control mode and expects requests containing control keys, but the thread runner somehow misses the 'done' (terminal) flag in the environment response and keeps calling env.step(), which passes actions (via the API shell) to the server instead of calling env.reset() to initiate a new episode; at the next moment the thread runner gets a control response while expecting the full (o, r, d, i) tuple, is unable to unpack it, and everybody is unhappy. What a mess: 'done' is missed
The more I think about it, the more it looks like the threads are not in sync with one another for the first iteration. Maybe 'filtering' is just a bit heavier computationally, so it takes a bit more time somewhere, which causes that. This would also explain why I got different errors and why more filtering operations made a crash more likely.
I could maybe put a sleep
delay at the entry point of the thread and see if that is the case.
If it makes sense, where should I place it?
empty queue... we've seen this before: #30
...after failing to properly resolve the issue I wrote a synchronous, well-behaved version of the runner:
https://github.com/Kismuz/btgym/blob/master/btgym/algorithms/runner/synchro.py
substitution is quite straightforward, you can find latest example usage here:
https://github.com/Kismuz/btgym/blob/master/btgym/research/model_based/aac.py
@Kismuz I've changed the runner and retested. I got the following error in the process:
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
[2019-02-05 08:00:25.477223] NOTICE: Worker_0: no saved model parameters found in:
/home/jack/tmp/pre_trained_model4
INFO:tensorflow:Restoring parameters from /home/jack/tmp/test_4_11_multi10/current_train_checkpoint/model_parameters-41099
[2019-02-05 08:00:29.295327] NOTICE: BTgymDataServer_0: Initial global_time set to: 2017-01-01 02:00:00 / stamp: 1483228800.0
[2019-02-05 08:00:36.389892] ERROR: PPO_0: start() exception occurred
Press `Ctrl-C` or jupyter:[Kernel]->[Interrupt] for clean exit.
Traceback (most recent call last):
File "/home/jack/btgym/btgym/algorithms/aac.py", line 994, in start
self._start_runners(sess, summary_writer, **kwargs)
File "/home/jack/btgym/btgym/algorithms/aac.py", line 1013, in _start_runners
runner.start_runner(sess, summary_writer, **kwargs) # starting runner threads
File "/home/jack/btgym/btgym/algorithms/runner/synchro.py", line 173, in start_runner
self.start(sess, summary_writer, **kwargs)
File "/home/jack/btgym/btgym/algorithms/runner/synchro.py", line 195, in start
_ = self.get_data()
File "/home/jack/btgym/btgym/algorithms/runner/synchro.py", line 556, in get_data
off_policy=self.memory.sample_uniform(sequence_size=rollout_length),
File "/home/jack/btgym/btgym/algorithms/memory.py", line 155, in sample_uniform
if self._frames[start_pos]['terminal']:
IndexError: deque index out of range
[2019-02-05 08:00:36.478017] ERROR: Worker_0: deque index out of range
...
The index is set a line earlier as a random number by this:
start_pos = np.random.randint(0, self._history_size - sequence_size - 1)
So, just to test it, I forced start_pos=0;
this resulted in a fallback to the previously discussed scenario (sometimes it works and sometimes it errors).
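One way to guard against that IndexError, sketched here as a hypothetical fix rather than the actual memory.py code: early in training the deque holds fewer than _history_size frames, so the sampling range has to be clamped to what is actually buffered (np.random.randint's upper bound is exclusive):

```python
from collections import deque
import numpy as np

def safe_start_pos(frames, sequence_size):
    """Pick a start index so frames[start_pos : start_pos + sequence_size] is valid."""
    upper = len(frames) - sequence_size  # exclusive upper bound for start_pos
    if upper <= 0:
        # not enough experience collected yet to sample a full sequence
        raise ValueError('not enough frames buffered to sample a full sequence')
    return np.random.randint(0, upper)

# 25 frames buffered while the memory was sized for 100:
frames = deque([{'terminal': False}] * 25, maxlen=100)
pos = safe_start_pos(frames, sequence_size=20)
assert 0 <= pos <= 4
```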
Do I understand correctly that some aux. task (e.g. value_replay) is enabled here? If yes, try running with all aux tasks turned off. This time it seems to be a replay memory error.
This time it seems to be a replay memory error
I ran it without replay, and you are right: this is a different issue.
@Kismuz if you want, I can clear all the irrelevant stuff from my code and post a simple example to reproduce the problem
post a simple example to reproduce the problem
ok
In the process of trying to simplify the code, I learned a few things regarding this issue: error 1 happens after set_datalines()
but never reaches get_external_state()
, while error 3 happens after get_external_state()
.
** The code below is a simplified version of the issue, and it doesn't reproduce error 1 (I'm still trying to create a simple example that shows error 1).
*** I used MonoSpreadOUStrategy_0
, which has since been deleted from the source, simply because it was the easiest gen-6 strategy to use without posting long code.
import warnings
warnings.filterwarnings("ignore") # suppress h5py deprecation warning
import os
import backtrader as bt
from backtrader.resamplerfilter import Resampler
import backtrader.indicators as btind
import numpy as np
from btgym import BTgymEnv, BTgymDataset
from btgym.strategy.observers import Reward, Position, NormPnL
from btgym.algorithms import Launcher, Unreal, AacStackedRL2Policy, PPO
from btgym.research.strategy_gen_4 import DevStrat_4_12
from btgym.research.model_based.strategy import MonoSpreadOUStrategy_0
from btgym.research.casual_conv.networks import conv_1d_casual_encoder, conv_1d_casual_attention_encoder
from btgym import MultiDiscreteEnv
from btgym.datafeed.multi import BTgymMultiData
from btgym.algorithms.nn.layers import linear
class StrategyTest_v4(DevStrat_4_12):
    def set_datalines(self):
        print('StrategyTest v4')
        self.dnames['resample_1'].addfilter(Resampler, timeframe=bt.TimeFrame.Minutes, compression=5)
        self.dnames['resample_2'].addfilter(Resampler, timeframe=bt.TimeFrame.Minutes, compression=5)
        self.dnames['resample_3'].addfilter(Resampler, timeframe=bt.TimeFrame.Minutes, compression=5)
        self.dnames['resample_4'].addfilter(Resampler, timeframe=bt.TimeFrame.Minutes, compression=5)
        self.dnames['resample_5'].addfilter(Resampler, timeframe=bt.TimeFrame.Minutes, compression=5)
        self.data.features = [
            btind.SimpleMovingAverage(self.dnames['base'], period=period) for period in self.features_parameters
        ]
        self.data.dim_sma = btind.SimpleMovingAverage(
            self.dnames['base'],
            period=(np.asarray(self.features_parameters).max() + self.time_dim * 10)
        )
        self.data.dim_sma.plotinfo.plot = False
class StrategyTest_v6(MonoSpreadOUStrategy_0):
    def set_datalines(self):
        print('StrategyTest v6')
        self.dnames['resample_1'].addfilter(Resampler, timeframe=bt.TimeFrame.Minutes, compression=5)
        self.dnames['resample_2'].addfilter(Resampler, timeframe=bt.TimeFrame.Minutes, compression=5)
        self.dnames['resample_3'].addfilter(Resampler, timeframe=bt.TimeFrame.Minutes, compression=5)
        self.dnames['resample_4'].addfilter(Resampler, timeframe=bt.TimeFrame.Minutes, compression=5)
        self.dnames['resample_5'].addfilter(Resampler, timeframe=bt.TimeFrame.Minutes, compression=5)
        self.data.high = self.data.low = self.data.close = self.data.open
        self.data.std = btind.StdDev(self.data.open, period=self.p.time_dim, safepow=True)
        self.data.std.plotinfo.plot = False
        self.data.features = [
            btind.SimpleMovingAverage(self.data.open, period=period) for period in self.p.features_parameters
        ]
        initial_time_period = np.asarray(self.p.features_parameters).max() + self.p.time_dim * 10
        self.data.dim_sma = btind.SimpleMovingAverage(
            self.dnames['base'],
            period=initial_time_period
        )
        self.data.dim_sma.plotinfo.plot = False
# Set backtesting engine parameters:
MyCerebro = bt.Cerebro()
# Define strategy and broker account parameters:
MyCerebro.addstrategy(
    StrategyTest_v6,
    start_cash=2000,  # initial broker cash
    commission=0.0001,  # commission to imitate spread
    leverage=10.0,
    # order_size=2000,  # fixed stake, mind leverage
    drawdown_call=10,  # max % to lose, in percent of initial cash
    target_call=10,  # max % to win, same
    skip_frame=10,
    gamma=0.99,
    reward_scale=7,  # gradient's nitrox, touch with care!
    cash_name='USD',  # just a naming for cash asset
    asset_names={'base', },  # 'base' is the only traded asset; the rest are information lines only
    order_size={
        'base': 2000,
        # 'GBP': 1000,
        # 'JPY': 1000,
        # 'USD': 1000,
    },
)
data_config = {
    'base': {'filename': '/home/jack/btgym/examples/data/test_sine_1min_period256_delta0002.csv'},
    'resample_1': {'filename': 'test_sine_1min_period256_delta0002.csv'},
    'resample_2': {'filename': 'test_sine_1min_period256_delta0002.csv'},
    'resample_3': {'filename': 'test_sine_1min_period256_delta0002.csv'},
    'resample_4': {'filename': 'test_sine_1min_period256_delta0002.csv'},
    'resample_5': {'filename': 'test_sine_1min_period256_delta0002.csv'},
}
# Visualisations for reward, position and PnL dynamics:
MyCerebro.addobserver(Reward)
MyCerebro.addobserver(Position)
MyCerebro.addobserver(NormPnL)
MyDataset = BTgymMultiData(
    data_class_ref=BTgymDataset,
    data_config=data_config,
    start_weekdays={0, 1, 2, 3, 4, 5, 6},
    episode_duration={'days': 5, 'hours': 23, 'minutes': 40},  # note: ~6-day-long episode
    start_00=False,
    time_gap={'hours': 10},
)
env_config = dict(
    class_ref=MultiDiscreteEnv,
    kwargs=dict(
        dataset=MyDataset,
        engine=MyCerebro,
        render_modes=['episode', ],  # 'human', 'internal', 'external'],
        render_state_as_image=True,
        render_ylabel='OHL_diff. / Internals',
        render_size_episode=(12, 8),
        render_size_human=(9, 4),
        render_size_state=(11, 3),
        render_dpi=75,
        port=5000,
        data_port=4999,
        connect_timeout=9000,
        verbose=0,
    )
)
cluster_config = dict(
    host='127.0.0.1',
    port=12230,
    num_workers=1,  # set according to CPUs available
    num_ps=1,
    num_envs=1,
    log_dir=os.path.expanduser('~/tmp/test'),  # current checkpoints and summaries are here
    initial_ckpt_dir=os.path.expanduser('~/tmp/pre_trained'),  # load pre-trained model, if checkpoint found
)
policy_config = dict(
    class_ref=AacStackedRL2Policy,
    kwargs={
        'lstm_layers': (256, 256),
        'lstm_2_init_period': 60,
        'dropout_keep_prob': 0.25,  # 0.25 - 0.75 opt
        'state_encoder_class_ref': conv_1d_casual_encoder,
        'linear_layer_ref': linear,
    }
)
trainer_config = dict(
    class_ref=PPO,
    kwargs=dict(
        opt_learn_rate=[1e-4, 1e-4],  # random log-uniform
        opt_end_learn_rate=1e-5,
        opt_decay_steps=50 * 10 ** 6,
        model_gamma=0.99,
        model_gae_lambda=1.0,
        model_beta=0.05,  # entropy reg
        rollout_length=20,
        time_flat=True,
        use_value_replay=False,
        model_summary_freq=10,
        episode_summary_freq=1,
        env_render_freq=2,
    )
)
launcher = Launcher(
    cluster_config=cluster_config,
    env_config=env_config,
    trainer_config=trainer_config,
    policy_config=policy_config,
    test_mode=False,
    max_env_steps=100 * 10 ** 6,
    save_secs=30,  # save checkpoint every N seconds (default is 600)
    root_random_seed=0,
    purge_previous=0,  # ask to override previously saved model and logs
    verbose=0
)
# Train it:
launcher.run()
@Kismuz I think I finally found the cause of the errors.
So much time spent on such a small and insignificant thing: what causes the error is, for some strange reason, the use of self.dnames['xxx']
to address the data source. That's it, just that! Nothing to do with filters!
I tested it with self.datas[x]
instead and didn't get any error.
I was using the dnames
backtrader notation because data_config
is a dictionary, so the order of the assets is not the same as the order of datas
.
My intention in using dnames
was to call a specific data asset by name.
I'm guessing that the error is caused by some sort of race between the server thread and the worker thread, where the worker tries to access the dnames
dictionary before it gets initialized on the backtrader side.
I will do more testing to confirm this.
** @Kismuz I just checked the code for places that use self.datas[x] (where x is not 0),
and it seems that you are using it in model_based
. It could be a good idea to check that this doesn't affect the algo (using datas[0] where you intended to use datas[1])
@JacobHanouna,
use of
self.dnames['xxx']
to address the data source. That's it, just that! Nothing to do with filters!
to check that this doesn't affect the algo (using datas[0] where you intended to use datas[1])
Noted. Backtrader also has a .getdatabyname()
method to call data by its name string, but I have never employed it.
@Kismuz this issue, unfortunately, is still unresolved. But some progress was made.
I changed self.dnames['xxx']
to self.getdatabyname('xxx')
but still got the same errors.
So, to avoid using names to reference the data, I just used a native loop, as follows:
def set_datalines(self):
    base_data = None
    for data in self.datas:
        print(data._name)
        if data._name == 'base':
            base_data = data
    # define the max period asked by indicators
    # note: just a technical trick to sync all indicators starting point
    initial_time_period = np.asarray(self.p.features_parameters).max() + self.p.time_dim
    self.data.dim_sma = btind.SimpleMovingAverage(
        base_data,
        # self.dnames['base'],
        # self.datas[0],
        period=initial_time_period
    )
Using this small modification to the example code I posted before, I got:
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Restoring parameters from /home/jack/tmp/test_4_11_multi10/current_train_checkpoint/model_parameters-658
[2019-02-11 13:49:03.411479] NOTICE: BTgymDataServer_0: Initial global_time set to: 2017-01-01 02:00:00 / stamp: 1483228800.0
resample_4
base
resample_2
resample_3
resample_5
resample_1
[2019-02-11 13:49:18.841183] NOTICE: synchro_Runner_0: started collecting data.
[2019-02-11 13:49:18.960796] NOTICE: Worker_0: started training at step: 658
It shows that the order of datas[]
is random: on each run the order is different, but the error only happens when the position x of 'base' in datas[x]
is x > 1. So the run log above didn't produce any errors, but later positions of 'base' produce them.
Tried two solutions I had in mind, both unsuccessful:
According to the code in https://github.com/Kismuz/btgym/blob/0258a2baa728bbe4aa4ef039dca781da1e309b2c/btgym/datafeed/multi.py#L121-L122
I have changed data_config
to force 'base' to be the master data:
data_config = {
    'base': {'filename': '/home/jack/btgym/examples/data/test_sine_1min_period256_delta0002.csv',
             'base': True},
    'resample_1': {'filename': '/home/jack/btgym/examples/data/test_sine_1min_period256_delta0002.csv'},
    'resample_2': {'filename': '/home/jack/btgym/examples/data/test_sine_1min_period256_delta0002.csv'},
    'resample_3': {'filename': '/home/jack/btgym/examples/data/test_sine_1min_period256_delta0002.csv'},
    'resample_4': {'filename': '/home/jack/btgym/examples/data/test_sine_1min_period256_delta0002.csv'},
    'resample_5': {'filename': '/home/jack/btgym/examples/data/test_sine_1min_period256_delta0002.csv'},
}
This did force 'base' to be the master data, but it didn't change the error behavior.
Tried to force data_config
to be an ordered dictionary using collections.OrderedDict(),
but it gets mixed back somewhere down the code.
@Kismuz any thoughts?
@JacobHanouna ,
It shows that the order of
datas[]
are random
...that seems ridiculous. Datafeeds are added one by one to one of the strategy's internal lists and are not moved in any way. Can it be that resamplers manipulate that?
Can it be that resamplers manipulate that?
use of
self.dnames['xxx']
to address the data source. Thats it, just that! nothing to do with filters!
I should have posted the whole function in the previous post. But once I understood that the problem doesn't come from the filters at all, I removed them from the code. The only change needed to reproduce the error was to use self.dnames['base']
as the data of the SMA for self.data.dim_sma:
def set_datalines(self):
    self.data.high = self.data.low = self.data.close = self.data.open
    self.data.std = btind.StdDev(self.data.open, period=self.p.time_dim, safepow=True)
    self.data.std.plotinfo.plot = False
    self.data.features = [
        btind.SimpleMovingAverage(self.data.open, period=period) for period in self.p.features_parameters
    ]

    base_data = None
    for data in self.datas:
        print(data._name)
        if data._name == 'base':
            base_data = data

    initial_time_period = np.asarray(self.p.features_parameters).max() + self.p.time_dim
    self.data.dim_sma = btind.SimpleMovingAverage(
        base_data,
        # self.dnames['base'],
        # self.datas[0],
        period=initial_time_period
    )
    self.data.dim_sma.plotinfo.plot = False
...that seems ridiculous. Datafeeds added one by one to one of strategy internal lists and are not moved in any way
Dictionaries in Python (before 3.7) are unordered. If I print data_config
right after declaring it, it won't be in the same order as I initialized it. So when they are added as datafeeds, they are inherently unordered.
tried to force
data_config
to be an ordered dictionary usingcollections.OrderedDict()
but it gets mixed back somewhere down the code
This one I have no clue why it happens. When I printed the 'ordered dict' after declaration it showed the right order, so it is strange if nothing affects it afterwards (I'll retest just to verify).
2. What kind of behaviour, in general, do you aim to achieve by adding resamples?
I'm experimenting with a very long time_dim, which is resource heavy, so I'm looking for ways to reduce dimensionality without losing resolution of the data. I'm doing so by researching the relation between differently resampled versions of the data. At first I did the resampling operation directly on the computational graph, but it was very slow, as all these operations had to be redone per step. So, to run good experiments, I use the backtrader resampler, because it creates the resampled data on the fly (keeps adding data as iterations continue), which makes it very lightweight to work with.
...that seems ridiculous. Datafeeds added one by one to one of strategy internal lists and are not moved in any way
Dictionaries in Python are unordered. If I print
data_config
right after declaring it, it won't be in the same order as I initialized it. So when they are added as datafeeds, they are inherently unordered.
You are indeed right. I should consider it a sort of subtle bug. MultiDataset relies on asset names as dict keys; indexing it later via position in a datas list is unreliable.
The main purpose of the master asset is to be the reference index for all other data streams.
looking for ways to reduce dimensionality without losing resolution of the data.
I would reformulate that: data resampling enables capturing long periods by sacrificing resolution. The problem is that it sacrifices it in a not very intelligent way: it does not prioritise recent data at the expense of the tail, since it divides the entire embedding period evenly. I would suggest another resampling scheme, with resampling periods increasing from present to past. It allows rendering recent observations precisely while still propagating information from the tail. It is like SMA vs EMA. Or, one can say that causal convolution does nearly the same (in that case the resampling period grows backward as a power of 2). Implementing it without learnable kernels, via simple lin. alg., can indeed save computation time.
...resampling periods increasing from present to past
This is the main idea of what i'm working on
simple lin.alg. can indeed save computation time.
and this is the motivation
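A minimal numpy sketch of the backward-growing scheme Kismuz describes, with window sizes doubling toward the past. The function name, signature and shapes are my own, not btgym API:

```python
import numpy as np

def backward_resample(x, n_windows=4):
    """Compress a series so recent points keep full resolution while the
    tail is averaged over exponentially growing windows (1, 2, 4, 8, ...)."""
    x = np.asarray(x, dtype=float)
    out = []
    end = len(x)
    for k in range(n_windows):
        size = 2 ** k                 # window grows as a power of 2 toward the past
        start = max(end - size, 0)
        out.append(x[start:end].mean())
        end = start
    return np.array(out[::-1])        # oldest (coarsest) window first

# 15 points collapse to 4 features; the most recent point stays exact
compressed = backward_resample(np.arange(1, 16), n_windows=4)  # -> [4.5, 10.5, 13.5, 15.0]
```

Because it is just slicing and means, the whole transform can also be expressed as one sparse matrix multiply, which is the computational saving mentioned above.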
After testing different options in backtrader, I concluded that working with filters requires a datafeed per filter, so working with MultiDataset is the trade-off for working on a single asset with multiple versions of the data.
I will keep investigating, as this is the only way forward for me to experiment, and will keep posting findings if I find new ones.
tried to force
data_config
to be an ordered dictionary using collections.OrderedDict()
but it gets mixed back somewhere down the code
This one I have no clue why it happens. When I printed the 'ordered dict' after declaration it showed the right order, so it is strange if nothing affects it afterwards (I'll retest just to verify).
datas[]
is created when we use cerebro.adddata()
in server.py, but before we add the data, two dictionaries are created in BTgymMultiData,
which mix the ordered data_config.
To fix the problem and keep the original order, change the dictionary declaration from {}
to OrderedDict()
in the following locations:
https://github.com/Kismuz/btgym/blob/master/btgym/datafeed/multi.py#L95
https://github.com/Kismuz/btgym/blob/master/btgym/datafeed/multi.py#L241
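For reference, a minimal demonstration of why that change matters on the Python 3.5 used here: a plain dict only guarantees insertion order from CPython 3.6 (officially Python 3.7) on, while OrderedDict guarantees it on every version. The filenames below are placeholders, not the real paths:

```python
from collections import OrderedDict

# OrderedDict preserves insertion order on every Python version,
# so 'base' reliably stays first when the feeds are later added to cerebro.
data_config = OrderedDict([
    ('base', {'filename': 'sine.csv', 'base': True}),
    ('resample_1', {'filename': 'sine.csv'}),
    ('resample_2', {'filename': 'sine.csv'}),
])

keys_in_order = list(data_config)  # insertion order is kept
```

Any intermediate dict built from data_config also has to be an OrderedDict, otherwise the order is lost again at that point, which matches the observation that the fix above is not sufficient on its own.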
If I force the error again, I can see that the order still gets mixed up, so other dict mix-ups are happening in other regions of the code:
AssertionError: Unexpected environment response: No <ctrl> key received:Control mode: received <{'action': {'base': 'hold', 'resample_3': 'hold', 'resample_4': 'hold', 'resample_2': 'hold', 'resample_5': 'hold', 'resample_1': 'hold'}}>
Until now my setup had involved working with data_class_ref=BTgymDataset.
After I found a workaround to run the code without causing the error, I tried to change to BTgymRandomDataDomain:
MyDataset = BTgymMultiData(
    data_class_ref=BTgymRandomDataDomain,
    data_config=data_config,
    target_period={'days': 50, 'hours': 0, 'minutes': 0},  # use last 50 days of one year data as 'target domain'
                                                           # so we get [360 - holidays gaps - 50]
                                                           # days of train data (exclude holidays)
    trial_params=dict(
        start_weekdays={0, 1, 2, 3, 4, 5, 6},
        sample_duration={'days': 30, 'hours': 0, 'minutes': 0},  # let each trial be 30 days long
        start_00=True,  # adjust trial beginning to the beginning of the day
        time_gap={'days': 0, 'hours': 23},  # tolerance param
        test_period={'days': 6, 'hours': 0, 'minutes': 0},  # from those 30, reserve last 6 days for trial test data
    ),
    episode_params=dict(
        start_weekdays={0, 1, 2, 3, 4, 5, 6},
        sample_duration={'days': 1, 'hours': 23, 'minutes': 55},  # make every episode duration be 23:55
        start_00=False,  # do not adjust beginning time
        time_gap={'days': 0, 'hours': 10},
    ),
)
now I get the following error:
[2019-02-21 10:09:51.480183] NOTICE: Worker_0: initializing all parameters...
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
[2019-02-21 10:09:52.977672] NOTICE: Worker_0: no saved model parameters found in:
/home/jack/tmp/pre_trained_model4
[2019-02-21 10:09:53.098436] NOTICE: Worker_0: no saved model parameters found in:
/home/jack/tmp/base test/current_train_checkpoint
[2019-02-21 10:09:53.098537] NOTICE: Worker_0: training from scratch...
[2019-02-21 10:09:55.735562] NOTICE: BTgymDataServer_0: Initial global_time set to: 2017-01-01 02:00:00 / stamp: 1483228800.0
Process BTgymDataFeedServer-2:1:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/home/jack/btgym/btgym/dataserver.py", line 176, in run
sample = self.get_data(sample_config=service_input['kwargs'])
File "/home/jack/btgym/btgym/dataserver.py", line 88, in get_data
sample = self.dataset.sample(**sample_config)
File "/home/jack/btgym/btgym/datafeed/multi.py", line 219, in sample
master_sample = self.master_data.sample(**kwargs)
File "/home/jack/btgym/btgym/datafeed/base.py", line 539, in sample
return self._sample(**kwargs)
File "/home/jack/btgym/btgym/datafeed/base.py", line 617, in _sample
**kwargs
File "/home/jack/btgym/btgym/datafeed/base.py", line 900, in _sample_interval
sample_len = (sampled_data.index[-1] - sampled_data.index[0]).to_pytimedelta()
File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexes/datetimelike.py", line 403, in __getitem__
val = getitem(key)
IndexError: index -1 is out of bounds for axis 0 with size 0
@Kismuz It happens at the beginning of the run (this error happens on every run). Do you think it is related, or is this something else?
** data is a simple sine, same as before
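The IndexError in the traceback can be reproduced outside btgym: slicing a time-indexed frame by an interval that contains no rows yields an empty frame, and calling .index[-1] on it then fails exactly as in _sample_interval. This is a sketch; the column name and dates are arbitrary:

```python
import pandas as pd

idx = pd.date_range('2017-01-01', periods=10, freq='1min')
df = pd.DataFrame({'open': range(10)}, index=idx)

# interval that lies entirely outside the data -> empty selection
sampled = df.loc['2018-01-01':'2018-01-02']

try:
    span = sampled.index[-1] - sampled.index[0]  # same expression as in base.py
    error = None
except IndexError as e:
    error = str(e)  # "index -1 is out of bounds for axis 0 with size 0"
```

Which would suggest the sampler is being asked for an interval outside the loaded data range, i.e. the trial/episode parameters may not match the span of the sine dataset.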
@JacobHanouna, it seems we have found that nasty 'hanging done' bug - thanks to @deandreee and #103. I introduced it when fixing #84; now I hope it is fixed.
@Kismuz I retested the scenario after the update, but the errors remain the same.
Closed due to long inactivity period.
UPDATE: the bug was found to be related to some problem with self.datas
in BTgymMultiData.
The reason is still unclear, but the issue is unrelated to filters as originally proposed.

I'm currently experimenting with data preprocessing on a custom strategy, mainly overriding set_datalines()
and get_external_state().
My setup uses multiple data sources (BTgymMultiData)
and backtrader filters. Filters are applied to a datafeed and dynamically change it. I'm adding those filters in set_datalines(),
so all of the data manipulation happens once, at the beginning of the episode. The concept seems to work fine but will sometimes result in one of those two errors:
If the error occurs, it only happens at the beginning of each run, so if the run reaches a save checkpoint it won't have an error (for that run). The errors don't happen every time, but adding more datafeeds with a filter attached seems to increase the frequency with which they occur.
It took me some time to isolate this non-deterministic bug from the rest of my code. To recreate it, one needs a BTgymMultiData setup and a filter added to the datafeed. The easiest way I came up with is to use the multi_discrete_setup_intro
example and change CasualConvStrategyMulti
as follows. Note: dim_sma
was changed to be 10 times longer (to fit the resampler).
@Kismuz, I've been playing around with this issue for the last two weeks and encountered more errors along the way, but I'm not sure all are relevant, so for now I think this is enough info.