Kismuz / btgym

Scalable, event-driven, deep-learning-friendly backtesting library
https://kismuz.github.io/btgym/
GNU Lesser General Public License v3.0

No training starts when using an agent from RLlib (and there's no error info) #114

Closed Ray-0403 closed 4 years ago

Ray-0403 commented 5 years ago

Hey, the action space is a dict (ActionDictSpace), but I need it to be discrete (gym.spaces.Discrete) to use RLlib, so I made a wrapper to change it.
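For readers following along, a wrapper of this kind might look roughly like the sketch below. It is not the code from this issue: the class name `DiscreteActionAdapter`, the stub `EchoEnv`, and the asset key `"default_asset"` are all assumptions made for illustration, and the sketch deliberately avoids importing gym so it runs anywhere. With gym installed, the same `action` method would typically live in a `gym.ActionWrapper` subclass whose `action_space` is set to `gym.spaces.Discrete(n_actions)`.

```python
class DiscreteActionAdapter:
    """Expose an integer action interface over an env that expects a dict action.

    Hypothetical sketch: `asset_key` is an assumed name for the single entry
    of the dict action space; check the real env's action space for the key.
    """

    def __init__(self, env, asset_key="default_asset", n_actions=4):
        self.env = env
        self.asset_key = asset_key
        self.n_actions = n_actions

    def action(self, act):
        # Translate the integer chosen by the agent into the dict the env expects.
        act = int(act)
        if not 0 <= act < self.n_actions:
            raise ValueError(f"action {act} is outside [0, {self.n_actions})")
        return {self.asset_key: act}

    def reset(self):
        return self.env.reset()

    def step(self, act):
        # Convert first, then delegate to the wrapped env.
        return self.env.step(self.action(act))


class EchoEnv:
    """Stand-in env (not btgym) that returns the action it received as the observation."""

    def reset(self):
        return None

    def step(self, action):
        return action, 0.0, True, {}


if __name__ == "__main__":
    adapter = DiscreteActionAdapter(EchoEnv())
    obs, reward, done, info = adapter.step(2)
    print(obs)  # the dict action that reached the wrapped env
```

Stepping the adapter by hand like this, outside of any trainer, is a cheap way to confirm that the integer-to-dict conversion itself is not what stops the run.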

Ray-0403 commented 5 years ago

After I made the wrapper, I started training with the PPO agent from RLlib, but there is no training output at all and the run just ends after a few seconds with no error. I don't know what's wrong. Here's the output:

runfile('/Users/bluecharles/Desktop/wrapper_btgym.py', wdir='/Users/bluecharles/Desktop')
WARNING:tensorflow:From /anaconda3/lib/python3.7/site-packages/tensorflow/python/compat/compat.py:175: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
2019-07-06 23:06:20,008 INFO node.py:498 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-07-06_23-06-20_007893_32255/logs.
2019-07-06 23:06:20,132 INFO services.py:409 -- Waiting for redis server at 127.0.0.1:22594 to respond...
2019-07-06 23:06:20,919 INFO services.py:409 -- Waiting for redis server at 127.0.0.1:22243 to respond...
2019-07-06 23:06:20,975 INFO services.py:806 -- Starting Redis shard with 1.72 GB max memory.
2019-07-06 23:06:21,076 INFO node.py:512 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-07-06_23-06-20_007893_32255/logs.
2019-07-06 23:06:21,084 INFO services.py:1446 -- Starting the Plasma object store with 2.58 GB memory using /tmp.
BTgymDataset class is DEPRECATED, use btgym.datafeed.derivative.BTgymDataset2 instead.
2019-07-06 23:06:48,118 INFO rollout_worker.py:301 -- Creating policy evaluation worker 0 on CPU (please ignore any CUDA init errors)
2019-07-06 23:06:48.132867: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
WARNING:tensorflow:From /anaconda3/lib/python3.7/site-packages/ray/rllib/models/fcnet.py:37: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dense instead.
WARNING:tensorflow:From /anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /anaconda3/lib/python3.7/site-packages/ray/rllib/models/action_dist.py:123: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.random.categorical instead.
2019-07-06 23:06:53,619 INFO dynamic_tf_policy.py:313 -- Initializing loss function with dummy input:

{ 'action_prob': <tf.Tensor 'default_policy/action_prob:0' shape=(?,) dtype=float32>,
  'actions': <tf.Tensor 'default_policy/actions:0' shape=(?,) dtype=int64>,
  'advantages': <tf.Tensor 'default_policy/advantages:0' shape=(?,) dtype=float32>,
  'behaviour_logits': <tf.Tensor 'default_policy/behaviour_logits:0' shape=(?, 4) dtype=float32>,
  'dones': <tf.Tensor 'default_policy/dones:0' shape=(?,) dtype=bool>,
  'new_obs': <tf.Tensor 'default_policy/new_obs:0' shape=(?, 276) dtype=float32>,
  'obs': <tf.Tensor 'default_policy/observation:0' shape=(?, 276) dtype=float32>,
  'prev_actions': <tf.Tensor 'default_policy/action:0' shape=(?,) dtype=int64>,
  'prev_rewards': <tf.Tensor 'default_policy/prev_reward:0' shape=(?,) dtype=float32>,
  'rewards': <tf.Tensor 'default_policy/rewards:0' shape=(?,) dtype=float32>,
  'value_targets': <tf.Tensor 'default_policy/value_targets:0' shape=(?,) dtype=float32>,
  'vf_preds': <tf.Tensor 'default_policy/vf_preds:0' shape=(?,) dtype=float32>}

WARNING:tensorflow:From /anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/array_grad.py:425: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/gradients_impl.py:110: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
WARNING:tensorflow:From /anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/math_grad.py:102: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
2019-07-06 23:07:12,361 INFO rollout_worker.py:719 -- Built policy map: {'default_policy': <ray.rllib.policy.tf_policy_template.PPOTFPolicy object at 0x1c4fdef320>}
2019-07-06 23:07:12,365 INFO rollout_worker.py:720 -- Built preprocessor map: {'default_policy': <ray.rllib.models.preprocessors.DictFlatteningPreprocessor object at 0x1c4fdcdda0>}
2019-07-06 23:07:12,367 INFO rollout_worker.py:333 -- Built filter map: {'default_policy': <ray.rllib.utils.filter.NoFilter object at 0x1c4fdcdbe0>}
2019-07-06 23:07:13,518 INFO multi_gpu_optimizer.py:79 -- LocalMultiGPUOptimizer devices ['/cpu:0']
(pid=32291) WARNING:tensorflow:From /anaconda3/lib/python3.7/site-packages/tensorflow/python/compat/compat.py:175: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
(pid=32291) Instructions for updating:
(pid=32291) non-resource variables are not supported in the long term
(pid=32293) WARNING:tensorflow:From /anaconda3/lib/python3.7/site-packages/tensorflow/python/compat/compat.py:175: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
(pid=32293) Instructions for updating:
(pid=32293) non-resource variables are not supported in the long term
(pid=32294) WARNING:tensorflow:From /anaconda3/lib/python3.7/site-packages/tensorflow/python/compat/compat.py:175: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
(pid=32294) Instructions for updating:
(pid=32294) non-resource variables are not supported in the long term
(pid=32291) BTgymDataset class is DEPRECATED, use btgym.datafeed.derivative.BTgymDataset2 instead.
(pid=32294) BTgymDataset class is DEPRECATED, use btgym.datafeed.derivative.BTgymDataset2 instead.
(pid=32293) BTgymDataset class is DEPRECATED, use btgym.datafeed.derivative.BTgymDataset2 instead.

And I'm sure the wrapper works fine. Here is what it reports:

observation space: Dict(external:Box(30, 1, 5), internal:Box(20, 1, 6), metadata:Dict(first_row:Box(), sample_num:Box(), timestamp:Box(), trial_num:Box(), trial_type:Box(), type:Box()))
action space: Discrete(4)
action: 2

Can anyone help me? Thank you very much!