Using python *.py to start A3C training, I get a "CreateSession still waiting for response from worker" warning, and the master node never responds.

vincetom1980 commented 6 years ago

Hi Andrew,

I ran into a new issue when trying to run the A3C algorithm from the example a3c_random_on_synth_or_real_data_4_6.ipynb. I run the file in a terminal with the command python xxx.py, but I get the following error. Have you encountered this problem before?

INFO:Env:Environment is ready.
WARNING:worker_8:AAC_8: learn_rate: 0.000100, entropy_beta: 0.038476
INFO:Env:Server started, pinging tcp://127.0.0.1:5009 ...
DEBUG:Env:Server Control mode: received <{'ctrl': 'ping!'}>
DEBUG:Env:Server sent: {'ctrl': 'send control keys: <_reset>, <_getstat>, <_render>, <_stop>.'}
DEBUG:Env:Server seems ready with response: <{'ctrl': 'send control keys: <_reset>, <_getstat>, <_render>, <_stop>.'}>
INFO:Env:Environment is ready.
WARNING:worker_9:AAC_9: learn_rate: 0.000100, entropy_beta: 0.010318
2017-12-29 02:14:33.860902: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2017-12-29 02:14:34.660841: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2017-12-29 02:14:39.352340: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2017-12-29 02:14:39.357352: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2017-12-29 02:14:39.362327: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2017-12-29 02:14:39.363592: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2017-12-29 02:14:39.368915: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2017-12-29 02:14:39.372948: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2017-12-29 02:14:39.377060: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2017-12-29 02:14:39.388348: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
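The repeated "CreateSession still waiting for response from worker: /job:ps/replica:0/task:0" lines say that the session master is still waiting for the parameter-server (ps) task to answer. A quick first check is whether anything is listening on the ps task's address at all. Below is a minimal standard-library sketch of such a probe. The host and port are placeholders (assumptions), and they should be taken from whatever cluster configuration the launcher in your script uses.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if some process accepts TCP connections at host:port."""
    try:
        # create_connection resolves the address and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # Placeholder address: replace with the ps host/port from your own cluster config.
    PS_HOST, PS_PORT = "127.0.0.1", 12222
    print(f"ps at {PS_HOST}:{PS_PORT} reachable: {is_reachable(PS_HOST, PS_PORT)}")
```

If the port is closed, the ps process most likely did not start or is bound to a different address. If it is open and the wait persists, the cause lies elsewhere.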

Kismuz commented 6 years ago

@vincetom1980, no, I haven't seen this before. It seems to be a distributed TensorFlow error. Have you tried running it as a notebook under a Jupyter kernel?

vincetom1980 commented 6 years ago

Yes, when I use the Jupyter kernel the error disappears. But it's really difficult for me to debug in Jupyter, because I don't know how to trace into the backend code.

Now I'm trying to replace the input file DAT_ASCII_EURUSD_M1_201703.csv with a futures contract file whose content looks like this:
20171130 105900;2991.0;2993.0;2991.0;2991.0;776.0
20171130 110000;2992.0;2997.0;2995.0;2991.0;2004.0
20171130 110100;2995.0;2995.0;2993.0;2992.0;666.0
20171130 110200;2993.0;2996.0;2996.0;2993.0;644.0
20171130 110300;2996.0;2996.0;2995.0;2994.0;824.0
20171130 110400;2995.0;2995.0;2995.0;2994.0;324.0
20171130 110500;2994.0;2998.0;2997.0;2994.0;1546.0
20171130 110600;2998.0;3000.0;2999.0;2997.0;2078.0
20171130 110700;3000.0;3000.0;2996.0;2995.0;1166.0
20171130 110800;2997.0;2997.0;2994.0;2993.0;1260.0
20171130 110900;2993.0;2996.0;2995.0;2993.0;768.0
20171130 111000;2995.0;2996.0;2995.0;2994.0;606.0
20171130 111100;2994.0;2995.0;2992.0;2991.0;1948.0
20171130 111200;2992.0;2992.0;2988.0;2988.0;3150.0
20171130 111300;2989.0;2990.0;2989.0;2987.0;1932.0
20171130 111400;2990.0;2992.0;2992.0;2989.0;1060.0
20171130 111500;2992.0;2993.0;2993.0;2991.0;626.0
20171130 111600;2993.0;2994.0;2992.0;2992.0;720.0
20171130 111700;2992.0;2993.0;2989.0;2988.0;1266.0
20171130 111800;2989.0;2990.0;2987.0;2985.0;1718.0
20171130 111900;2987.0;2991.0;2990.0;2986.0;2018.0
20171130 112000;2991.0;2993.0;2991.0;2990.0;1768.0
20171130 112100;2991.0;2991.0;2984.0;2983.0;3684.0
20171130 112200;2984.0;2989.0;2987.0;2983.0;5100.0
20171130 112300;2988.0;2988.0;2986.0;2984.0;2464.0
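When swapping in a new data file, it may be worth checking that the price columns are in the order the data feed expects. In the sample above, some rows (for example 20171130 110000) would have a low above the open if read as open, high, low, close. The sketch below assumes the semicolon-separated layout shown and two hypothetical candidate column orders. It flags rows that cannot form a valid bar and does not use btgym's own parser.

```python
from datetime import datetime

# Two candidate column orders for the four price fields (assumptions, not btgym settings):
OHLC = ("open", "high", "low", "close")   # HistData-style order the EURUSD file appears to use
OHCL = ("open", "high", "close", "low")   # alternative order to test against


def parse_row(line, order=OHLC):
    """Parse 'YYYYMMDD HHMMSS;p1;p2;p3;p4;volume' into a dict keyed by `order`."""
    stamp, *fields = line.strip().split(";")
    row = dict(zip(order, (float(x) for x in fields[:4])))
    row["datetime"] = datetime.strptime(stamp, "%Y%m%d %H%M%S")
    row["volume"] = float(fields[4])
    return row


def is_consistent(row):
    """A bar is valid when low <= min(open, close) and high >= max(open, close)."""
    body_lo = min(row["open"], row["close"])
    body_hi = max(row["open"], row["close"])
    return row["low"] <= body_lo and row["high"] >= body_hi


def inconsistent_rows(lines, order=OHLC):
    """Timestamps of rows whose prices cannot be a valid bar under `order`."""
    rows = (parse_row(line, order) for line in lines)
    return [r["datetime"] for r in rows if not is_consistent(r)]


SAMPLE = [
    "20171130 105900;2991.0;2993.0;2991.0;2991.0;776.0",
    "20171130 110000;2992.0;2997.0;2995.0;2991.0;2004.0",
    "20171130 110100;2995.0;2995.0;2993.0;2992.0;666.0",
]
print(inconsistent_rows(SAMPLE, OHLC))  # two of the three rows fail
print(inconsistent_rows(SAMPLE, OHCL))  # no rows fail
```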

The training episodes run fine, but I always get 0 reward. Do you know why? I have been stuck here for nearly 3 days. Thank you for your help.

Kismuz commented 6 years ago

@vincetom1980,

the training episode is ok, but I always get 0 reward, do you know why?

No, but you definitely have to change the broker account settings to match your trading instrument, because the ones in the example were set to match that exact currency pair:

import backtrader as bt

MyCerebro = bt.Cerebro()

# Set leveraged account:
MyCerebro.broker.setcash(2000)
MyCerebro.broker.setcommission(commission=0.0001, leverage=10.0)  # commission to imitate spread
MyCerebro.addsizer(bt.sizers.SizerFix, stake=5000)

Refer to the backtrader documentation for details. Also, the amplifier constant inside the strategy class is instrument-sensitive and may need tuning:

...
...
    def get_market_state(self):
        T = 2e3  # EURUSD
        # T = 1e2 # EURUSD, Z-norm
        # T = 1 # BTCUSD
...
...
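To illustrate why such an amplifier has to follow the instrument's price scale, here is a sketch that assumes the strategy multiplies bar-to-bar price changes by T and squashes them with tanh. That transform is an assumption for illustration, not necessarily what btgym's get_market_state computes. The EURUSD prices are made-up values with one-minute moves of about 1e-4, and the futures prices are a few values from the sample earlier in this thread. The helper suggest_T is hypothetical.

```python
import math
import statistics


def scaled_diffs(prices, T):
    """Hypothetical feature: tanh(T * price change) between consecutive bars."""
    return [math.tanh(T * (b - a)) for a, b in zip(prices, prices[1:])]


def saturated_fraction(values, threshold=0.99):
    """Share of features pinned near +/-1, where tanh stops carrying information."""
    return sum(abs(v) >= threshold for v in values) / len(values)


def suggest_T(prices, target=0.5):
    """Hypothetical helper: pick T so a median absolute change maps to about `target`."""
    typical = statistics.median([abs(b - a) for a, b in zip(prices, prices[1:])])
    return target / typical  # assumes prices are not all identical


eurusd = [1.1000, 1.1001, 1.0999, 1.1002]            # made-up EURUSD-like prices
futures = [2991.0, 2995.0, 2993.0, 2996.0, 2995.0]    # values from the futures sample

print(saturated_fraction(scaled_diffs(eurusd, 2e3)))   # T = 2e3 keeps features informative
print(saturated_fraction(scaled_diffs(futures, 2e3)))  # same T saturates every feature
print(suggest_T(futures))                              # a much smaller T for this instrument
```

Under these assumptions, the T value commented as suited to EURUSD pushes every futures feature to +/-1, so the agent would see almost no variation in its input.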

Take a screenshot of the Images tab in TensorBoard, including the episode rendering and the state input rendering, and attach it here. It can give some hints on what's going wrong.

vincetom1980 commented 6 years ago

:) The problem was stake=5000. When I changed this parameter, it worked.
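A rough way to see why a stake sized for EURUSD breaks on this futures contract is to compare an order's notional value with the account's buying power. This sketch assumes buying power is about cash * leverage and ignores contract multipliers and backtrader's actual margin and commission rules. The EURUSD price of 1.1 is an assumed round figure, while the price 2991.0 and the cash, leverage and stake values come from this thread.

```python
import math

CASH, LEVERAGE, STAKE = 2000, 10.0, 5000  # values from the example broker settings


def notional(stake, price):
    """Value of a position in account currency: units * price (no contract multiplier)."""
    return stake * price


def rough_buying_power(cash, leverage):
    """Simplifying assumption: the account can carry about cash * leverage of notional."""
    return cash * leverage


def max_affordable_stake(cash, leverage, price):
    """Largest whole number of units whose notional fits the rough buying power."""
    return math.floor(rough_buying_power(cash, leverage) / price)


for name, price in [("EURUSD (assumed ~1.1)", 1.1), ("futures sample", 2991.0)]:
    fits = notional(STAKE, price) <= rough_buying_power(CASH, LEVERAGE)
    print(f"{name}: notional={notional(STAKE, price):,.1f}, fits={fits}, "
          f"max stake={max_affordable_stake(CASH, LEVERAGE, price)}")
```

Under these assumptions, a 5,000-unit order on a contract priced near 2991 has a notional of about 15 million against roughly 20,000 of buying power, so only a stake of at most 6 units would fit.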

Kismuz / btgym, issue #27