-
From the 311 self-play games I have generated so far, I counted 157 draws, 139 white wins, and 15 black wins. This strikes me as odd, considering that the first-move advantage should be worth very little for a…
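For reference, white's share of the decisive games follows directly from these counts:

```python
# Summarizing the reported self-play results (numbers from the post above).
draws, white_wins, black_wins = 157, 139, 15
total = draws + white_wins + black_wins        # 311 games
decisive = white_wins + black_wins             # 154 decisive games
white_share = white_wins / decisive            # white's share of decisive games
draw_rate = draws / total

print(f"white share of decisive games: {white_share:.1%}")  # ~90.3%
print(f"draw rate: {draw_rate:.1%}")                        # ~50.5%
```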
-
I have been trying to implement the [A3C algorithm](https://arxiv.org/abs/1602.01783). However, I have found that it is impossible to create an ndarray that is shared among different processes.
Howev…
-
It doesn't seem like custom gradients like this can be built automatically, but it is a nice idea. Could the log-likelihood trick be implemented?
http://arxiv.org/pdf/1506.05254v3.pdf
http://blog.shakirm…
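As a hand-rolled sketch of what the score-function (log-likelihood) trick estimates — this is illustrative Monte Carlo code, not an existing autodiff feature:

```python
# Sketch: score-function (REINFORCE / log-likelihood) estimator for
# d/dtheta E_{x~Bernoulli(theta)}[f(x)], which needs no gradient
# through the sampling step. Sample count and seed are illustrative.
import random

def score_function_grad(theta, f, n_samples=200000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = 1.0 if rng.random() < theta else 0.0
        # score = d/dtheta log p(x; theta) for a Bernoulli
        score = x / theta - (1.0 - x) / (1.0 - theta)
        total += f(x) * score
    return total / n_samples

# With f(x) = x, E[f(x)] = theta, so the true gradient is 1.
est = score_function_grad(0.3, lambda x: x)
print(round(est, 2))  # ≈ 1.0, up to Monte Carlo noise
```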
-
Hi,
Unless the goal is to not support TensorFlow with GPU, I would recommend moving the tensorflow requirement to `extras_require`. I have seen this pattern in both sonnet and tensor2tensor.
…
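A minimal sketch of that setup.py pattern, assuming an illustrative package name and version pins:

```python
from setuptools import setup, find_packages

# Sketch of the extras_require pattern (names and pins illustrative).
# A plain `pip install mypkg` installs no TensorFlow; users pick one of:
#   pip install mypkg[tensorflow]      -> CPU build
#   pip install mypkg[tensorflow_gpu]  -> GPU build
setup(
    name="mypkg",
    version="0.1.0",
    packages=find_packages(),
    extras_require={
        "tensorflow": ["tensorflow>=1.8.0"],
        "tensorflow_gpu": ["tensorflow-gpu>=1.8.0"],
    },
)
```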
-
Peeking queues would be a useful addition for multi-threaded TensorFlow applications. Right now there is no way to look at the first element without affecting the queue. We could empty a `tf.FIFOQueue…
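To pin down the requested semantics, here is a plain-Python sketch of a peekable FIFO — this is not a tf.FIFOQueue API, just an illustration of non-destructive `peek`:

```python
import threading

class PeekableQueue:
    """Sketch: a thread-safe FIFO with a non-destructive peek().

    Illustrates the semantics requested for tf.FIFOQueue; this is
    plain Python, not a TensorFlow op.
    """

    def __init__(self):
        self._items = []  # underlying FIFO storage
        self._not_empty = threading.Condition()

    def put(self, item):
        with self._not_empty:
            self._items.append(item)
            self._not_empty.notify()

    def get(self):
        with self._not_empty:
            while not self._items:
                self._not_empty.wait()
            return self._items.pop(0)

    def peek(self):
        # Look at the head element without removing it.
        with self._not_empty:
            while not self._items:
                self._not_empty.wait()
            return self._items[0]

q = PeekableQueue()
q.put("a"); q.put("b")
print(q.peek())  # 'a' -- still in the queue
print(q.get())   # 'a'
print(q.peek())  # 'b'
```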
-
Apparently current LZ doesn't have any good idea about counting, and that has been the reason for disabling resignation in self-play games. A few questions regarding this:
1. When will be the point…
-
@fchollet Since we're not adding much to the repo at this stage (in terms of layers, loss functions, callbacks, etc.), we've talked quite a bit about an external repo for user/additional contributions…
-
Have you trained Breakout with your A3C by any chance? I wonder what kind of scores you have gotten.
John
-
http://arxiv.org/pdf/1602.01783v1.pdf describes asynchronous methods using off-policy (1-step / n-step Q-learning) and even on-policy (Sarsa and advantage actor-critic (A3C)) reinforcement learning.
T…
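As a sketch of the n-step bootstrapped return these methods compute for a rollout (pure Python, values illustrative):

```python
# Sketch: n-step return R_t = r_t + g*r_{t+1} + ... + g^{n-1}*r_{t+n-1} + g^n * V(s_{t+n}),
# computed backwards over a rollout, bootstrapping from the last state's value.
def n_step_return(rewards, bootstrap_value, gamma):
    """Return the n-step return for every step of the rollout."""
    R = bootstrap_value
    returns = []
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    return list(reversed(returns))

print(n_step_return([1.0, 0.0, 1.0], bootstrap_value=0.5, gamma=0.9))
# ≈ [2.17, 1.31, 1.45]
```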
-
Hi Ran,
Thanks again for creating qtpylib; it is great.
I am seeing some strange behaviour: if I create a fresh Anaconda environment, install qtpylib there (with `pip install qtpylib --upgrade --no-cache-…