Zeta36 / Asynchronous-Methods-for-Deep-Reinforcement-Learning

Based on a paper from Google DeepMind, I've developed a new version of the DQN that uses thread-based exploration instead of memory replay, as explained here: http://arxiv.org/pdf/1602.01783v1.pdf. I used the one-step Q-learning pseudocode, and now we can train the Pong game in less than 20 hours, without any GPU or network distribution.
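
For readers landing here, below is a minimal, self-contained sketch of the asynchronous one-step Q-learning idea from the paper (several actor-learner threads sharing one value estimate, each with its own exploration rate, and no replay memory). The toy chain environment, tabular Q-table, and hyperparameters are illustrative stand-ins for the repo's network and Atari emulator, not the actual code:

```python
import random
import threading

N_STATES = 6          # toy chain: states 0..5, reward only at the right end
N_ACTIONS = 2         # 0 = move left, 1 = move right
GAMMA = 0.99
ALPHA = 0.1
STEPS_PER_THREAD = 20000

q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # shared value estimate
lock = threading.Lock()

def env_step(state, action):
    """Toy chain environment standing in for the Atari emulator."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

def actor_learner(epsilon):
    """One actor-learner thread: epsilon-greedy exploration, one-step Q-learning update."""
    state = 0
    for _ in range(STEPS_PER_THREAD):
        if random.random() < epsilon:
            action = random.randrange(N_ACTIONS)
        else:
            action = max(range(N_ACTIONS), key=lambda a: q_table[state][a])
        next_state, reward, done = env_step(state, action)
        # One-step Q-learning target: r + gamma * max_a' Q(s', a')
        target = reward if done else reward + GAMMA * max(q_table[next_state])
        with lock:
            q_table[state][action] += ALPHA * (target - q_table[state][action])
        state = 0 if done else next_state

# Each thread explores with a different epsilon, as the paper suggests;
# the parallel, decorrelated experience replaces the replay memory.
threads = [threading.Thread(target=actor_learner, args=(eps,))
           for eps in (0.5, 0.3, 0.1, 0.05)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(q_table)
```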

Implement the actor-critic methods #1


originholic commented 8 years ago

Hello. In the asynchronous DQN paper they also describe an on-policy method, the asynchronous advantage actor-critic (A3C), which achieved better results than the other methods. Do you currently have any plan to include it in this repo as well? I am working off this repo as a starting point and attempting to reproduce the A3C results on continuous action domains, but I am still trying to figure out the network model they used in the physical-state case when applied to MuJoCo, and how the policy gradient is accumulated.
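
On the gradient-accumulation question, this is not the paper's or the repo's code, but a small self-contained sketch of how the A3C pseudocode accumulates the policy and value gradients over up to t_max steps before applying them to the shared parameters. The toy chain environment, linear softmax policy, linear value function, and hyperparameters below are illustrative assumptions only:

```python
import numpy as np

N_STATES, N_ACTIONS = 6, 2
GAMMA, LR_POLICY, LR_VALUE, T_MAX = 0.99, 0.01, 0.05, 5

theta = np.zeros((N_STATES, N_ACTIONS))   # policy parameters (shared across threads in A3C)
w = np.zeros(N_STATES)                    # value-function parameters (also shared)

def env_step(state, action):
    """Toy chain environment: reward 1 on reaching the right end."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

def policy(state):
    """Softmax over a linear (tabular) score for each action."""
    prefs = theta[state]
    e = np.exp(prefs - prefs.max())
    return e / e.sum()

rng = np.random.default_rng(0)
state = 0
for _ in range(2000):
    # Collect up to T_MAX steps (or until terminal) with the current policy.
    trajectory = []
    done = False
    for _ in range(T_MAX):
        probs = policy(state)
        action = rng.choice(N_ACTIONS, p=probs)
        next_state, reward, done = env_step(state, action)
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break

    # Bootstrap the return from V(s) unless the episode ended.
    R = 0.0 if done else w[state]
    d_theta = np.zeros_like(theta)
    d_w = np.zeros_like(w)
    # Accumulate gradients backwards through the collected steps.
    for s, a, r in reversed(trajectory):
        R = r + GAMMA * R
        advantage = R - w[s]
        # Policy gradient term: grad log pi(a|s) * advantage.
        grad_log_pi = -policy(s)
        grad_log_pi[a] += 1.0
        d_theta[s] += grad_log_pi * advantage
        # Value gradient term: from the squared error (R - V(s))^2 with a linear V.
        d_w[s] += advantage

    # In A3C each thread would apply these accumulated gradients asynchronously
    # to the shared parameters; here we simply apply them directly.
    theta += LR_POLICY * d_theta
    w += LR_VALUE * d_w
    if done:
        state = 0
```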

Zeta36 commented 8 years ago

No, originholic. I'm working on other things right now :(.

Maybe in the future I'll try the advantage actor-critic, but not now. I'm sorry.

Regards. Samu.