flrngel / understanding-ai

personal repository

Asynchronous Methods for Deep Reinforcement Learning #14

Open flrngel opened 6 years ago

flrngel commented 6 years ago

https://arxiv.org/abs/1602.01783 (aka A3C, by Google DeepMind)

This paper introduces asynchronous variants of four algorithms: 1-step Q-Learning, 1-step Sarsa, n-step Q-Learning, and advantage actor-critic (A3C). Of these, A3C performs best.

image (image originally from openresearch.ai)

A3C is an on-policy method (in contrast, Q-Learning is off-policy).
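
To make the on-policy / off-policy distinction concrete, here is a minimal sketch of the two 1-step bootstrap targets, using Sarsa as the on-policy example since it is one of the methods in the paper. The function names and the toy numbers are made up for illustration; they are not from the paper.

```python
def q_learning_target(reward, next_q_values, gamma=0.99):
    """Off-policy target: bootstrap from the best next action,
    regardless of which action the behaviour policy takes next."""
    return reward + gamma * max(next_q_values)


def sarsa_target(reward, next_q_values, next_action, gamma=0.99):
    """On-policy target: bootstrap from the action the current
    policy actually selected in the next state."""
    return reward + gamma * next_q_values[next_action]


# Toy Q-values for the next state (hypothetical numbers).
next_q = [1.0, 3.0, 2.0]

off_policy = q_learning_target(0.5, next_q, gamma=0.5)
on_policy = sarsa_target(0.5, next_q, next_action=2, gamma=0.5)
print(off_policy, on_policy)  # 2.0 1.5
```

A3C is on-policy in the same sense: its updates are computed from actions sampled by the current policy.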

Loss = Policy Loss + 0.5 * Value Loss
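
A rough sketch of how this combined loss could be computed over one short rollout, assuming the usual actor-critic definitions: policy loss is `-log pi(a|s) * advantage` and value loss is the squared error between the n-step return and `V(s)`. The helper names (`discounted_returns`, `a3c_loss`) and the choice to sum over steps are assumptions for illustration. The paper also adds an entropy bonus to the policy objective, which the formula above leaves out and so does this sketch.

```python
def discounted_returns(rewards, bootstrap_value, gamma=0.99):
    """n-step returns, computed backwards from a bootstrap value
    (0 for a terminal state, V(s_last) otherwise)."""
    returns = []
    running = bootstrap_value
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def a3c_loss(log_probs, values, returns, value_coef=0.5):
    """Return (total, policy_loss, value_loss), summed over the rollout.

    In an autodiff framework the advantage would be treated as a
    constant (detached) inside the policy term; with plain floats
    that distinction does not show up.
    """
    policy_loss = 0.0
    value_loss = 0.0
    for log_prob, value, ret in zip(log_probs, values, returns):
        advantage = ret - value
        policy_loss += -log_prob * advantage
        value_loss += advantage ** 2
    total = policy_loss + value_coef * value_loss
    return total, policy_loss, value_loss


# Hypothetical 3-step rollout ending in a terminal state.
returns = discounted_returns([1.0, 0.0, 1.0], bootstrap_value=0.0, gamma=0.5)
print(returns)  # [1.25, 0.5, 1.0]
```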

The network is typically a convolutional network with one softmax output for the policy \pi and one linear output for the value function V, with all non-output layers shared.
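
The two-headed layout can be sketched in plain Python as below. This is only an illustration of the parameter sharing: it swaps the convolutional layers for a single dense hidden layer to stay short, and the class name, layer sizes, and initialisation are all made up rather than taken from the paper.

```python
import math
import random


def dense(x, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class ActorCritic:
    """Shared trunk with a softmax policy head and a linear value head."""

    def __init__(self, n_inputs, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(n_inputs, n_hidden, rng)         # shared non-output layer
        self.policy_head = init_layer(n_hidden, n_actions, rng)   # softmax output
        self.value_head = init_layer(n_hidden, 1, rng)            # linear output

    def forward(self, x):
        hidden = relu(dense(x, *self.shared))
        probs = softmax(dense(hidden, *self.policy_head))
        value = dense(hidden, *self.value_head)[0]
        return probs, value


net = ActorCritic(n_inputs=4, n_hidden=8, n_actions=3)
probs, value = net.forward([0.1, -0.2, 0.3, 0.0])
print(len(probs), round(sum(probs), 6))  # 3 1.0
```

Both heads read the same `hidden` activations, so gradients from the policy loss and the value loss both update the shared layer.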