MrSyee / pg-is-all-you-need

Policy Gradient is all you need! A step-by-step tutorial for well-known PG methods.
MIT License

Add DDPG #2

Closed MrSyee closed 5 years ago

Curt-Park commented 5 years ago

Well done. Here are some minor comments mainly about the sentences.

  1. The author constructs an exploration => The authors construct an exploration

  2. sampled from a noise process N to our actor policy => sampled from a noise process N to the actor policy

  3. The authors used an Ornstein-Uhlenbeck process => The authors used Ornstein-Uhlenbeck process

  4. We used an Ornstein-Uhlenbeck process like the explanation in paper to generate temporally correlated exploration for exploration efficiency in physical control problems with inertia. => The Ornstein-Uhlenbeck process generates temporally correlated exploration, and it effectively copes with physical control problems with inertia. (A minimal sketch of the OU noise follows after this list.)

  5. We are going to use two networks for actor and critic. => We are going to use two separate networks for the actor and the critic.

  6. tanh for output layer => tanh for the output layer

  7. Actor network => the actor network

  8. Like actor, Critic network has three fully connected layers. In a different from actor, it used two non-linearity functions that only ReLU. Also input sizes of critic network are sum of state sizes and action sizes. => Like the actor, the critic network has three fully connected layers, but it uses only ReLU for its two hidden-layer activations. Also, its input size is the sum of the state dimension and the action dimension.

  9. The final layer weights and biases of both actor and critic are initialized from uniform distribution. => One thing to note is that we initialize the final layers' weights and biases from a uniform distribution. (A minimal sketch of both networks follows after this list.)

  10. We implement the action wrapper called ActionNormalizer. It makes range of continuous action which change varience in depends on environment to be (-1, 1). Then, our agent selects only (-1, 1) range action. => ActionNormalizer is an action wrapper class to normalize the action values ranged in (-1, 1). Thanks to this class, we can make the agent simply select action values within the zero centered range (-1, 1).
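
For reference, here is a minimal sketch of the Ornstein-Uhlenbeck noise mentioned in items 3 and 4. The class name OUNoise and the parameter values (theta=0.15, sigma=0.2) are illustrative assumptions, not necessarily what the notebook uses:

```python
import numpy as np


class OUNoise:
    """Ornstein-Uhlenbeck process for temporally correlated exploration noise.

    NOTE: names and default parameters are illustrative, not the notebook's exact values.
    """

    def __init__(self, size: int, mu: float = 0.0, theta: float = 0.15, sigma: float = 0.2):
        self.mu = mu * np.ones(size)
        self.theta = theta
        self.sigma = sigma
        self.reset()

    def reset(self):
        """Reset the internal state to the mean."""
        self.state = self.mu.copy()

    def sample(self) -> np.ndarray:
        """dx = theta * (mu - x) + sigma * N(0, 1); successive samples are correlated."""
        dx = self.theta * (self.mu - self.state) + self.sigma * np.random.randn(len(self.state))
        self.state = self.state + dx
        return self.state
```

The noise would then be added to the deterministic action, e.g. `action = actor(state) + ou_noise.sample()`, followed by clipping to the valid action range.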
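
Likewise, a minimal PyTorch sketch of the two separate networks from items 5 to 9: tanh on the actor's output layer, ReLU hidden activations, critic input size equal to the state dimension plus the action dimension, and uniform initialization of the final layers. The hidden size (128) and init range (3e-3) are assumptions rather than the notebook's exact values:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def init_uniform(layer: nn.Linear, init_w: float = 3e-3):
    """Initialize a layer's weights and biases from a uniform distribution."""
    layer.weight.data.uniform_(-init_w, init_w)
    layer.bias.data.uniform_(-init_w, init_w)


class Actor(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, action_dim)
        init_uniform(self.out)  # uniform init of the final layer

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return torch.tanh(self.out(x))  # actions bounded in (-1, 1)


class Critic(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        # the critic takes the concatenated (state, action) pair as input
        self.fc1 = nn.Linear(state_dim + action_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, 1)
        init_uniform(self.out)  # uniform init of the final layer

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        x = torch.cat((state, action), dim=-1)
        x = F.relu(self.fc1(x))  # ReLU for both hidden layers
        x = F.relu(self.fc2(x))
        return self.out(x)  # Q(s, a)
```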

Curt-Park commented 5 years ago

The rendered gif doesn't show without executing the notebook cells.

MrSyee commented 5 years ago

> The rendered gif doesn't show without executing the notebook cells.

I'll add an NBviewer link to README.md. It works well.

mclearning2 commented 5 years ago

The step function in Agent has an improper comment: # if the last state is not a terminal state, store done as false

mclearning2 commented 5 years ago

In __init__() of Agent, a parameter is never used: self.weight_decay = 1e-6

mclearning2 commented 5 years ago

I think the action is normalized to (low, high), so this comment should be changed from (-1, 1) to (low, high).

> Environment
>
> ActionNormalizer is an action wrapper class to normalize the action values ranged in (-1, 1). Thanks to this class, we can make the agent simply select action values within the zero centered range (-1, 1).
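
A minimal sketch of such a wrapper, assuming the agent outputs zero-centered actions in (-1, 1) and the wrapper rescales them to the environment's (low, high) range (the notebook's ActionNormalizer may differ in details):

```python
import gym
import numpy as np


class ActionNormalizer(gym.ActionWrapper):
    """Rescale agent actions from (-1, 1) to the environment's (low, high) range."""

    def action(self, action: np.ndarray) -> np.ndarray:
        low = self.action_space.low
        high = self.action_space.high

        scale = (high - low) / 2.0
        center = (high + low) / 2.0
        action = action * scale + center  # (-1, 1) -> (low, high)

        return np.clip(action, low, high)

    def reverse_action(self, action: np.ndarray) -> np.ndarray:
        """Map an environment-space action back to (-1, 1), e.g. for storage."""
        low = self.action_space.low
        high = self.action_space.high

        scale = (high - low) / 2.0
        center = (high + low) / 2.0
        action = (action - center) / scale  # (low, high) -> (-1, 1)

        return np.clip(action, -1.0, 1.0)
```

Usage would simply wrap the environment, e.g. `env = ActionNormalizer(gym.make("Pendulum-v0"))`, so the agent only ever deals with actions in the zero-centered (-1, 1) range.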