andrew-j-levy / Hierarchical-Actor-Critc-HAC-

This repository contains the code to implement the Hierarchical Actor-Critic (HAC) algorithm.
MIT License

Taking mean gradient instead of gradient of single sample #1

Closed krishpop closed 5 years ago

krishpop commented 5 years ago

https://github.com/andrew-j-levy/Hierarchical-Actor-Critc-HAC-/blob/3a2991b303c8f59c6176dfa9b2425f21dbfb1f5e/critic.py#L131

Is there a reason for only using the gradient of a single sample, or would it be better to take the mean?

I should add: thank you for sharing your code online; I found this project quite interesting! I have also been working on a PyTorch implementation that I should have ready soon.

andrew-j-levy commented 5 years ago

Hi Krishnan,

Thanks for your interest in the project! Let me know when your PyTorch implementation is ready and I will post a link to it.

So the actor and critic networks at all levels are updated using the mean gradient computed from a batch of transitions. The mean gradients for both the actor and the critic are generated as follows. The learn method in layer.py first samples a batch of transitions. The batch is then passed to the critic and actor update methods, each of which updates its respective neural network using the mean gradient over that batch. Line 58 in critic.py and line 71 in actor.py, in particular, are what make the critic and actor use the mean gradient.
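For context, here is a minimal sketch of the pattern being described (illustrative only, not the repository's exact code; the network shapes and names are made up). Because the critic's loss is averaged over the batch with tf.reduce_mean, the single optimizer step below applies the mean of the per-sample gradients rather than the gradient of any one transition.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # TF 1.x-style graph mode

tf.disable_eager_execution()

state_dim, action_dim = 4, 2
states = tf.placeholder(tf.float32, (None, state_dim))
actions = tf.placeholder(tf.float32, (None, action_dim))
targets = tf.placeholder(tf.float32, (None, 1))  # Bellman targets for the sampled batch

# Toy critic network Q(s, a)
hidden = tf.layers.dense(tf.concat([states, actions], axis=1), 64, tf.nn.relu)
q_value = tf.layers.dense(hidden, 1)

# Averaging the squared error over the batch means the optimizer applies
# the mean gradient, not the gradient of a single sample.
loss = tf.reduce_mean(tf.square(targets - q_value))
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = 32
    feed = {
        states: np.random.randn(batch, state_dim),
        actions: np.random.randn(batch, action_dim),
        targets: np.random.randn(batch, 1),
    }
    _, batch_loss = sess.run([train_op, loss], feed_dict=feed)
    print(batch_loss)
```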

krishpop commented 5 years ago

I see... but then what is the reason for returning grads[0] on line 131 of critic.py, since that is ultimately what is fed into action_derivs on line 71 of actor.py? Is that done just because TensorFlow returns a list of gradient tensors, and you need to unwrap it?

andrew-j-levy commented 5 years ago

Yes, I am pretty confident that is correct.
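For anyone reading later, a minimal sketch of the unwrapping (illustrative, not the repository's code): tf.gradients always returns a Python list of gradient tensors, one per tensor in xs, even when xs is a single tensor, so the [0] index simply pulls out the (batch, action_dim) tensor of per-sample derivatives.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

actions = tf.placeholder(tf.float32, (None, 2))
# Stand-in for Q(s, a): a sum of squares, so dQ/da = 2 * a for each sample
q = tf.reduce_sum(tf.square(actions), axis=1, keepdims=True)

grads = tf.gradients(q, actions)  # a list containing a single tensor
dq_da = grads[0]                  # the (batch, action_dim) tensor of per-sample gradients

with tf.Session() as sess:
    print(sess.run(dq_da, feed_dict={actions: [[1.0, 2.0], [3.0, -1.0]]}))
    # [[ 2.  4.]
    #  [ 6. -2.]]
```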