Open andrewjong opened 3 years ago
Vanilla Policy Gradients are unstable. TRPO is complicated. Make something stable and simple
We covered this paper in this video
0. Article Information and Links
1. What do the authors try to accomplish?
Vanilla Policy Gradients are unstable. TRPO is complicated. Make something stable and simple
2. What's great compared to previous research?
3. Where are the key elements of the technology and method?
4. How do the authors measure success?
5. How did you verify that it works?
6. Things to discuss? (e.g. weaknesses, potential for future work, relation to other work)
7. Are there any papers to read next?
8. References