Missing details
- "all weights w_i were scaled so that max_i w_i = 1". Is max_i w_i computed over a minibatch or the whole buffer?
- What is the value of epsilon that is added to absolute TD errors?
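
To make the first question concrete, here is a minimal sketch of the two possible readings. It assumes the quote refers to the importance-sampling weights of prioritized experience replay, with priorities built from absolute TD errors plus epsilon. The function name `importance_weights`, the `scope` argument, and the values of `alpha`, `beta`, and `eps` are all placeholders chosen for illustration; none of them are taken from the paper.

```python
import numpy as np

def importance_weights(td_errors, batch_idx, alpha=0.6, beta=0.4,
                       eps=1e-6, scope="batch"):
    """Importance-sampling weights for a sampled minibatch.

    scope="batch":  divide by the max weight inside the minibatch.
    scope="buffer": divide by the max weight over the whole buffer.
    alpha, beta and eps are assumed values, not the paper's.
    """
    # Priority p_i = (|delta_i| + eps)^alpha; eps keeps zero-error samples drawable.
    priorities = (np.abs(td_errors) + eps) ** alpha
    probs = priorities / priorities.sum()

    # Unnormalized weight w_i = (N * P(i))^(-beta) for every stored transition.
    n = len(td_errors)
    weights_all = (n * probs) ** (-beta)
    weights = weights_all[batch_idx]

    if scope == "batch":
        return weights / weights.max()
    if scope == "buffer":
        return weights / weights_all.max()
    raise ValueError(f"unknown scope: {scope!r}")

# Toy buffer of five transitions; the minibatch holds the last three.
td_errors = np.array([0.0, 0.5, 1.0, 2.0, 4.0])
batch_idx = np.array([2, 3, 4])

w_batch = importance_weights(td_errors, batch_idx, scope="batch")
w_buffer = importance_weights(td_errors, batch_idx, scope="buffer")
```

With minibatch scaling the largest weight in every batch is exactly 1. With buffer scaling it is 1 only when the lowest-priority transition happens to be sampled, so the effective step size is smaller on most updates. That difference is why the detail matters for reproduction.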
-
1. [Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards](https://arxiv.org/pdf/1707.08817.pdf)
-
Out of the box, the Rainbow agent gave the OpenAI team their best baseline result on Sonic by a large margin, if you exclude the extremely resource-intensive parallel PPO training:
https://arxiv…
-
Hi,
I'm new to deep RL and tensorforce, and I'm trying to understand all aspects of the algorithms.
I'm using the PPO agent right now but I have some doubts regarding the Update Method a…
-
My next step is to have clean, working, and benchmarked policy-gradient reinforcement learning algorithms.
-
If I wanted to train a feedforward network agent with ~100 inputs, 8 outputs, and a hidden layer of about 512 units, could I use a DQN from this library and expect it to work reasonably well?
Does the DQ…