-
I wonder what TensorBoard's "value loss" means
in this tutorial:
https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Getting-Started-with-Balance-Ball.md#observing-training-progre…
-
In the code given on this page, it seems the agents are unrolled for 100 steps, after which the (possibly partial) trajectory is sent to the learner. If the trajectory is not finished at that point, t…
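For context, a common way to handle a rollout that is cut off before the episode ends (not necessarily what this repository does) is to bootstrap the return with the critic's value estimate of the last observed state. A minimal NumPy sketch of that idea, where `bootstrap_value` would come from V(s_T):

```python
import numpy as np

def discounted_returns(rewards, gamma, bootstrap_value=0.0, done=False):
    """Return targets for a fixed-length (possibly truncated) rollout.

    If the episode was cut off (e.g. after 100 steps) rather than terminating,
    the value estimate of the last observed state is used as a bootstrap so the
    partial trajectory still yields consistent targets.
    """
    returns = np.zeros(len(rewards))
    running = 0.0 if done else bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```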
-
Hi
How can I get a deterministic action out of the PPO policy? I need to turn off the exploration noise for that, but there doesn't seem to be such a switch in the code.
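For reference, with a Gaussian policy the usual way to act deterministically at evaluation time is to take the distribution's mean instead of sampling. A minimal PyTorch sketch; the `deterministic` flag is hypothetical and not a switch that exists in this repository:

```python
import torch
from torch.distributions import Normal

def select_action(mean, log_std, deterministic=False):
    # Stochastic PPO action: sample from the Gaussian the policy outputs.
    # Deterministic evaluation: return the mean, i.e. drop the exploration noise.
    if deterministic:
        return mean
    return Normal(mean, log_std.exp()).sample()
```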
-
Dear author,
In your implementation of Soft Actor-Critic, there is no value function V(s)?
In the original SAC paper, the authors said such a value function can stabilize training and is c…
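For context, later versions of SAC drop the separate V(s) network and estimate the soft value directly from the twin Q networks and the policy's log-probability. A minimal sketch of that estimate, assuming PyTorch tensors `q1`, `q2`, `log_pi` for an action sampled from the current policy:

```python
import torch

def soft_value_estimate(q1, q2, log_pi, alpha):
    # Soft state value V(s) ~ min(Q1(s,a), Q2(s,a)) - alpha * log pi(a|s),
    # estimated with a single action a sampled from the current policy.
    return torch.min(q1, q2) - alpha * log_pi
```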
-
If tf.net could be connected to this, it would be a lot easier. Python often runs into compatibility problems that are not easy to debug.
Unity Machine Learning Agents Toolkit
https://github.com/…
-
Hi,
To my knowledge, Hopper-v1 is deprecated and Hopper-v2 is the standard Hopper environment as of today. Can someone confirm whether this is true?
In most of the RL papers, I see results where the au…
-
Deep Deterministic Policy Gradients ([DDPG][1]) and the Stable Baselines code are presented [here][2].
The actor-critic networks are created as follows:
normalized_obs = tf.clip_by_value(normali…
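For readers unfamiliar with that snippet, the pattern being set up is running-statistics normalization of the observation followed by clipping. A rough NumPy sketch of the idea; the names are illustrative and not the Stable Baselines API:

```python
import numpy as np

def normalize_and_clip_obs(obs, obs_mean, obs_std, clip=5.0):
    # Standardize with running mean/std, then clip to a fixed range so that
    # outlier observations cannot blow up the actor/critic inputs.
    normalized = (obs - obs_mean) / (obs_std + 1e-8)
    return np.clip(normalized, -clip, clip)
```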
-
I'd like to implement Hindsight Experience Replay (HER). This can be based on any goal-parameterized off-policy RL algorithm.
**Goal-parameterized architectures**: this requires a variable for…
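As a concrete illustration, here is a minimal sketch of HER's "final" relabeling strategy, assuming transitions stored as dicts with `achieved_goal`/`desired_goal`/`reward` keys (the names and the `compute_reward` helper are hypothetical):

```python
def her_relabel_final(episode, compute_reward):
    # Replay every transition of the episode with the desired goal replaced by
    # the goal actually achieved at the end of the episode, and the reward
    # recomputed for that substituted goal.
    final_goal = episode[-1]["achieved_goal"]
    relabeled = []
    for transition in episode:
        t = dict(transition)
        t["desired_goal"] = final_goal
        t["reward"] = compute_reward(t["achieved_goal"], final_goal)
        relabeled.append(t)
    return relabeled
```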
-
Hello, thanks for making this repo. I tried to connect my env and run it, but I get the following error:
**SyntaxError: Non-ASCII character '\xce' in file /home/at-lab/catkin_ws3/rl_pro_telu/mpo/mpo…
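For what it's worth, this is the standard Python 2 complaint about a source file containing non-ASCII bytes without an encoding declaration; adding a PEP 263 coding line at the top of the offending file (or running under Python 3) usually resolves it:

```python
# -*- coding: utf-8 -*-
# Placed on the first or second line of the file, this tells the Python 2
# parser how to decode non-ASCII characters such as '\xce'.
```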