-
Here is the code from reinforce.py:

```python
for action, r in zip(self.saved_actions, rewards):
    action.reinforce(r)
```

And here is the code from actor-critic.py:

```python
for (action, value), r in zi…
```
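For context, the `.reinforce()` method was removed from PyTorch around version 0.4 in favor of `torch.distributions`. A minimal sketch of the equivalent REINFORCE update under the modern API (`policy_net` and the buffer names here are illustrative placeholders, not the example's actual code):

```python
import torch
from torch.distributions import Categorical

saved_log_probs, rewards = [], []  # filled in during the episode

def select_action(policy_net, state):
    # Sample an action and remember its log-probability; this bookkeeping
    # replaces the later call to action.reinforce(r).
    dist = Categorical(policy_net(state))
    action = dist.sample()
    saved_log_probs.append(dist.log_prob(action))
    return action.item()

def finish_episode(optimizer):
    # Gradient descent on -log_prob * r reproduces what reinforce(r) did.
    loss = torch.stack([-lp * r for lp, r in zip(saved_log_probs, rewards)]).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    del saved_log_probs[:], rewards[:]
```

The same idea carries over to actor-critic.py, with `r` replaced by the advantage `r - value` in the policy term and a separate regression loss for the critic.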
-
Hello!
I noticed that the maximum number of episodes during training can be controlled with MAX_EPISODES, and that EVAL_INTERVAL determines the evaluation interval; however, the evaluation process seems to determi…
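For reference, a minimal sketch of how such a loop is commonly wired up (assuming MAX_EPISODES and EVAL_INTERVAL are plain constants; `train_episode` and `evaluate` are hypothetical helpers, not this repo's functions):

```python
MAX_EPISODES = 1000   # assumed value, for illustration only
EVAL_INTERVAL = 50    # assumed value, for illustration only

for episode in range(1, MAX_EPISODES + 1):
    train_episode()                  # one training episode
    if episode % EVAL_INTERVAL == 0:
        evaluate()                   # periodic evaluation pass
```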
-
Hello,
I had a quick question about the form of the value function. Right now, by default, it is an action-value function with a linear layer that receives the output of the decoder. I was wondering …
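For concreteness, a sketch of the head described above, alongside the state-value variant (layer names and sizes are illustrative, not the project's actual code):

```python
import torch
import torch.nn as nn

class ValueHead(nn.Module):
    """Linear layer over the decoder output: Q(s, .) by default, V(s) if state_value."""
    def __init__(self, decoder_dim: int, n_actions: int, state_value: bool = False):
        super().__init__()
        # n_actions outputs give an action-value head; a single output gives V(s).
        self.linear = nn.Linear(decoder_dim, 1 if state_value else n_actions)

    def forward(self, decoder_output: torch.Tensor) -> torch.Tensor:
        return self.linear(decoder_output)
```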
-
A3C: a.k.a. Asynchronous Advantage Actor-Critic
It uses MPI, so I wonder whether DeepMimic can be trained using A3C?
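To make the question concrete, the asynchronous part of A3C usually looks like the sketch below: each worker holds a local copy of the policy, computes gradients on its own rollouts, and pushes them onto a shared parameter set (`compute_actor_critic_loss` is a hypothetical placeholder; this is not DeepMimic's code):

```python
import torch

def worker_loop(shared_model, local_model, env, lr=1e-4):
    optimizer = torch.optim.Adam(shared_model.parameters(), lr=lr)
    while True:
        local_model.load_state_dict(shared_model.state_dict())  # sync down
        loss = compute_actor_critic_loss(local_model, env)      # hypothetical helper
        optimizer.zero_grad()
        loss.backward()
        # Push local gradients onto the shared parameters, then step.
        for lp, sp in zip(local_model.parameters(), shared_model.parameters()):
            sp.grad = lp.grad
        optimizer.step()
```

In A3C each worker runs this loop in its own process or thread; MPI could in principle play the same role as the shared-memory model, with gradients exchanged by message passing instead.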
-
Implement and explore the effectiveness of an actor-critic agent.
-
Hey @MikeInnes, if you are back, could you please review the code? The new models I have added are Dueling DQN, Advantage Actor-Critic, and DDPG. Also, all the previous work done on DQN has been added to d…
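Of the three, Dueling DQN is the one with a distinctive head. A sketch of the standard decomposition $Q(s,a) = V(s) + A(s,a) - \frac{1}{|A|}\sum_{a'} A(s,a')$ (layer sizes are illustrative, not the PR's actual architecture):

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    def __init__(self, feature_dim: int, n_actions: int):
        super().__init__()
        self.value = nn.Linear(feature_dim, 1)              # V(s)
        self.advantage = nn.Linear(feature_dim, n_actions)  # A(s, a)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        v, a = self.value(features), self.advantage(features)
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - a.mean(dim=-1, keepdim=True)
```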
-
Hello,
I am trying to use this algorithm (rewritten in PyTorch with Gym vectorized envs) for motion imitation, starting with the PyBullet implementation of the DeepMimic environment. In the paper, …
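For reference, a minimal sketch of the vectorized-env setup being described (assuming the pre-0.26 `gym` API; the env id below is, to my understanding, PyBullet's registered DeepMimic walk task, which requires importing `pybullet_envs` first):

```python
import gym
import pybullet_envs  # registers the PyBullet env ids with gym

def make_env():
    return gym.make("HumanoidDeepMimicWalkBulletEnv-v1")

# Eight copies stepping in parallel subprocesses.
envs = gym.vector.AsyncVectorEnv([make_env for _ in range(8)])
obs = envs.reset()
obs, rewards, dones, infos = envs.step(envs.action_space.sample())
```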
-
In this example, https://github.com/keras-team/keras-io/blob/master/examples/rl/actor_critic_cartpole.py, the gradient for the actor is defined as the gradient of the loss $L = \sum \ln\pi\,(\mathrm{reward} - \mathrm{value})$.…
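For context, the loss in that example is built roughly as follows (a condensed sketch with illustrative names: the actor term is negated so that gradient descent maximizes $\sum \ln\pi\,(\mathrm{reward} - \mathrm{value})$, and the critic gets a separate Huber regression term):

```python
import tensorflow as tf

huber_loss = tf.keras.losses.Huber()

def compute_loss(action_log_probs, values, returns):
    # Actor: policy-gradient term weighted by the advantage (return - value).
    actor_loss = sum(-log_p * (ret - val)
                     for log_p, val, ret in zip(action_log_probs, values, returns))
    # Critic: push value estimates toward the observed returns.
    critic_loss = sum(huber_loss(tf.expand_dims(val, 0), tf.expand_dims(ret, 0))
                      for val, ret in zip(values, returns))
    return actor_loss + critic_loss
```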
-
I am getting the following error when doing RLHF training:

```
Traceback (most recent call last):
  File "/code/main.py", in
    rlhf_trainer.train()
  File "/code/trainer.py", in train
    self.lea…
```