-
I am getting the following error when doing RLHF training:
```
Traceback (most recent call last):
  File "/code/main.py", in
    rlhf_trainer.train()
  File "/code/trainer.py", in train
    self.lea…
```
-
Hello,
In the [asynchronous dqn paper](http://arxiv.org/pdf/1602.01783v1.pdf), they also describe an on-policy method, the advantage actor-critic (A3C), which achieved better results than the other methods. Do …
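For concreteness, here is a minimal sketch of the advantage actor-critic update that paper describes (A3C runs this same update from several asynchronous workers); the network sizes, coefficients, and names here are illustrative assumptions, not the paper's exact configuration.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical small actor-critic network, for illustration only.
class ActorCritic(nn.Module):
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.policy = nn.Linear(128, n_actions)  # actor head: action logits
        self.value = nn.Linear(128, 1)           # critic head: state value

    def forward(self, obs):
        h = self.body(obs)
        return self.policy(h), self.value(h).squeeze(-1)

def a2c_loss(model, obs, actions, returns, value_coef=0.5, entropy_coef=0.01):
    """One advantage actor-critic update on a batch of transitions.

    `returns` are n-step discounted returns; the advantage is
    returns - V(s), and the critic regresses V(s) toward the returns.
    """
    logits, values = model(obs)
    log_probs = F.log_softmax(logits, dim=-1)
    advantage = returns - values.detach()  # do not backprop the actor term through the critic

    policy_loss = -(log_probs.gather(1, actions.unsqueeze(1)).squeeze(1) * advantage).mean()
    value_loss = F.mse_loss(values, returns)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()  # exploration bonus
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```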
-
Hi @ChintanTrivedi, I am using a modified version of your code to train an environment created with the Unity engine.
[I have modified the code to handle this.]
Action space = Continuous
Obse…
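In case it is useful context for others: one common way to adapt a discrete actor-critic to a continuous action space, which I assume is the kind of modification meant above, is a Gaussian policy head. A hedged sketch (module names and sizes are mine, not the repository's):
```python
import torch
import torch.nn as nn

# Hypothetical Gaussian policy head for a continuous action space.
class GaussianPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, act_dim)                  # mean of the action distribution
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # state-independent log std

    def forward(self, obs):
        h = self.body(obs)
        dist = torch.distributions.Normal(self.mu(h), self.log_std.exp())
        action = dist.sample()
        # Sum log-probs over action dimensions for the policy-gradient loss.
        return action, dist.log_prob(action).sum(dim=-1)
```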
-
## Motivation
### 1. Consistent style for `torch.nn.modules.loss.*Loss`
In `torch.nn.modules.loss`, there are many `*Loss` classes subclassing `nn.Module`. The `Loss.__init__()` does not take other `nn…
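For reference, the stock modules already follow a consistent pattern: `__init__` takes configuration only, and the tensors go to the call. A short example with the existing `nn.MSELoss` (the tensors are placeholders):
```python
import torch
import torch.nn as nn

loss_fn = nn.MSELoss(reduction='mean')        # __init__ takes configuration only
pred = torch.randn(4, 3, requires_grad=True)  # placeholder prediction
target = torch.randn(4, 3)                    # placeholder target
loss = loss_fn(pred, target)                  # tensors are passed at call time
loss.backward()
```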
-
# Asynchronous Methods for Deep Reinforcement Learning #
- Authors: Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuo…
-
## Describe the bug
Not quite sure whether this is supported behavior, but if I set `functional=True` for the A2C loss and `shifted=True` for `TD0Estimator`, I get an internal error.
## To Reproduce
…
-
In the code you say you are using the td_error actor-critic algorithm, but when actually computing the actor's gradient you use q rather than td_error. Suggested modification:
```python
def learn(self, s, a, r, s_):
    s, s_ = s[np.newaxis, :], s_[np.newaxis, :]
    next_a = [[i] for i in r…
```
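To make the suggested change concrete, here is a minimal sketch of an actor update driven by the TD error rather than by q; `actor` and `critic` are hypothetical PyTorch modules standing in for the repository's networks:
```python
import torch

def actor_critic_step(actor, critic, actor_opt, critic_opt, s, a, r, s_, gamma=0.9):
    # TD error: r + gamma * V(s') - V(s). This, not q, should scale the actor's gradient.
    with torch.no_grad():
        td_target = r + gamma * critic(s_)
    td_error = td_target - critic(s)

    critic_loss = td_error.pow(2).mean()  # critic regresses V(s) toward the TD target
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    log_prob = torch.distributions.Categorical(logits=actor(s)).log_prob(a)
    actor_loss = -(log_prob * td_error.detach()).mean()  # policy gradient weighted by td_error
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```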
-
I have been digging into your paper and code and noticed some potential discrepancies between the two. I would appreciate it very much if you could clarify:
1) in **training.py** line…
-
https://datawhalechina.github.io/easy-rl/#/chapter9/chapter9_questions&keywords
-
First of all - thank you very much for this repository! You have made diving into Reinforcement Learning easier!
About the issue: I think you should use `huber_loss` instead of `square_difference`. Loo…
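To illustrate the suggestion: the Huber loss is quadratic near zero but linear for large errors, so a single outlier TD error cannot blow up the gradient the way it does with a squared difference. A minimal comparison (the TD errors are placeholders):
```python
import torch
import torch.nn.functional as F

td_error = torch.tensor([0.5, 3.0, -8.0])  # placeholder TD errors

squared = td_error.pow(2).mean()  # gradient grows linearly with the error itself
huber = F.smooth_l1_loss(td_error, torch.zeros_like(td_error))  # per-element gradient capped at 1

# For |error| > 1 the Huber loss grows only linearly, so outlier transitions
# cannot dominate the update.
```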