-
Hello, Ben!
Thank you for a great tutorial series. I have a question regarding your [actor-critic notebook](https://github.com/bentrevett/pytorch-rl/blob/master/2%20-%20Actor%20Critic%20%5BCartPole%5…
-
The colour gradient at the top is partly hand-crafted: the right end uses the CI colour value, while the left end uses other colour values, and an intermediate value is specified explicitly. The colour gradient in the …
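If the gradient is defined in matplotlib, a minimal sketch of that construction could look as follows; the concrete colour values and the `CI_COLOUR` name are placeholders, not the real ones:
```
from matplotlib.colors import LinearSegmentedColormap

CI_COLOUR = "#005b96"                 # placeholder for the CI colour used at the right end
cmap = LinearSegmentedColormap.from_list(
    "header_gradient",
    [(0.0, "#e0e0e0"),                # hand-picked left-end colour
     (0.5, "#9bbcd6"),                # explicitly specified intermediate value
     (1.0, CI_COLOUR)],               # CI colour at the right end
)
```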
-
The basic idea is to represent the joint state-action value function as a Gaussian process. The optimal policy can be approximated with a few steps of gradient descent on the action subspace, holding …
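A minimal sketch of that idea, assuming an RBF kernel, toy training data, and JAX for the action gradients (the kernel, data, and step sizes below are illustrative assumptions, not part of the original description):
```
import jax
import jax.numpy as jnp

def rbf(a, b, ls=1.0):
    # squared-exponential kernel between two sets of joint (state, action) points
    d2 = jnp.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return jnp.exp(-0.5 * d2 / ls ** 2)

# toy observations: rows of X are concatenated (state, action) pairs, y are returns
key = jax.random.PRNGKey(0)
state_dim, action_dim, n = 3, 2, 64
X = jax.random.normal(key, (n, state_dim + action_dim))
y = jnp.sin(X.sum(axis=1))
K_inv_y = jnp.linalg.solve(rbf(X, X) + 1e-4 * jnp.eye(n), y)

def q_mean(state, action):
    # GP posterior mean of the joint state-action value at (state, action)
    x = jnp.concatenate([state, action])[None, :]
    return (rbf(x, X) @ K_inv_y)[0]

def improve_action(state, action, steps=10, lr=0.1):
    # a few gradient-descent steps on -Q over the action subspace, state held fixed
    grad_a = jax.grad(lambda s, a: -q_mean(s, a), argnums=1)
    for _ in range(steps):
        action = action - lr * grad_a(state, action)
    return action

action = improve_action(jnp.zeros(state_dim), jnp.zeros(action_dim))
```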
-
I'm trying to differentiate the MJX step function via the autograd function `jax.grad()` in JAX, like:
```
def step(vel, pos):
    mjx_data = mjx.make_data(mjx_model)
    mjx_data = mjx_data.replace(q…
```
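For reference, a self-contained sketch of this pattern; the toy model XML, the scalar loss, and the input shapes below are assumptions, not taken from the question, and whether the resulting gradients are useful depends on the model (contacts in particular):
```
import jax
import jax.numpy as jnp
import mujoco
from mujoco import mjx

# toy single-slide-joint model so the example is self-contained
XML = """
<mujoco>
  <worldbody>
    <body>
      <joint type="slide" axis="1 0 0"/>
      <geom type="sphere" size="0.1" mass="1"/>
    </body>
  </worldbody>
</mujoco>
"""
mj_model = mujoco.MjModel.from_xml_string(XML)
mjx_model = mjx.put_model(mj_model)

def loss(vel, pos):
    # build fresh data, write the differentiable inputs, step once,
    # and reduce to a scalar so jax.grad is well defined
    mjx_data = mjx.make_data(mjx_model)
    mjx_data = mjx_data.replace(qpos=pos, qvel=vel)
    mjx_data = mjx.step(mjx_model, mjx_data)
    return jnp.sum(mjx_data.qpos ** 2)

dvel, dpos = jax.grad(loss, argnums=(0, 1))(jnp.array([0.5]), jnp.array([0.0]))
```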
-
I'm just curious if there is any effort to add these policy gradient methods?
-
Here is the solution:
https://github.com/philtabor/Multi-Agent-Deep-Deterministic-Policy-Gradients/issues/2#issuecomment-912548033
-
In ram.py:
```
eps = (self.xp.random.normal(0, 1, size=m.data.shape)).astype(np.float32)
l = m.data + np.sqrt(self.var)*eps
ln_pi = -0.5 * F.sum((l-m)*(l-m), axis=1) / self.var  # log(location pol…
```
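For reference, the same calculation written as a standalone NumPy sketch (scalar variance and toy shapes are assumptions): `ln_pi` is the Gaussian log-density of the sampled location under the location policy, up to an additive constant that does not depend on the mean:
```
import numpy as np

var = 0.05                                  # assumed scalar variance of the location policy
m = np.zeros((4, 2), dtype=np.float32)      # policy mean: batch of 4 two-dimensional locations
eps = np.random.normal(0, 1, size=m.shape).astype(np.float32)
l = m + np.sqrt(var) * eps                  # reparameterised sample l ~ N(m, var)

# log N(l; m, var) summed over the location dimensions; the dropped constant
# -0.5 * d * log(2 * pi * var) does not depend on m
ln_pi = -0.5 * np.sum((l - m) ** 2, axis=1) / var
```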
-
Hi, I notice that in your code `mean_kl` is always 0:
```
constraint_grad = flat_grad(constraint_loss, self.policy.parameters(), retain_graph=True)  # (b)
mean_kl = mean_kl_first_fixed(a…
```
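For reference, a minimal sketch (assumed Gaussian policy and toy shapes, not the repository's code) of what a "first argument fixed" mean KL looks like: at the current parameters it evaluates to exactly 0, since it compares the policy with a detached copy of itself, but its graph still yields the nonzero Hessian-vector products TRPO needs:
```
import torch

mu = torch.zeros(8, 2, requires_grad=True)       # policy means for a batch of 8 states
log_std = torch.zeros(2, requires_grad=True)

p = torch.distributions.Normal(mu, log_std.exp())
p_fixed = torch.distributions.Normal(mu.detach(), log_std.exp().detach())

mean_kl = torch.distributions.kl_divergence(p_fixed, p).sum(-1).mean()
print(mean_kl.item())                            # 0.0 at the current parameters

# second-order information is not zero: a Hessian-vector product through mean_kl
grads = torch.autograd.grad(mean_kl, [mu, log_std], create_graph=True)
v = [torch.randn_like(mu), torch.randn_like(log_std)]
gv = sum((g * vi).sum() for g, vi in zip(grads, v))
hvp = torch.autograd.grad(gv, [mu, log_std])     # nonzero
```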
-
Adding N-distill according to https://arxiv.org/abs/1902.02186
- [x] Add next observation to trajectory data structure (a sketch follows this list)
- [x] Directly compute gradient using the given update rule (This is t…
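A minimal sketch of the first checked item, assuming a simple namedtuple-based record; the field names are illustrative, not taken from the repository:
```
from collections import namedtuple

# transition record extended with the next observation
Transition = namedtuple(
    "Transition",
    ["observation", "action", "reward", "next_observation", "done"],
)

# toy usage with placeholder values
step = Transition(observation=[0.0], action=1, reward=0.5,
                  next_observation=[0.1], done=False)
```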