-
`Agent`s are entities that, at least potentially, have a `sample_action` and an `update` method.
We exclude exploration strategies and curricula from the list.
_Implement_ means either to produce new code from the pape…
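Because an `Agent` is defined only by having these two methods, a structural interface captures the idea well. The sketch below is a minimal illustration, not the project's actual class. It assumes a `typing.Protocol`, and the `update(transition)` signature and the toy `RandomAgent` are hypothetical.

```python
import random
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Agent(Protocol):
    """Hypothetical interface: any object with these two methods counts as an Agent."""

    def sample_action(self, observation: Any) -> Any:
        """Return an action for the given observation."""
        ...

    def update(self, transition: Any) -> None:
        """Learn from experience (the argument type is an assumption)."""
        ...


class RandomAgent:
    """Toy agent that picks uniformly among `num_actions` discrete actions."""

    def __init__(self, num_actions: int, seed: int = 0) -> None:
        self.num_actions = num_actions
        self.rng = random.Random(seed)
        self.num_updates = 0

    def sample_action(self, observation: Any) -> int:
        # Ignores the observation and samples a discrete action uniformly.
        return self.rng.randrange(self.num_actions)

    def update(self, transition: Any) -> None:
        # A random agent has nothing to learn; it only counts the calls.
        self.num_updates += 1
```

`RandomAgent` never inherits from `Agent`, yet `isinstance(RandomAgent(4), Agent)` is `True`. This matches the "entities with these methods" definition without forcing a class hierarchy.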
-
-
Add N-distill as described in [https://arxiv.org/abs/1902.02186](https://arxiv.org/abs/1902.02186)
- [x] Add next observation to trajectory data structure
- [x] Directly compute gradient using the given update rule (This is t…
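The first checklist item, storing the next observation alongside each step, could look like the sketch below. It is a minimal illustration under assumptions: the `Transition` field names and the `build_trajectory` helper are hypothetical, not the repository's actual data structure.

```python
from typing import Any, List, NamedTuple, Sequence


class Transition(NamedTuple):
    observation: Any
    action: Any
    reward: float
    next_observation: Any  # the field added by the checklist item
    done: bool


def build_trajectory(
    observations: Sequence[Any],
    actions: Sequence[Any],
    rewards: Sequence[float],
    dones: Sequence[bool],
) -> List[Transition]:
    """Pair each step with the observation that followed it.

    Assumes `observations` has one more entry than the other sequences,
    i.e. it also contains the observation after the last action.
    """
    if not (len(observations) == len(actions) + 1 == len(rewards) + 1 == len(dones) + 1):
        raise ValueError("need len(observations) == len(actions) + 1 (and same for rewards/dones)")
    return [
        Transition(obs, act, rew, next_obs, done)
        for obs, act, rew, next_obs, done in zip(
            observations[:-1], actions, rewards, observations[1:], dones
        )
    ]
```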
-
Roy is trying it out on Deep Evolutionary plus two other RL model types, possibly Deep Deterministic Policy Gradient (DDPG), etc.
-
### Description
The following code
```python
import jax
import jax.ad_checkpoint
from jax import numpy as jnp

@jax.jit
def apply(params, x):
    def step(y, i):
        y = jnp.sin(y)
        y = …
```
-
1. Why Model-Based?
- Model-based methods can be more data-efficient, although model-free methods may achieve better asymptotic performance
- Models make it easy to inject inductive biases
2. What about other ge…
-
Distributed Distributional Deterministic Policy Gradients (D4PG)
Reference implementation:
- https://github.com/deepmind/acme
PyTorch implementation:
- https://github.com/fabiopardo/tonic
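A core piece of D4PG is its distributional critic. In the categorical variant (borrowed from C51), the shifted Bellman target distribution must be projected back onto a fixed grid of atoms before the cross-entropy loss is computed. The sketch below shows that projection for a single transition in plain Python. It is a minimal illustration, not Acme's or Tonic's API: `categorical_projection` and its arguments are hypothetical. It assumes evenly spaced atoms on `[v_min, v_max]` and that the caller passes the (n-step) discounted reward and `discount = 0` on terminal transitions.

```python
import math
from typing import List, Sequence


def categorical_projection(
    probs: Sequence[float],
    reward: float,
    discount: float,
    v_min: float,
    v_max: float,
) -> List[float]:
    """Project the distribution of reward + discount * Z onto the fixed atom grid."""
    num_atoms = len(probs)
    delta_z = (v_max - v_min) / (num_atoms - 1)
    projected = [0.0] * num_atoms
    for j, p in enumerate(probs):
        z_j = v_min + j * delta_z                              # value of atom j
        tz = min(max(reward + discount * z_j, v_min), v_max)   # shifted, clipped target
        b = (tz - v_min) / delta_z                             # fractional grid index
        b = min(max(b, 0.0), num_atoms - 1)                    # guard against round-off
        lower, upper = math.floor(b), math.ceil(b)
        if lower == upper:
            projected[lower] += p                              # lands exactly on an atom
        else:
            # Split the mass between the two neighbours in proportion to proximity.
            projected[lower] += p * (upper - b)
            projected[upper] += p * (b - lower)
    return projected
```

Because each atom's probability is split linearly between its two nearest grid points, the total mass is preserved. For example, with atoms `[0, 1, 2]`, `probs = [1, 0, 0]`, `reward = 0.5`, and `discount = 1`, the target `0.5` sits halfway between the first two atoms and receives `[0.5, 0.5, 0.0]`.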
-
-
### What happened + What you expected to happen
When users need to run the DDPG algorithm, the current `DDPGTrainer` only supports the single-machine version of the algorithm. If the user needs t…
-
The following code raises an error in some of the most recent versions of PyTorch: https://github.com/microsoft/oac-explore/blob/cbc0333cc9b616f6bbca9d6d9cdd37fd29ef55e7/trainer/trainer.py#L146-…