-
See https://github.com/pytorch/pytorch/issues/975 for more info
PyTorch TRPO appears to be 50% slower than the TF version. I'm not sure about PPO, but I expect the wall-clock gap to be the same.
To fix this is…
-
```
The file contains the assignment for the second release.
Please ask questions if anything is
unclear or, from your point of view, could
be interpreted in more than one way.
I.G.
```
Original issue reported on code.google.co…
-
Hi, I am trying to reproduce the experiment from the paper "Learning human behaviors from motion capture by adversarial imitation". I used the code from https://github.com/huiwenzhang/merel-mocap-gail. Bu…
-
Reinforcers,
(that was cheesy...)
Both deepq and trpo_mpi call reset twice in a row for every "done" sent into the algorithm. Not sure what the purpose of that is.
Also, they both exit o…
-
Hello,
I've tried in vain to find suitable hyperparameters for SAC in order to solve MountainCarContinuous-v0.
Even with hyperparameter tuning (see "add-trpo" branch of [rl baselines zoo](https:…
-
Thanks for the implementation of TRPO. There are some details that do not make sense to me so far.
I can't see why kl_firstfixed is defined as follows:
`kl_firstfixed = tf.reduce_sum(tf.stop_gradient…
-
Policy Search
- [ ] [PI2](http://proceedings.mlr.press/v9/theodorou10a/theodorou10a.pdf), is already implemented #28
- [ ] [PoWER](http://www.ias.informatik.tu-darmstadt.de/publications/peters_ADPR…
-
The error a contributor got when using `categoricalgrupolicy` with `TRPO` on the `tf` branch, while computing the backward pass, was:
```
tensorflow.python.framework.errors_impl.InvalidArgumentError: No…
-
I am creating a child class that inherits from TRPO, but upon initializing the optimizer, I get
`TypeError: Failed to convert object of type to Tensor`
Here is the overall stacktrace:
```py…
-
I noticed while browsing the RL examples that the PPO implementation is actually A2C (which there's already an example for). On line 141, this line:
```Rust
let action_loss = (-advantages.detach()…