-
I tried two different tune runs: with future data (you can find this code by searching for "#Future data") and without. I expected to see a big difference in rewards between these two runs (or at least some differen…
-
Hi there
I'm trying to use MuJoCo on a cluster (Compute Canada, if it makes any difference) and need to compile the Python bindings for MuJoCo 2.2.0. Amazingly, they don't have the precompiled b…
-
I'm trying to start [notebook](https://colab.research.google.com/drive/1aQmH526cjCcB1JJZph2cywCBAjNiQgzW?usp=sharing) from [this article](https://medium.com/mlearning-ai/hyperparameter-optimization-us…
-
**I do not understand how adding entropy to loss will encourage exploration**
I understand that entropy is a measure of unpredictability, or a measure of randomness:
H(X) = -Sum P(x) log(P(x))
…
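To see how the entropy term encourages exploration, here is a minimal NumPy sketch (all names here — `entropy`, `beta`, `policy_loss` — are illustrative, not from any specific library): a uniform policy has maximal entropy, a near-deterministic one has low entropy, so *subtracting* an entropy bonus from the loss rewards the optimizer for keeping the action distribution spread out.

```python
import numpy as np

def entropy(probs):
    """Shannon entropy H(X) = -sum P(x) log P(x) of a categorical distribution."""
    probs = np.asarray(probs, dtype=float)
    return -np.sum(probs * np.log(probs + 1e-12))  # epsilon guards log(0)

# A uniform policy (maximally random) has the highest possible entropy,
# a peaked (near-deterministic) policy has entropy close to zero.
uniform = entropy([0.25, 0.25, 0.25, 0.25])   # = log(4) ≈ 1.386
peaked = entropy([0.97, 0.01, 0.01, 0.01])

# Hypothetical total loss: subtracting beta * H(pi) means that, all else
# equal, a higher-entropy (more exploratory) policy yields a lower loss,
# so gradient descent resists collapsing onto one action too early.
beta = 0.01                # entropy coefficient (a tunable hyperparameter)
policy_loss = 1.0          # placeholder for the usual policy-gradient loss
total_loss = policy_loss - beta * uniform
```

The coefficient `beta` trades off exploration against exploitation; as it decays toward zero, the entropy pressure disappears and the policy is free to become deterministic.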
-
### What is the problem?
The recently added torch implementation of PPO #6826 is over 5X slower when training on Atari (Breakout) and also ends up slowly consuming all the system RAM (perf/ram_…
-
## 🐛 Bug
It seems like there is a very small memory leak during forward and backward propagation through the network that can lead to memory exhaustion after many hours of training.
I stumbled upon this…
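For narrowing down a slow leak like this, one option is the standard-library `tracemalloc` module: take a snapshot before and after a batch of training iterations and diff them to see which source lines accumulated allocations. This is a generic sketch (the `leaky` list just stands in for whatever the suspected training code is), not the reproduction from the report:

```python
import tracemalloc

tracemalloc.start()

snapshot1 = tracemalloc.take_snapshot()

# ... run a batch of training iterations here; the list below is a
# stand-in that deliberately retains ~1 MB so the diff has something to show.
leaky = [bytearray(1024) for _ in range(1000)]

snapshot2 = tracemalloc.take_snapshot()

# Compare snapshots grouped by source line: lines with a large positive
# size_diff are the ones that accumulated memory between the snapshots.
for stat in snapshot2.compare_to(snapshot1, "lineno")[:5]:
    print(stat)
```

Running this repeatedly across training iterations shows whether the same line keeps growing (a genuine leak) or whether the memory is a one-time allocation.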
-
### What is the problem?
When I restore the trained model, the error output is as below:
```
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/ray/tune/trai…
-
OS: Windows 10
Tensorflow version: 2.6.0 (tensorflow-gpu)
Python version: 3.9.7
Ray version: 1.6.0 (installed from pip)
The A3C agent gives an error on train() when I try to train a multi-agent toy exampl…
-
### What is the problem?
I found that multiple PPO runs still have different performance even when we set the same seed.
How can we obtain exactly the same result with the same seed?
Current strate…
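As a baseline, it helps to confirm that the framework-independent RNGs are actually seeded; here is a minimal sketch (the helper name `seed_everything` is my own, not an RLlib API). Even with this, a full RLlib run can still diverge across repeats because of worker scheduling and GPU nondeterminism, so seeding alone is necessary but not always sufficient:

```python
import random

import numpy as np

def seed_everything(seed):
    """Seed the Python and NumPy RNGs.

    In a full RLlib setup you would additionally pass the seed through the
    trainer config and seed torch and the environment; asynchronous workers
    and GPU kernels may still introduce nondeterminism on top of that.
    """
    random.seed(seed)
    np.random.seed(seed)

seed_everything(42)
run_a = np.random.rand(3)

seed_everything(42)
run_b = np.random.rand(3)

# Both draws started from the same seed, so the sequences match exactly.
```

If two runs differ even after this kind of seeding, the divergence is coming from somewhere outside these RNGs (e.g. worker timing or hardware-level nondeterminism), which narrows the search considerably.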
-
Hey, you guys!
I am really impressed by your concise code for implementing distributed RL algorithms like A3C.
And I am very interested in whether this framework supports training with multiple …