-
**Used commit:** 2901f36ce8b21225d541f1110a2c100861f59431
When running `ml/rl/test/gym/run_gym.py -p=ml/rl/test/gym/discrete_dqn_maxq_asteroids_v0.json` (as well as other Atari experiments), I obse…
-
As of 5c252ea, this repo has been checked several times for discrepancies, but it still fails to replicate DeepMind's results. This issue is to discuss any further points that may need fixing.
…
-
Hi Max,
This is the Dueling DQN architecture from DeepMind: https://arxiv.org/pdf/1511.06581.pdf
(attached image: the dueling network architecture figure from the paper)
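For reference, the aggregation that paper proposes combines a state-value stream V(s) with an advantage stream A(s, a), subtracting the mean advantage so the two streams are identifiable. A minimal PyTorch sketch of such a dueling head (layer sizes and names here are illustrative, not taken from any particular repo):

```
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""

    def __init__(self, feature_dim, num_actions):
        super().__init__()
        self.value = nn.Linear(feature_dim, 1)                # state-value stream V(s)
        self.advantage = nn.Linear(feature_dim, num_actions)  # advantage stream A(s, a)

    def forward(self, features):
        v = self.value(features)      # shape (batch, 1)
        a = self.advantage(features)  # shape (batch, num_actions)
        # Subtracting the mean advantage keeps V and A identifiable,
        # the mean-subtracted aggregation the paper proposes.
        return v + a - a.mean(dim=1, keepdim=True)
```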
-
### System information
- **OS Platform and Distribution (e.g., Linux Ubuntu 16.04)**: Ubuntu 16.04
- **Ray installed from (source or binary)**: Source
- **Ray version**: 0.5.3
- **Python version…
-
### Describe the problem
With TF 2.0 on the horizon, TensorFlow models in RLlib should not exclusively depend on variable scopes to reuse variables. Similarly, graph collections should be avoided in …
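For context, the TF 2.x idiom replaces scope-based reuse with object-based sharing: variables belong to a layer object, and calling that object again reuses them. A minimal sketch of the pattern (standalone, not RLlib's actual model API):

```
import tensorflow as tf

# TF 2.x style: variables belong to layer objects. Calling the same
# layer object twice reuses its weights; no variable_scope is needed.
shared_dense = tf.keras.layers.Dense(64, activation="relu")

x1 = tf.random.normal([8, 32])
x2 = tf.random.normal([8, 32])

y1 = shared_dense(x1)  # first call creates the kernel and bias
y2 = shared_dense(x2)  # second call reuses the same variables

# Variables are tracked on the object, not in a global graph collection.
print([v.name for v in shared_dense.trainable_variables])
```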
-
Please help me understand why the previous state is always equal to the next state. If that's the case, how will any NN work on the state?
```
import numpy as np
from q_learning.utils import Sca…
```
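The script is cut off, so this is only a guess, but a very common cause of this symptom is storing a reference to a mutable numpy array that is then updated in place, so the saved "previous" state silently tracks the next one. A minimal reproduction and fix (plain numpy, independent of the q_learning package above):

```
import numpy as np

state = np.zeros(4)

# Buggy pattern: prev_state is a *reference* to the same array, so an
# in-place update to state also "updates" prev_state.
prev_state = state
state += 1.0
print(np.array_equal(prev_state, state))  # True: both names alias one array

# Fix: copy the array when storing the transition.
state = np.zeros(4)
prev_state = state.copy()
state += 1.0
print(np.array_equal(prev_state, state))  # False: they are now distinct
```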
-
Something like DQN would be nice. This would involve using some gym-like environment, maybe [openai-gym](https://gym.openai.com/).
Two tutorials involving DQN and PyTorch: [tutorial 1](https://pyto…
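For a sense of what such an example involves, the core is a Q-network driving an epsilon-greedy policy inside the standard gym loop, with a TD update on each step. A minimal PyTorch sketch using the classic (pre-0.26) gym API; hyperparameters and network sizes are placeholders, and a real DQN would add a replay buffer and target network:

```
import random
import gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma, epsilon = 0.99, 0.1

state = env.reset()
for step in range(1000):
    # Epsilon-greedy action selection from the Q-network.
    if random.random() < epsilon:
        action = env.action_space.sample()
    else:
        with torch.no_grad():
            action = q_net(torch.as_tensor(state, dtype=torch.float32)).argmax().item()

    next_state, reward, done, _ = env.step(action)

    # One-step TD update; a full DQN would sample from a replay buffer
    # and compute the target with a separate, slowly-updated network.
    q = q_net(torch.as_tensor(state, dtype=torch.float32))[action]
    with torch.no_grad():
        next_max = q_net(torch.as_tensor(next_state, dtype=torch.float32)).max()
        target = reward + gamma * next_max * (0.0 if done else 1.0)
    loss = (q - target).pow(2)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    state = env.reset() if done else next_state
```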
-
https://jsapachehtml.hatenablog.com/entry/2018/12/25/233937
-
Thank you for these useful examples. I am trying to implement D4PG with multiple agents that interact with each other and share the same reward, but where each agent takes its own actions. I wonder if you had an…
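The question is cut off, but for the setup it describes (per-agent actions, one shared reward) a common structure is independent learners that each receive the broadcast reward. A purely structural sketch with a stub environment; everything here, including the toy reward, is hypothetical rather than taken from the repo's examples:

```
import numpy as np

class StubSharedRewardEnv:
    """Stand-in environment: each agent gets its own observation and
    takes its own action, but all agents receive one shared reward."""

    def __init__(self, num_agents, obs_dim=4):
        self.num_agents, self.obs_dim = num_agents, obs_dim

    def reset(self):
        return [np.zeros(self.obs_dim) for _ in range(self.num_agents)]

    def step(self, actions):
        next_obs = [np.random.randn(self.obs_dim) for _ in range(self.num_agents)]
        shared_reward = float(np.sum(actions))       # toy reward, same for everyone
        rewards = [shared_reward] * self.num_agents  # broadcast to all agents
        return next_obs, rewards, False

env = StubSharedRewardEnv(num_agents=3)
obs = env.reset()

# One independent learner per agent: each would own its D4PG actor,
# distributional critic, and replay buffer, trained on the shared reward.
for t in range(5):
    actions = [float(np.tanh(o.mean())) for o in obs]  # placeholder policies
    next_obs, rewards, done = env.step(actions)
    # agent i stores the transition (obs[i], actions[i], rewards[i], next_obs[i])
    obs = next_obs
```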