-
### What is the problem?
In rollout_worker.py, when `cls` is `TFPolicy` or a subclass of `TFPolicy`, the following line fails:
```
policy_map[name] = cls(obs_space, act_space, merged…
-
I want to implement COMA with parl, and I use two `fluid.Program()` instances to train the critic and the actor respectively. However, I met two errors related to the optimizer.
### error 1:
code:
```python
def lear…
-
Paddle version: 1.5.1
Background: reproducing COMA, a multi-agent algorithm
Detailed error message:
### error 1:
code:
```python
def learn(self, obs, actions, last_actions, q_vals, lr):
"""
Args:
obs: [4*env*ba…
-
I'd love to see an asynchronous version of DDPG on here. Would anyone be able to help me with it?
Here are quick thoughts:
A3C seems to be king of the hill at the moment, but DDPG has some clear…
-
Have you tried a spectral normalization GAN and adding an L1 distance term to the WGAN loss? I wonder how these two changes would impact performance:
## 1. Replacing WGAN-GP with spectral normalization
Spectr…
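For reference, the core of spectral normalization is estimating the largest singular value of a weight matrix with power iteration and dividing the weights by it. A minimal NumPy sketch (the function name `spectral_norm` and the fixed iteration count are illustrative, not from any particular GAN codebase):

```python
import numpy as np

def spectral_norm(w, n_iters=5):
    """Normalize matrix w to have spectral norm ~1.

    Uses power iteration to estimate the largest singular value
    sigma(w), then returns w / sigma(w) -- the core operation of
    spectral normalization in SN-GAN.
    """
    u = np.random.default_rng(0).normal(size=w.shape[0])
    v = w.T @ u
    for _ in range(n_iters):
        # Alternate w.T and w applications; u and v converge to the
        # leading left/right singular vectors.
        v = w.T @ u
        v /= np.linalg.norm(v) + 1e-12
        u = w @ v
        u /= np.linalg.norm(u) + 1e-12
    sigma = u @ w @ v  # estimate of the largest singular value
    return w / sigma
```

In a training loop this would be applied to each discriminator weight matrix before the forward pass (frameworks typically keep `u` persistent across steps and run a single iteration per step).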
-
Really cool that you've been working on implementing that algorithm in Python. I've been thinking of doing this as well. As far as I can tell, you're the only one that's tried doing this yet, so I'm …
-
Dear authors,
Great and excellent work. Below is the list of supported models; we think some other methods are also crucial for certain applications.
Discrete-Action DQN
Parametric…
-
On this page: https://spinningup.openai.com/en/latest/spinningup/rl_intro2.html
More specifically in this diagram: https://spinningup.openai.com/en/latest/_images/rl_algorithms_9_15.svg
I am sur…
-
Whether or not a baseline is used, `PolicyGradientModel.reward_estimation` computes cumulative rewards within one batch using `util.cumulative_discount` with `cumulative_start=0.0`.
In my op…
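To make the point concrete, here is a minimal sketch of a backwards discounted cumulative sum with a configurable start value; the exact signature of `util.cumulative_discount` may differ, and the `start` parameter here is a hypothetical stand-in for `cumulative_start` (e.g. a bootstrap value for the state following the batch):

```python
import numpy as np

def cumulative_discount(rewards, discount, start=0.0):
    """Backwards discounted cumulative sum over a batch of rewards.

    out[t] = rewards[t] + discount * out[t+1], with out[T] seeded by
    `start`. With start=0.0 the return of the last step is just its
    immediate reward, which is the behavior discussed above.
    """
    out = np.empty(len(rewards))
    running = start
    for t in reversed(range(len(rewards))):
        running = rewards[t] + discount * running
        out[t] = running
    return out
```

For example, `cumulative_discount([1.0, 1.0, 1.0], 0.5)` yields `[1.75, 1.5, 1.0]`, whereas a nonzero `start` would propagate a value estimate from beyond the batch boundary into every step's return.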
-
TF Version: 2.0.0-dev20190214
Windows 10
Anaconda Python 3.6.5
GPU: GeForce GTX 1070 Max-Q Design
[Tensorflow 2.0 (gpu) nightly](https://pypi.org/project/tf-nightly-gpu-2.0-preview/) installed v…