- [ ] I have marked all applicable categories:
    + [ ] exception-raising bug
    + [ ] RL algorithm bug
    + [ ] documentation request (i.e. "X is missing from the documentation.")
    + [ ] ne…
I'm having trouble finding examples of using a learning rate schedule with the PPO2 algorithm, although it seems to be possible:
```python
import gym
import numpy as np
import tensorflow as tf


class Memory(object):
    def __init__(self):
        # Per-episode buffers: observations, actions, rewards, negative log-probs.
        self.ep_obs, self.ep_act, self.ep_rwd, self.ep_neglogp = [], [], [], []
        # …
```
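As far as we understand stable-baselines' PPO2, `learning_rate` may be either a float or a callable; the callable is invoked once per update with the fraction of training remaining (1.0 at the start, approaching 0.0 at the end). A minimal sketch of a linear schedule under that assumption follows. `linear_schedule` is a hypothetical helper, not part of any library.

```python
from typing import Callable


def linear_schedule(initial_lr: float, final_lr: float = 0.0) -> Callable[[float], float]:
    """Return a function mapping remaining progress (1.0 -> 0.0) to a learning rate.

    Assumes the caller passes progress_remaining = 1.0 at the start of training
    and 0.0 at the end (assumed PPO2 convention for a callable learning_rate).
    """
    def schedule(progress_remaining: float) -> float:
        # Clamp to [0, 1] in case the caller overshoots on the last update.
        p = min(max(progress_remaining, 0.0), 1.0)
        # Linear interpolation: initial_lr at p = 1, final_lr at p = 0.
        return final_lr + p * (initial_lr - final_lr)

    return schedule


# Hypothetical usage with stable-baselines (not executed here):
# model = PPO2("MlpPolicy", env, learning_rate=linear_schedule(2.5e-4))
```

Returning a closure keeps the schedule stateless: PPO2 decides when to call it, and the schedule only needs the progress value it is given.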
@jachiam Hi! It's me again! Two days ago I posted an issue about using multiple CPUs with ExperimentGrid: the logs appear to be wrong only when it is run in PyCharm, and are fine when it is run in a terminal. I did some more experim…
# Problem description
The translated code does not work when eager execution (the default in TF2) is enabled. It behaves similarly to the PyTorch code. I will therefore need to compare the tw…
```
[07-08 00:22:31 MainThread @logger.py:224] Argv: D:/Envs/SmartCar/DDPG/train.py
C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\importlib\_bootstrap.py:219: RuntimeWarning: numpy.uf…
```
Hi, I'm participating in [this challenge](https://www.aicrowd.com/challenges/neurips-2020-procgen-competition), which requires using `ray[rllib]==0.8.6`.
I've implemented an algorithm and it works wit…
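Hello, I want to design a different method on top of PARL. The model already contains an actor network and a critic network, and I would also like to add a predict network. After adding it, an error is raised when running the function that copies weights between the target network and the current network.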
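Since the error message is not shown, the cause is an assumption here: a target-network copy generally requires the current and target models to have exactly the same parameters (names and shapes). If the new predict network exists in only one of them, or the target model was built before the change, the copy can fail. The NumPy sketch below shows the copy mechanism with that check made explicit. `sync_weights` and the dict-of-arrays parameter format are hypothetical stand-ins, not PARL's API.

```python
import numpy as np


def sync_weights(source, target, decay=0.0):
    """Update the target dict in place: target = decay * target + (1 - decay) * source.

    decay = 0.0 gives a hard copy; decay close to 1.0 gives a soft (Polyak) update.
    source and target map parameter names to numpy arrays (hypothetical format).
    """
    # Names present in one model but not the other make the copy ill-defined.
    mismatched = set(source) ^ set(target)
    if mismatched:
        raise KeyError(f"parameter sets differ: {sorted(mismatched)}")
    for name, src in source.items():
        if src.shape != target[name].shape:
            raise ValueError(f"shape mismatch for {name}: {src.shape} vs {target[name].shape}")
        target[name] = decay * target[name] + (1.0 - decay) * src
```

In this sketch, adding a `predict.w` entry to the current model without adding it to the target model raises a `KeyError`, which mirrors the kind of mismatch that can break a copy function.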
In the original IMPALA paper, the state-value estimate and the action (policy) are outputs of the same network, and that network is updated with the sum of three losses, which is unusual in the actor-criti…
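In the IMPALA paper, the three terms are a policy-gradient term, a value (baseline) term, and an entropy bonus, all computed from the policy and value heads of one network. The NumPy sketch below only shows how such terms combine into one scalar. It assumes the V-trace value targets `vs` and policy-gradient advantages `pg_advantages` have already been computed (V-trace itself is not shown). The coefficients `baseline_coef` and `entropy_coef` are hypothetical defaults. In an autodiff framework, the targets and advantages would be treated as constants (no gradient through them).

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def impala_style_loss(logits, values, actions, vs, pg_advantages,
                      baseline_coef=0.5, entropy_coef=0.01):
    """Sum of three loss terms for a network with shared policy and value heads.

    logits: [T, A] policy logits; values: [T] value predictions V(x_s);
    actions: [T] integer actions taken; vs: [T] V-trace value targets;
    pg_advantages: [T] V-trace policy-gradient advantages (assumed precomputed).
    """
    log_probs = log_softmax(logits)
    probs = np.exp(log_probs)
    # Log-probability of the action actually taken at each step.
    taken = log_probs[np.arange(len(actions)), actions]
    # Policy-gradient term: increase log-probs of actions with positive advantage.
    pg_loss = -np.mean(taken * pg_advantages)
    # Baseline term: regress V(x_s) toward the V-trace target v_s.
    baseline_loss = 0.5 * np.mean((vs - values) ** 2)
    # Entropy bonus: subtracted so that a more uniform policy lowers the loss.
    entropy = -np.mean(np.sum(probs * log_probs, axis=-1))
    total = pg_loss + baseline_coef * baseline_loss - entropy_coef * entropy
    return total, pg_loss, baseline_loss, entropy
```

Because the policy and value heads share parameters, a gradient step on `total` updates the shared trunk with all three signals at once. This is why the relative weights of the coefficients matter more here than in actor-critic setups with separate networks.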