-
RLlib converges slowly on a simple environment compared to equivalent algorithms from other libraries under the same conditions (see the results below). Is this something that is expected, or is th…
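For reference, a minimal sketch of the kind of RLlib setup being compared; the environment, framework, and hyperparameters are assumptions, since the exact configuration is truncated above:

```python
# Hypothetical reproduction sketch -- the report's actual config is not shown.
import ray
from ray import tune

ray.init()

# Train RLlib's DQN on a simple environment and track episode reward over time.
tune.run(
    "DQN",
    config={
        "env": "CartPole-v0",   # assumed environment; not stated in the report
        "framework": "torch",   # assumed framework
        "lr": 1e-3,             # assumed learning rate, for a like-for-like comparison
    },
    stop={"timesteps_total": 100_000},
)
```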
-
I'm sorry to bother you. I found that the ep_reward value was NaN in the log output. I think the reason ep_reward was NaN is that the monitor was never called in dqn_cnn.py. I wonder if it's because o…
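For context, a minimal sketch of how a Gym monitor wrapper is typically attached so that per-episode rewards get recorded at all; the wrapper, environment, and log path here are assumptions, since dqn_cnn.py itself is not shown and may use a custom monitor:

```python
# Hypothetical sketch: without a monitor wrapper (or equivalent bookkeeping),
# ep_reward can remain NaN because no completed-episode reward is ever recorded.
import gym
from gym.wrappers import Monitor  # assumed wrapper; dqn_cnn.py may use its own

env = gym.make("BreakoutNoFrameskip-v4")      # assumed environment
env = Monitor(env, "./dqn_logs", force=True)  # writes per-episode rewards to disk

obs = env.reset()
done = False
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```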
-
- [x] I have marked all applicable categories:
+ [x] exception-raising bug
+ [ ] RL algorithm bug
+ [ ] documentation request (i.e. "X is missing from the documentation.")
+ [ ] ne…
-
**Important Note: We do not do technical support or consulting** and don't answer personal questions via email.
Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [R…
-
Hello, I watched your demo video on Bilibili and am very interested in your training process. I have implemented a DQN-like algorithm with only minor changes to the network architecture, but my training is extremely inefficient. In particular, while interacting with the Breakout environment, the step count accumulates very slowly, which in turn makes updates slow. The log is as follows:
>09:28:30 AM > ep 12889 done. total_steps=712610 | reward=2.0 | episode…
-
In DQN, an experience-replay update should run every time an action is selected. In this code, the replay update is placed after an episode completes; the indentation was probably just forgotten (between the two dashed lines). A sketch of the intended placement follows the snippet below.
```python
def train(self, train_episodes=200):
    if args.train:
        for episode in range(train_episodes):
            …
```
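A minimal sketch of the suggested placement, assuming a conventional DQN step loop; the method names `choose_action`, `store_transition`, and `replay` are placeholders, since most of the original code is truncated:

```python
# Hypothetical sketch of the suggested fix, mirroring the truncated train() above:
# the replay/learning step runs once per action selection, inside the step loop,
# rather than once per episode.
def train(self, train_episodes=200):
    if args.train:
        for episode in range(train_episodes):
            state = self.env.reset()
            done = False
            while not done:
                action = self.choose_action(state)  # placeholder method names
                next_state, reward, done, _ = self.env.step(action)
                self.store_transition(state, action, reward, next_state, done)
                self.replay()  # learn every step -- this is the call whose
                               # missing indentation left it outside the loop
                state = next_state
```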
-
When `verbose=1` in [`DQN`](https://stable-baselines.readthedocs.io/en/master/modules/dqn.html), what exactly does the produced output represent? I haven't yet looked at the source code, and, of cou…
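For reference, a minimal Stable Baselines usage sketch that produces the output in question; the environment choice is an assumption:

```python
# Minimal sketch: enabling verbose training logs in Stable Baselines' DQN.
from stable_baselines import DQN

# verbose=1 prints a periodic training-progress table during learn().
model = DQN("MlpPolicy", "CartPole-v1", verbose=1)  # env id is an assumption
model.learn(total_timesteps=10_000)
```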
-
1. We need to know the maximum number of ingredients.
2. Plot a histogram of pizzas according to their number of ingredients (a plotting sketch follows this list).
3. Plot a histogram of teams according to the number of people.
4. Plot a his…
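A minimal sketch covering items 1–3, assuming `pizzas` is a list of ingredient lists and `team_sizes` a list of people-per-team counts; both names and the example data are hypothetical, since the data loading is not shown:

```python
# Hypothetical sketch for items 1-3; `pizzas` and `team_sizes` are assumed inputs.
import matplotlib.pyplot as plt

pizzas = [["tomato", "cheese"], ["tomato", "cheese", "basil"], ["mushroom"]]  # example data
team_sizes = [2, 3, 3, 4, 2]                                                  # example data

ingredient_counts = [len(p) for p in pizzas]
print("max number of ingredients:", max(ingredient_counts))  # item 1

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(ingredient_counts)  # item 2: pizzas by ingredient count
ax1.set(title="Pizzas by number of ingredients", xlabel="ingredients", ylabel="pizzas")
ax2.hist(team_sizes)         # item 3: teams by size
ax2.set(title="Teams by number of people", xlabel="people", ylabel="teams")
plt.show()
```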
-
Hi, this is one of my first times asking a question on GitHub; if I've made some mistake, please let me know. And congratulations on all the work on this library.
I'm trying to train a DQN agent and I'm get…
-
- [ ] I have marked all applicable categories:
+ [ ] exception-raising bug
+ [ ] RL algorithm bug
+ [ ] documentation request (i.e. "X is missing from the documentation.")
+ [ ] ne…