-
Hi all,
I'm excited about the repo, but I hit this error after running train.py. Any ideas or thoughts would be appreciated.
I'm running on 3.16.0-4-amd64 with Python 3.5.2.
Thanks,
[20…
-
See the `feature/fix_lstm` branch for [a test](https://github.com/hill-a/stable-baselines/blob/feature/fix_lstm/tests/test_lstm_policy.py) which [fails](https://travis-ci.com/hill-a/stable-baselines/b…
-
![微信截图_20210401182829](https://user-images.githubusercontent.com/13194870/113281468-2d185680-9318-11eb-90de-e676e6c521c0.png)
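Here, the trajectories generated by theta_now are effectively being used to estimate Q_phi(s,a).
Isn't that a bit of a problem?
Is Q_phi parameterized by theta or by theta_now…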
-
@antoine-galataud @takaomoriyama
Hello, I've been working on this project for a long time and have trained the model using the TRPO policy for over 2000 epochs, but the reward would get stabilised e…
-
Hello,
Executing `python -m baselines.run --alg="deepq" --env="QbertNoFrameskip-v4" --num_timesteps="1e4" --log_path="~/logs/"` does not produce any monitor.csv files.
Only trpo and deepq are affect…
-
What is the expected behaviour of on-policy and off-policy algorithms when the action space itself changes between episodes? Does this lead to non-stationarity?
The action space is continuous. Typical case in MuJoCo Ant…
-
The Pearlmutter method only gives a "good" value of the Hessian-vector product in the first two iterations of the conjugate gradient loop.
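
One way to narrow down a report like this is to compare the Pearlmutter product against an explicit Hessian product at every conjugate gradient iteration. The sketch below is only an illustration on a toy objective, not the code from the repository in question: the objective `f(x) = 0.5 x^T A x + 0.25 * sum(x_i^4)`, the matrix `A`, the `Dual` class, and all function names are assumptions made up for this example. It computes `H v` as the directional derivative of the gradient, `d/d(eps) grad f(x + eps * v)` at `eps = 0`, using dual numbers in plain Python.

```python
class Dual:
    """Number val + eps * e with e**2 == 0; eps carries a directional derivative."""

    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.eps + other.eps)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.eps + self.eps * other.val)

    __rmul__ = __mul__


# Hypothetical symmetric positive definite matrix for the toy objective.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def grad(x):
    """Gradient A x + x**3 of the toy objective; works on floats or Duals."""
    n = len(x)
    return [sum((A[i][j] * x[j] for j in range(n)), x[i] * x[i] * x[i])
            for i in range(n)]


def hvp(x, v):
    """Pearlmutter-style product: differentiate grad along direction v."""
    g = grad([Dual(xi, vi) for xi, vi in zip(x, v)])
    return [gi.eps for gi in g]


def explicit_hessian_times(x, v):
    """Reference product using the known Hessian A + diag(3 * x**2)."""
    n = len(x)
    return [sum(A[i][j] * v[j] for j in range(n)) + 3.0 * x[i] ** 2 * v[i]
            for i in range(n)]


def conjugate_gradient(matvec, b, iters=10, tol=1e-12):
    """Solve H s = b given only the map p -> H p."""
    s = [0.0] * len(b)
    r, p = list(b), list(b)
    rr = dot(r, r)
    for _ in range(iters):
        Hp = matvec(p)
        alpha = rr / dot(p, Hp)
        s = [si + alpha * pi for si, pi in zip(s, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hp)]
        rr_new = dot(r, r)
        if rr_new < tol:
            break
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return s


x0 = [0.5, -1.0, 2.0]
b = [1.0, 2.0, 3.0]
errors = []  # per-iteration gap between the two products


def checked_matvec(p):
    fast = hvp(x0, p)
    exact = explicit_hessian_times(x0, p)
    errors.append(max(abs(f - e) for f, e in zip(fast, exact)))
    return fast


step = conjugate_gradient(checked_matvec, b)
residual = max(abs(h - bi)
               for h, bi in zip(explicit_hessian_times(x0, step), b))
print("per-iteration HVP error:", errors)
print("residual of H s = b:", residual)
```

If the per-iteration error stays at rounding level on a toy problem like this but grows after a few iterations in the real code, the product itself is probably fine, and the usual suspects are state that changes between calls (for example, a gradient evaluated at a different point or on a different batch) rather than the Pearlmutter trick.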
-
All of the scripts trpo_gym_tf_cartpole.py, trpo_gym_tf_cartpole.py, trpo_cartpole_pickled.py, and ddpg_cartpole.py stop working right after starting, with the same error message:
```
rllab\examples>…
```
-
- Value-based RL
  - [ ] DQN
  - [ ] Rainbow DQN
  - [ ] [CQL](https://sites.google.com/view/cql-offline-rl)
- Value-based + policy-based RL
  - [x] DDPG
  - [ ] [TD3](https://spinni…
-
[paper](https://arxiv.org/pdf/1502.05477.pdf)
## TL;DR
- **I read this because.. :** CS285 final project
- **task :** reinforcement learning
- **problem :** Is there a policy update method that is theoretically guaranteed to always improve performance…