-
[paper](https://arxiv.org/pdf/1707.06347)
## TL;DR
- **I read this because:** to build background knowledge
- **task:** RL
- **problem:** Q-learning is too unstable, and TRPO is relatively complex. A data-efficient and scalable arch…
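For reference, the core update the linked paper proposes is the clipped surrogate objective, which bounds the policy ratio without TRPO's second-order constraint:

$$
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[\min\!\big(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big)\right],
\qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}
$$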
-
It would be good to have MuJoCo baseline results for ACKTR, A2C, TRPO, PPO, and DDPG after the MuJoCo v2 environment update, so there are benchmark results to compare against.
-
The link to the slides is invalid: [slides #1 (trpo)](https://docs.google.com/presentation/d/15Z_AVBsO9VuOSZ5uY-Q4by3tHKiRSENchhAKHhCxIOc/present?token=AC4w5VgM6o7lCOmwtNFI3lfzyPv2PHOpRQ%3A1511795215…
-
This is more of a question than an issue. I noticed that in the implementation of the above-mentioned algorithms, action limits are not taken into account. Environments handle this clipping internally…
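A minimal sketch of what caller-side handling could look like, assuming a gym-style `Box` action space with `low`/`high` bounds; the function names are illustrative, not from the repo:

```python
import numpy as np

def clip_action(raw_action, low, high):
    # Hard clipping to the action bounds -- what many environments
    # already do internally when given an out-of-range action.
    return np.clip(raw_action, low, high)

def squash_action(raw_action, low, high):
    # Alternative: squash an unbounded policy output into [low, high]
    # with tanh, so the policy itself respects the limits.
    return low + 0.5 * (np.tanh(raw_action) + 1.0) * (high - low)
```

Where the clipping happens matters for the stored transitions: if the environment clips silently, the replay buffer records an action the policy never actually executed.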
-
Package architecture (a shared-interface sketch follows this list):
- controllers:
> classic control: PID, pure pursuit, bang-bang, open-loop (velocity profile), ...
> optimal control: LQR, DDP, MPC, ...
> collision avoidance: RVO, O…
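A minimal sketch of a common interface these controllers could implement; all class and method names here are hypothetical, not from the package:

```python
from abc import ABC, abstractmethod

class Controller(ABC):
    """Hypothetical common interface for every controller in the package."""

    @abstractmethod
    def compute_control(self, state, reference):
        """Return a control command for the current state and reference."""

class PID(Controller):
    """Textbook PID on a scalar error, as one concrete example."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self._integral = 0.0
        self._prev_error = 0.0

    def compute_control(self, state, reference):
        error = reference - state
        self._integral += error * self.dt
        derivative = (error - self._prev_error) / self.dt
        self._prev_error = error
        return self.kp * error + self.ki * self._integral + self.kd * derivative
```

A shared base class like this would let the planning and simulation code swap PID, LQR, or MPC controllers without changing the call site.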
-
I'm trying to convert the coordinates of markers from the visual coordinates to canvas coordinates. If I do this with a camera with fov=0, everything works fine:
```python
import numpy as np
from vi…
```
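The truncated import suggests vispy; if so, one way to map marker positions into canvas pixels is through the visual's transform chain. This is a sketch under that assumption (the camera choice and marker data are placeholders), and the canvas generally needs to have been drawn once so the transform system is configured:

```python
import numpy as np
from vispy import scene

canvas = scene.SceneCanvas(size=(800, 600), show=True)
view = canvas.central_widget.add_view()
view.camera = 'turntable'  # placeholder; use the camera from the real script

pos = np.random.rand(10, 3).astype(np.float32)  # placeholder marker positions
markers = scene.visuals.Markers(parent=view.scene)
markers.set_data(pos)

# Map from the visual's local frame to canvas pixel coordinates.
tr = markers.get_transform(map_from='visual', map_to='canvas')
mapped = tr.map(pos)                      # homogeneous (N, 4) output
canvas_coords = mapped[:, :2] / mapped[:, 3:4]  # perspective divide
```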
-
- [x] check and fix C51 [deaab73]
- [x] check qrdqn [deaab73]
- [ ] check iqn
- [ ] check and fix Rainbow
- [ ] check on-policy buffer sampling
- [ ] check function `discounted_sum` (reference sketch below)
- [ ] check …
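For the `discounted_sum` item above, a straightforward reference implementation to check against; this assumes it should return the scalar discounted return of one reward sequence (the repo's semantics may differ):

```python
import numpy as np

def discounted_sum_reference(rewards, gamma):
    """Scalar discounted return: sum_t gamma**t * rewards[t]."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return float(np.sum(gamma ** np.arange(len(rewards)) * rewards))

# Quick sanity check against a naive loop:
rewards, gamma = [1.0, 2.0, 3.0], 0.9
naive = sum(gamma ** t * r for t, r in enumerate(rewards))
assert np.isclose(discounted_sum_reference(rewards, gamma), naive)
```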
-
Hey there, I want to reproduce the work from your final thesis. I saw that you have some scripts to run PPO, DDPG, and TRPO, but the directory structure is hard for me to understand; can you exp…
-
Hi!
MeanKLBefore is defined in `optimize_policy` in npo.py:
```python
# npo.py
def optimize_policy(self, itr, samples_data):
    all_input_values = tuple(ext.extract(
        samples_data,
        …
```