-
[paper](https://arxiv.org/pdf/1707.06347)
## TL;DR
- **I read this because.. :** for background knowledge
- **task :** RL
- **problem :** Q-learning is too unstable, and TRPO is relatively complex. A data-efficient and scalable arch…
-
The Soft Actor-Critic paper ([arXiv v2](https://arxiv.org/abs/1801.01290)) says, in the last paragraph on page 5:
> We then use the minimum of the Q-functions for the value gradient in Equation 6 a…
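
The quoted sentence can be illustrated with a minimal sketch: given two Q-function estimates for the same batch of state-action pairs, take the element-wise minimum before using it downstream. The function name `min_q` and the sample values are hypothetical and chosen only for illustration; this is not the paper's code.

```python
def min_q(q1_values, q2_values):
    """Return the element-wise minimum of two lists of Q estimates.

    q1_values and q2_values are assumed to hold estimates from two
    separately trained Q-functions for the same state-action pairs.
    """
    if len(q1_values) != len(q2_values):
        raise ValueError("both Q estimates must cover the same batch")
    # Taking the smaller estimate per sample is the pessimistic choice.
    return [min(a, b) for a, b in zip(q1_values, q2_values)]


# Hypothetical estimates for a batch of two state-action pairs.
pessimistic_q = min_q([1.0, 0.5], [0.8, 0.7])
```

Using the smaller of the two estimates per sample is a common way to counter overestimation in the learned value.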
-
ImportError Traceback (most recent call last)
in ()
2 import tensorflow as tf
3 import numpy as np
----> 4 from policy_gradient import util
5 from poli…
-
Are there any plans to add functionality to allow using prior policies for regularization similar to that of the Stein variational policy gradient (SVPG) (SVPG paper available at: https://arxiv.org/ab…
-
When I try to drag the lightbulb around, the whole body of the lamp moves back and forth around the query point with a large amplitude, unlike the smooth transitions in the handout .gif. (It does slow…
-
Thanks for providing the PyTorch version of option-critic. I want to ask why we don't clear the replay buffer after each episode for the on-policy policy gradient update. I think both algorithm 1 in the paper…
-
---
**Issue Description:**
When executing the `sh scripts/train_autophase_offline_q_value_rank.sh` script, I encounter the following error:
```
Error executing job with overrides: ['seed=2…
-
The link is on the word "here" just above "Implementing Reward-to-Go Policy Gradient" (scroll one line up from: https://spinningup.openai.com/en/latest/spinningup/rl_intro3.html#implementing-reward-to…
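
For readers unfamiliar with the term, reward-to-go weights each action by the sum of rewards collected from that timestep onward rather than by the full episode return. The following is a minimal undiscounted sketch in plain Python; the function name and the example rewards are assumptions for illustration and are not copied from the linked page.

```python
def reward_to_go(rewards):
    """Return, for each timestep t, the sum of rewards from t to the end."""
    totals = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards, accumulating the future reward sum.
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        totals[t] = running
    return totals


# Hypothetical three-step episode.
weights = reward_to_go([1.0, 2.0, 3.0])
```

In a policy gradient update, each entry of `weights` would multiply the log-probability of the action taken at the matching timestep.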
-
### Feature Name
Adding Policy Gradient Methods Visualizations
### Feature Description
Develop policy gradient methods that directly optimize the policy by updating it toward higher expected reward…
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports.
###…