policy-gradient Search Results

1000+ results
for policy-gradient


long8v/PTIR #187

[168] Proximal Policy Optimization Algorithms

[paper](https://arxiv.org/pdf/1707.06347) ## TL;DR - **I read this because.. :** for background knowledge - **task :** RL - **problem :** Q-learning is too unstable, and TRPO is relatively complex. A data-efficient and scalable arch…

long8v updated 3 months ago
1
haarnoja/sac #14

paper/code conflict: using minimum Q in policy gradient

The Soft Actor-Critic paper ([arXiv v2](https://arxiv.org/abs/1801.01290)) says, in the last paragraph on page 5: > We then use the minimum of the Q-functions for the value gradient in Equation 6 a…

jpreiss updated 6 years ago
1
TheCEDL/homework2 #10

When I ran cells of HW2_Policy_Graident, it appeared 3 error…

ImportError Traceback (most recent call last) in () 2 import tensorflow as tf 3 import numpy as np ----> 4 from policy_gradient import util 5 from poli…

hiram94 updated 8 years ago
1
facebookresearch/Pearl #44

Functionality for Stein variational policy gradient and/or r…

Are there any plans to add functionality to allow using prior policies for regularization similar to that of the Stein variational policy gradient (SVPG) (SVPG paper available at: https://arxiv.org/ab…

leonhalgryn updated 8 months ago
2
dilevin/computer-graphics-kinematics #53

Inverse Kinematics: large oscillations

When I try to drag the lightbulb around, the whole body of the lamp moves back and forth around the query point with a large amplitude, unlike the smooth transitions in the handout .gif. (It does slow…

zengzix2 updated 1 week ago
3
lweitkamp/option-critic-pytorch #11

Why not clean replay buffer after each episode for on-policy…

Thanks for providing the pytorch version of option critic. I want to ask why don't we clean replay buffer after each episode for on-policy policy gradient update? I think both algorithm 1 in the paper…

tedhuang96 updated 2 years ago
4
facebookresearch/RLCompOpt #2

ValueError: math domain err

--- **Issue Description:** When executing the `sh scripts/train_autophase_offline_q_value_rank.sh` script, I encounter the following error: ``` Error executing job with overrides: ['seed=2…

chhnb updated 8 months ago
1
openai/spinningup #301

Link to [proof of reward-to-go reducing variance] leads nowh…

The link is on the word "here" just above "Implementing Reward-to-Go Policy Gradient" (scroll one line up from: https://spinningup.openai.com/en/latest/spinningup/rl_intro3.html#implementing-reward-to…

bramgrooten updated 2 years ago
1
ajay-dhangar/algo #1105

[Feature]: <Policy Gradient Methods>

### Feature Name Adding Policy Gradient Methods Visualizations ### Feature Description Develop policy gradient methods that directly optimize the policy by updating it toward higher expected reward…

priyashuu updated 1 week ago
2
axolotl-ai-cloud/axolotl #1888

Training with a large json dataset (>650K) throw error:pyarr…

### Please check that this issue hasn't been reported before. - [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) didn't find any similar reports. ###…

bofei5675 updated 1 week ago
1
