-
Hello, Ben!
Thank you for a great tutorial series. I have a question regarding your [actor-critic notebook](https://github.com/bentrevett/pytorch-rl/blob/master/2%20-%20Actor%20Critic%20%5BCartPole%5…
-
The colour gradient at the top is partly hand-crafted: the right end uses the CI colour value, while the left end uses other colour values, and an intermediate value is specified explicitly. The colour gradient in the …
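If the gradient is defined in matplotlib, a minimal sketch of that construction could look as follows; the concrete colour values and the `CI_COLOUR` name are placeholders, not the real ones:
```
from matplotlib.colors import LinearSegmentedColormap

CI_COLOUR = "#005b96"                 # placeholder for the CI colour used at the right end
cmap = LinearSegmentedColormap.from_list(
    "header_gradient",
    [(0.0, "#e0e0e0"),                # hand-picked left-end colour
     (0.5, "#9bbcd6"),                # explicitly specified intermediate value
     (1.0, CI_COLOUR)],               # CI colour at the right end
)
```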
-
The basic idea is to represent the joint state-action value function as a Gaussian process. The optimal policy can be approximated with a few steps of gradient descent on the action subspace, holding …
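A minimal sketch of that idea, assuming an RBF kernel, toy training data, and JAX for the action gradients (the kernel, data, and step sizes below are illustrative assumptions, not part of the original description):
```
import jax
import jax.numpy as jnp

def rbf(a, b, ls=1.0):
    # squared-exponential kernel between two sets of joint (state, action) points
    d2 = jnp.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return jnp.exp(-0.5 * d2 / ls ** 2)

# toy observations: rows of X are concatenated (state, action) pairs, y are returns
key = jax.random.PRNGKey(0)
state_dim, action_dim, n = 3, 2, 64
X = jax.random.normal(key, (n, state_dim + action_dim))
y = jnp.sin(X.sum(axis=1))
K_inv_y = jnp.linalg.solve(rbf(X, X) + 1e-4 * jnp.eye(n), y)

def q_mean(state, action):
    # GP posterior mean of the joint state-action value at (state, action)
    x = jnp.concatenate([state, action])[None, :]
    return (rbf(x, X) @ K_inv_y)[0]

def improve_action(state, action, steps=10, lr=0.1):
    # a few gradient-descent steps on -Q over the action subspace, state held fixed
    grad_a = jax.grad(lambda s, a: -q_mean(s, a), argnums=1)
    for _ in range(steps):
        action = action - lr * grad_a(state, action)
    return action

action = improve_action(jnp.zeros(state_dim), jnp.zeros(action_dim))
```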
-
I'm trying to differentiate the MJX step function via the autograd function `jax.grad()` in JAX, like:
```
def step(vel, pos):
    mjx_data = mjx.make_data(mjx_model)
    mjx_data = mjx_data.replace(q…
```
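For reference, a self-contained sketch of this pattern; the toy model XML, the scalar loss, and the input shapes below are assumptions, not taken from the question, and whether the resulting gradients are useful depends on the model (contacts in particular):
```
import jax
import jax.numpy as jnp
import mujoco
from mujoco import mjx

# toy single-slide-joint model so the example is self-contained
XML = """
<mujoco>
  <worldbody>
    <body>
      <joint type="slide" axis="1 0 0"/>
      <geom type="sphere" size="0.1" mass="1"/>
    </body>
  </worldbody>
</mujoco>
"""
mj_model = mujoco.MjModel.from_xml_string(XML)
mjx_model = mjx.put_model(mj_model)

def loss(vel, pos):
    # build fresh data, write the differentiable inputs, step once,
    # and reduce to a scalar so jax.grad is well defined
    mjx_data = mjx.make_data(mjx_model)
    mjx_data = mjx_data.replace(qpos=pos, qvel=vel)
    mjx_data = mjx.step(mjx_model, mjx_data)
    return jnp.sum(mjx_data.qpos ** 2)

dvel, dpos = jax.grad(loss, argnums=(0, 1))(jnp.array([0.5]), jnp.array([0.0]))
```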
-
I'm just curious if there is any effort to add these policy gradient methods?
-
Here is the solution:
https://github.com/philtabor/Multi-Agent-Deep-Deterministic-Policy-Gradients/issues/2#issuecomment-912548033
-
In ram.py:
```
eps = (self.xp.random.normal(0, 1, size=m.data.shape)).astype(np.float32)
l = m.data + np.sqrt(self.var)*eps
ln_pi = -0.5 * F.sum((l-m)*(l-m), axis=1) / self.var  # log(location pol…
```
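For reference, the same calculation written as a standalone NumPy sketch (scalar variance and toy shapes are assumptions): `ln_pi` is the Gaussian log-density of the sampled location under the location policy, up to an additive constant that does not depend on the mean:
```
import numpy as np

var = 0.05                                  # assumed scalar variance of the location policy
m = np.zeros((4, 2), dtype=np.float32)      # policy mean: batch of 4 two-dimensional locations
eps = np.random.normal(0, 1, size=m.shape).astype(np.float32)
l = m + np.sqrt(var) * eps                  # reparameterised sample l ~ N(m, var)

# log N(l; m, var) summed over the location dimensions; the dropped constant
# -0.5 * d * log(2 * pi * var) does not depend on m
ln_pi = -0.5 * np.sum((l - m) ** 2, axis=1) / var
```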
-
Hi, I notice that in your code `mean_kl` is always 0:
```
constraint_grad = flat_grad(constraint_loss, self.policy.parameters(), retain_graph=True)  # (b)
mean_kl = mean_kl_first_fixed(a…
```
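For reference, a minimal sketch (assumed Gaussian policy and toy shapes, not the repository's code) of what a "first argument fixed" mean KL looks like: at the current parameters it evaluates to exactly 0, since it compares the policy with a detached copy of itself, but its graph still yields the nonzero Hessian-vector products TRPO needs:
```
import torch

mu = torch.zeros(8, 2, requires_grad=True)       # policy means for a batch of 8 states
log_std = torch.zeros(2, requires_grad=True)

p = torch.distributions.Normal(mu, log_std.exp())
p_fixed = torch.distributions.Normal(mu.detach(), log_std.exp().detach())

mean_kl = torch.distributions.kl_divergence(p_fixed, p).sum(-1).mean()
print(mean_kl.item())                            # 0.0 at the current parameters

# second-order information is not zero: a Hessian-vector product through mean_kl
grads = torch.autograd.grad(mean_kl, [mu, log_std], create_graph=True)
v = [torch.randn_like(mu), torch.randn_like(log_std)]
gv = sum((g * vi).sum() for g, vi in zip(grads, v))
hvp = torch.autograd.grad(gv, [mu, log_std])     # nonzero
```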
-
Adding N-distill according to https://arxiv.org/abs/1902.02186
- [x] Add next observation to trajectory data structure (a sketch follows this list)
- [x] Directly compute gradient using the given update rule (This is t…
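A minimal sketch of the first checked item, assuming a simple namedtuple-based record; the field names are illustrative, not taken from the repository:
```
from collections import namedtuple

# transition record extended with the next observation
Transition = namedtuple(
    "Transition",
    ["observation", "action", "reward", "next_observation", "done"],
)

# toy usage with placeholder values
step = Transition(observation=[0.0], action=1, reward=0.5,
                  next_observation=[0.1], done=False)
```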