policy-gradient Search Results

facebookresearch/Pearl #106

Phasic Policy Gradient

@rodrigodesalvobraz I would like to know whether Phasic Policy Gradient is implemented https://arxiv.org/abs/2009.04416. If it's not then I would like to try implementing it and add it to Pearl.

cvnad1 updated 2 weeks ago

philtabor/Youtube-Code-Repository #65

Policy Gradient, SAC doesn't learn

Hi! I have a few more questions about the code that I don't quite get. First, I was wondering what pybullet_envs is for. I installed the library but got errors when i tried to import it. I also do…

Ling01234 updated 6 days ago

LXXXXR/ICES #1

Is it applicable to the policy gradient based approach?

First of all, thank you for your team's contribution on ICES. I would like to ask if it is applicable for methods based on policy gradients? For example, MAPPO, lies in the fact that only actions o…

zhx0506 updated 1 month ago

huggingface/trl #2341

[Question] Why is Importance Sampling and Clipping applied i…

Hi, I read the [RLOO paper from Cohere ](https://cohere.com/research/papers/back-to-basics-revisiting-reinforce-style-optimization-for-learning-from-human-feedback-in-llms-2024-02-23), which claim…

shashankg7 updated 1 week ago

chainer/chainerrl #221

Expected Policy Gradients

http://arxiv.org/abs/1706.05374 It is equivalent to implement a new Explorer that adds a Gaussian noise whose covariance is ρ_0 exp(cH(s)), where H(s) is a Hessian of Q(s,a) wrt a and ρ_0 and c are…

muupan updated 5 years ago

CityBrainChallenge/KDDCup2021-CityBrainChallenge-starter-kit #42

Policy gradient method

DQN was provided as an example. Why was the policy gradient method not provided? Does it work on these problems?

john9636 updated 3 years ago

shufangxun/LLaVA-MoD #5

CUDA OOM issues

Hello, I've been trying to qwen2 0.5B and tinyclip using the repository, but I'm running into CUDA OOM issues on the dense2dense distillation step. Im running on 4 80GB A100s, I was wondering if I …

pumetu updated 1 week ago

davidhershey/feudal_networks #10

Transition Policy Gradients

From the paper： in fact the properform for the transition policy gradient arrived at in eqn.10. manager_loss = -tf.reduce_sum((self.r-cutoff_vf_manager)*dcos) ( from code ) why not implement th…

gyh75520 updated 5 years ago

google-deepmind/rlax #109

Stop action gradient in policy gradient loss

The current implementation of `policy_gradient_loss` is: ```python log_pi_a_t = distributions.softmax().logprob(a_t, logits_t) adv_t = jax.lax.select(use_stop_gradient, jax.lax.stop_gradient(adv_t)…

danijar updated 1 year ago

huggingface/deep-rl-class #285

(Updated for clarity) Apologies if I'm wrong, but it seems to me that there are some mathematical issues in [unit 4 "diving deeper..."](https://github.com/huggingface/deep-rl-class/blob/main/units/en…

dylwil3 updated 1 year ago

1000+ results
for policy-gradient