-
@rodrigodesalvobraz I would like to know whether Phasic Policy Gradient (https://arxiv.org/abs/2009.04416) is implemented. If it's not, I would like to try implementing it and adding it to Pearl.
-
Hi! I have a few more questions about the code that I don't quite get.
First, I was wondering what pybullet_envs is for. I installed the library but got errors when I tried to import it. I also do…
-
First of all, thank you for your team's work on ICES. I would like to ask whether it is applicable to methods based on policy gradients?
For example, with MAPPO, the difficulty lies in the fact that only actions o…
-
Hi,
I read the [RLOO paper from Cohere](https://cohere.com/research/papers/back-to-basics-revisiting-reinforce-style-optimization-for-learning-from-human-feedback-in-llms-2024-02-23), which claims…
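The paper's exact claim is cut off above, but for context, my understanding of the REINFORCE Leave-One-Out (RLOO) estimator is that each sampled completion's reward is baselined against the mean reward of the other samples for the same prompt. A rough sketch with made-up values:
```python
import numpy as np

# Hypothetical rewards for k completions sampled for the same prompt.
rewards = np.array([0.2, 0.9, 0.5, 0.4])
k = len(rewards)

# Leave-one-out baseline: for sample i, the mean reward of the other k - 1 samples.
baselines = (rewards.sum() - rewards) / (k - 1)

# REINFORCE-style advantages that weight each sample's log-probability gradient.
advantages = rewards - baselines
print(advantages)  # ≈ [-0.4, 0.533, 0.0, -0.133]
```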
-
http://arxiv.org/abs/1706.05374
It would be equivalent to implementing a new Explorer that adds Gaussian noise whose covariance is ρ_0 exp(c H(s)), where H(s) is the Hessian of Q(s, a) with respect to a, and ρ_0 and c are…
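A minimal sketch of what such an Explorer could look like, assuming a hypothetical interface where the agent can provide the Hessian of Q(s, a) with respect to a at the chosen action (the class and method names here are made up for illustration):
```python
import numpy as np
from scipy.linalg import expm  # matrix exponential, for exp(c * H(s))


class HessianCovarianceGaussianExplorer:
    """Adds Gaussian noise with covariance rho_0 * exp(c * H(s)) to the exploit action."""

    def __init__(self, rho_0: float, c: float):
        self.rho_0 = rho_0
        self.c = c

    def act(self, exploit_action: np.ndarray, hessian_q: np.ndarray) -> np.ndarray:
        # hessian_q: Hessian of Q(s, a) w.r.t. a, evaluated at exploit_action (symmetric matrix).
        covariance = self.rho_0 * expm(self.c * hessian_q)
        noise = np.random.multivariate_normal(np.zeros(exploit_action.shape[0]), covariance)
        return exploit_action + noise
```
One convenient property is that the matrix exponential of a symmetric Hessian is always symmetric positive definite, so the covariance stays valid even where Q is not concave in a.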
-
DQN was provided as an example. Why was a policy gradient method not provided? Does it work on these problems?
-
Hello,
I've been trying out qwen2 0.5B and tinyclip using the repository, but I'm running into CUDA OOM issues on the dense2dense distillation step. I'm running on 4× 80GB A100s, and I was wondering if I …
-
From the paper:
> in fact the proper form for the transition policy gradient arrived at in Eqn. 10.

And from the code:
`manager_loss = -tf.reduce_sum((self.r - cutoff_vf_manager) * dcos)`

Why not implement th…
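For reference, my recollection of Eqn. 10 (the transition policy gradient) from the FeUdal Networks paper is below, with $A^M_t$ the manager's advantage and $d_{\cos}$ cosine similarity; please correct me if I'm misremembering it:
```math
\nabla g_t = A^M_t \,\nabla_\theta\, d_{\cos}\big(s_{t+c} - s_t,\; g_t(\theta)\big),
\qquad
A^M_t = R_t - V^M_\theta(x_t)
```
On that reading, the quoted code appears to use `(self.r - cutoff_vf_manager)` as the advantage $A^M_t$ and `dcos` as the cosine term.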
-
The current implementation of `policy_gradient_loss` is:
```python
log_pi_a_t = distributions.softmax().logprob(a_t, logits_t)
adv_t = jax.lax.select(use_stop_gradient, jax.lax.stop_gradient(adv_t)…
```
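The snippet above is cut off, but as I read it the loss is a REINFORCE-style surrogate in which the advantage is (optionally) treated as a constant via `stop_gradient`. A self-contained sketch with the same structure (not the library's exact code), for anyone following along:
```python
import jax
import jax.numpy as jnp

def policy_gradient_loss_sketch(logits_t, a_t, adv_t, w_t, use_stop_gradient=True):
    # Log-probability of the taken action under a softmax policy, per timestep.
    log_pi_t = jax.nn.log_softmax(logits_t)                                   # [T, A]
    log_pi_a_t = jnp.take_along_axis(log_pi_t, a_t[:, None], axis=-1)[:, 0]   # [T]
    # Optionally block gradients through the advantage estimate.
    adv_t = jax.lax.select(use_stop_gradient, jax.lax.stop_gradient(adv_t), adv_t)
    # Negative weighted policy-gradient surrogate, averaged over time.
    return -jnp.mean(log_pi_a_t * adv_t * w_t)
```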
-
(Updated for clarity)
Apologies if I'm wrong, but it seems to me that there are some mathematical issues in [unit 4 "diving deeper..."](https://github.com/huggingface/deep-rl-class/blob/main/units/en…