-
# Implementing Proximal Policy Optimisation
I've used some of the [PyTorch RFC](https://github.com/pytorch/rfcs/blob/master/README.md) template here for clarity.
**Authors:**
* @salmanmohammadi…
-
**Machine: MAX1100**
**ipex-llm: 2.1.0b20240421**
**bigdl-core-xe-21: 2.5.0b20240421**
**bigdl-core-xe-esimd-21: 2.5.0b20240421**
[Related PR](https://github.com/intel-analytics/ipex-llm…
-
Hello Patrick,
I am implementing the PPO algorithm for a custom environment, and I first wanted to test things out with a standard example, so I chose CartPole-v1 implemented with [`gym…
-
Hi! I'm a bit puzzled as to how a timeout could be handled correctly in your implementation of PPO (this is relevant for all variants, really). I am especially surprised by envpool, because it seems like t…
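The usual distinction behind this question: at a true terminal state the future value is zero, but at a time limit the episode was cut off artificially, so the return should bootstrap from the critic's value estimate. A minimal sketch, with illustrative names not drawn from any particular codebase:

```python
# Sketch: discounted returns where a time-limit truncation bootstraps
# from the critic's estimate, while a true termination does not.
def discounted_returns(rewards, terminateds, truncateds, next_values, gamma=0.99):
    """rewards[t]: reward at step t; terminateds[t]: true episode end;
    truncateds[t]: time limit hit at t; next_values[t]: critic's V(s_{t+1})."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if terminateds[t]:
            running = rewards[t]                           # no bootstrap: true end
        elif truncateds[t]:
            running = rewards[t] + gamma * next_values[t]  # bootstrap on timeout
        else:
            running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Two-step rollout cut off by a time limit: V(s_2)=10.0 leaks back into the
# return instead of being treated as a terminal zero.
rets = discounted_returns(
    rewards=[1.0, 1.0],
    terminateds=[False, False],
    truncateds=[False, True],
    next_values=[0.0, 10.0],
)
```

Here `rets[1] = 1 + 0.99 * 10 = 10.9`; treating the timeout as a true terminal would instead give `1.0` and systematically bias the value targets downward.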
-
Hi, this is a great project, thank you for sharing it. I transferred the code to ROS 2 Humble and it is working. Now I have changed the algorithm to PPO, but it is not working. Can you give me some tips and tricks to implem…
-
First, thank you for your efforts in helping to bring accurate and performant RLHF techniques to the open-source community.
I'm raising this issue hoping to get some clarification on a couple of implem…
-
[Proximal Policy Optimization Algorithms](https://arxiv.org/abs/1707.06347)
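For quick reference while reading the excerpts above, the paper's clipped surrogate objective is:

$$
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
$$

where $\hat{A}_t$ is an advantage estimate and $\epsilon$ is the clipping range (0.2 in the paper's main experiments).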
-
Here is a Robot Script for exporting the PPO_reasoned_merged.owl file:
```
./robot filter --input ppo.owl --term PPO:0002300 --select "self annotations descendants" --signature true export --he…
-
I use the PPOTrainer on Mixtral with 8 GPUs (CUDA 12.4). Would you happen to have any idea how to solve the following issue? (I have also updated all Python packages.)
Here is the e…
-
Hi,
The current PPO implementation does not seem to account for time limits. While the `EpisodeWrapper` from brax is used, which tracks a truncation flag ([source](https://github.com/google/brax/bl…
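One common way to use such a truncation flag is inside the GAE recursion: at a time limit, bootstrap from the critic's value of the next state and reset the trace, rather than zeroing the next value as for a true termination. The sketch below is illustrative only (the function and argument names are hypothetical, not brax's API):

```python
# Illustrative GAE-lambda that distinguishes true termination from
# time-limit truncation. Names are hypothetical, not from brax.
def gae_advantages(rewards, values, next_values, terminateds, truncateds,
                   gamma=0.99, lam=0.95):
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        if terminateds[t]:
            # True termination: V(s_{t+1}) = 0, and the trace resets.
            gae = rewards[t] - values[t]
        elif truncateds[t]:
            # Time limit: bootstrap from V(s_{t+1}), then reset the trace.
            gae = rewards[t] + gamma * next_values[t] - values[t]
        else:
            delta = rewards[t] + gamma * next_values[t] - values[t]
            gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

On a one-step rollout with reward 1.0, V(s_t)=0.5 and V(s_{t+1})=2.0, the truncated case yields an advantage of 1 + 0.99*2 - 0.5 = 2.48, while the terminated case yields 0.5; conflating the two flags silently applies the latter to timeouts.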