-
OOM error when running PPO on Qwen2 72B.
Environment: 4 nodes, 32 GPUs, 80 GB of VRAM per GPU, and 2 TB of RAM. In theory this should not OOM.
Command used:
```
ray job submit --address="http://127.0.0.1:8265" \
--runtime-env-json='{"working_dir": "mycode/OpenRLHF-new","excludes"…
```
-
### 🐛 Bug
I get a device mismatch when attempting to use PPO with a multi-input observation dict.
This happens when calling:
```
with torch.no_grad():
    actions = myppo.policy._predict(inp_dict, de…
```
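One common cause of such a mismatch is that some tensors inside the observation dict stay on the CPU while the policy's parameters live on the GPU. A minimal sketch of moving every entry to one device before predicting (the dict keys, shapes, and the `obs_dict_to_device` helper are illustrative, not taken from the original report):

```python
import torch

def obs_dict_to_device(obs: dict, device: torch.device) -> dict:
    """Move every tensor in a multi-input observation dict to the given device."""
    return {k: v.to(device) for k, v in obs.items()}

# Pick whichever device the policy actually lives on.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

inp_dict = {"image": torch.zeros(1, 3, 8, 8), "vector": torch.zeros(1, 4)}
inp_dict = obs_dict_to_device(inp_dict, device)
# then, inside torch.no_grad():
#     actions = myppo.policy._predict(inp_dict, deterministic=True)
```

In Stable-Baselines3, the policy's device is typically exposed as `myppo.policy.device`, so that can be passed instead of re-deriving it.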
-
Hello, in your code the time horizon for H-PPO is 10 s to 40 s. When comparing against PPO-discrete, I found that with a duration of 10 s, PPO-discrete outperforms H-PPO, while at 15 s, H-PPO performs better. Do your results show the same? For comparison experiments on hybrid action spaces, should the discrete PPO baseline use a different duration as its control group? I hope you can resolve my confusion. Many thanks!
-
Hello, when I run the Formation task with `algo=mappo`, I get:
![mappo error](https://github.com/btx0424/OmniDrones/assets/55371740/1d49b582-6bdf-4fe6-91ae-3171c23397b6)
When I use `algo=ppo`, I got:
…
-
- [x] I have marked all applicable categories:
+ [x] exception-raising bug
+ [x] RL algorithm bug
+ [ ] documentation request (i.e. "X is missing from the documentation.")
+ [ ] ne…
-
While testing the performance of the PPO controller on the cartpole task, I encountered an issue where the training does not seem to converge, despite using the provided parameters (some changes are a…
-
When I ran the neural network training code on Windows, the results for both the training and validation sets were successfully uploaded to the Wandb platform, but the test set could n…
-
### Describe the bug
Hello! I am trying to get Mava working to test out the library. Following the `README`, I created a 3.9.10 virtualenv and installed `jax[cuda12_local]` (I have existing CUDA 12.3…
-
We should start working on a new DRL algorithm based on the MA-PPO algorithm: it promises significant speed improvements and would resolve the critique of the centralized-critic approach.
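For context, the centralized-critic idea behind MAPPO-style methods can be sketched as a value network that conditions on the joint observation of all agents (the class name, dimensions, and hyperparameters below are illustrative, not part of this proposal):

```python
import torch
import torch.nn as nn

class CentralizedCritic(nn.Module):
    """Value network that sees the joint observation of all agents (MAPPO-style)."""
    def __init__(self, n_agents: int, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_agents * obs_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),  # a single joint state-value estimate
        )

    def forward(self, joint_obs: torch.Tensor) -> torch.Tensor:
        # joint_obs: (batch, n_agents * obs_dim), i.e. all agents' observations concatenated
        return self.net(joint_obs)

critic = CentralizedCritic(n_agents=3, obs_dim=8)
values = critic(torch.zeros(5, 3 * 8))  # batch of 5 joint observations
```

During training, each agent's PPO update can use this shared value estimate for advantage computation; at execution time only the decentralized actors are needed.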
-
Hello, can you take a look at the following error? Thanks.
Run code: [experiment.py](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/rl/ppo/experiment…