-
The reward is being fed directly into the layer that outputs actions, so the update is not performed correctly.
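For context, a minimal PyTorch-style sketch (illustrative only, not this project's code) of the usual policy-gradient pattern, where the reward scales the loss rather than being fed into the action-output layer:

```python
import torch

# REINFORCE-style update sketch: the observation is the only network input;
# the return multiplies the log-probability inside the loss.
policy = torch.nn.Sequential(
    torch.nn.Linear(4, 32), torch.nn.Tanh(), torch.nn.Linear(32, 2)
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

obs = torch.randn(1, 4)                        # observation only, no reward appended
dist = torch.distributions.Categorical(logits=policy(obs))
action = dist.sample()

ret = torch.tensor(1.0)                        # placeholder discounted return
loss = -(dist.log_prob(action) * ret).mean()   # the reward enters here, in the loss

optimizer.zero_grad()
loss.backward()
optimizer.step()
```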
-
```
Traceback (most recent call last):
  File "./Runners/EvalGFPPO.py", line 82, in
    runner.run(num_learning_iterations=iterations, log_interval=cfg_train["learn"]["save_interval"])
  File "./Algor…
```
-
Hello, I found the following code in `sheeprl/algos/dreamer_v3.py`:
```python
# Train the agent
if update >= learning_starts and updates_before_training
```
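For readers unfamiliar with that gate, here is a minimal sketch of the pattern as I understand it (hypothetical variable values, not sheeprl's actual defaults): training is skipped until `learning_starts` steps have been collected, and then only runs when the `updates_before_training` countdown reaches zero.

```python
# Illustrative gating loop (hypothetical values, not sheeprl's real config):
learning_starts = 1024      # environment steps collected before any training
train_every = 16            # train once every `train_every` policy steps
updates_before_training = train_every

for update in range(1, 5000):
    # ... collect one environment step into the replay buffer here ...
    updates_before_training -= 1
    if update >= learning_starts and updates_before_training <= 0:
        # ... run world-model / actor / critic optimization here ...
        updates_before_training = train_every
```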
-
Just letting you know, they keep walking in circles because you removed the circle thingy.
-
**Describe the bug**
Hi, everybody, I'm training a llama model in step3 using deepspeed-chat. In version 0.10.1, it raised the following error ([see logs below](https://github.com/microsoft/DeepSp…
-
### Issue
When attempting to run `ppo.py` to train the RL model using `cube_env.py` or the **Bimanual_Allegro_Cube** env, I get an _empty array error_ during Epoch 1 of the iteration loop in `ppo.…
-
Thank you for your great work! I ran the code, but the GPU does not seem to be used. Are there any parameters that need to be set? How can I train on the GPU?
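A minimal sketch of what is usually needed in a PyTorch training script (illustrative; the project's own device flag, if any, may differ):

```python
import torch

# Select the GPU when available and move both the model and the data onto it.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(10, 2).to(device)

batch = torch.randn(8, 10, device=device)   # inputs must be on the same device
out = model(batch)
print(next(model.parameters()).device)       # sanity check: expect "cuda:0"
```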
-
The scheduling flaw in OpenRLHF is that up to half of the GPUs can sit idle.
This is because we keep all of the models on the GPUs at the same time, which is a consequence of not yet having had time to implement fully asynchronous training.
So even if scheduling were optimized to the limit and the GPUs fully saturated, ignoring lower-level technical optimizations and looking at scheduling alone, the best we could do is roughly double the performance.
We provide a tuning guide: https://github.com/OpenLLMAI/OpenRLHF?tab=readme-ov-file…
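A back-of-envelope check of the "at most roughly double" claim (illustrative numbers, not measured OpenRLHF figures):

```python
# If up to half of the GPUs are idle, current utilization is at least 50%,
# so perfect scheduling can raise throughput by at most a factor of two.
idle_fraction = 0.5
current_utilization = 1.0 - idle_fraction
max_speedup_from_scheduling = 1.0 / current_utilization
print(max_speedup_from_scheduling)   # -> 2.0
```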
-
### What is the desired addition or change?
Simplify the creation of reinforcement learning agents in mlpack by having default values for common parameters, including network architectures and learni…
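As an illustration of the request (a Python sketch of the pattern only; mlpack itself is C++ and the names below are hypothetical, not its API): agents take a config object whose fields all have sensible defaults, so callers override only what they need.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class AgentConfig:
    hidden_sizes: Tuple[int, ...] = (64, 64)  # default network architecture
    learning_rate: float = 3e-4               # default optimizer step size
    discount: float = 0.99                    # default reward discount

def make_agent(env_name: str, config: Optional[AgentConfig] = None) -> dict:
    """Build an agent from defaults; callers override only what they need."""
    config = config or AgentConfig()
    return {"env": env_name, "config": config}

agent = make_agent("CartPole-v1")                                   # all defaults
tuned = make_agent("CartPole-v1", AgentConfig(learning_rate=1e-3))  # one override
```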