-
Hi author, where can I view the paper corresponding to this code?
-
"I downloaded a model (Multi drone without obstacles) from the following URL for testing: https://huggingface.co/andrewzhang505/quad-swarm-rl-multi-drone-no-obstacles/tree/main.
When I executed th…
-
root@I196082a51d0070168c:/hy-tmp/DI-engine/dizoo/smac/config# python3 -u smac_5m6m_masac_config.py
[04-15 20:32:26] WARNING If you want to use numba to speed up segment tree, please install numba f…
-
buildKB successfully for /teamspace/studios/this_studio/KAG/kag/examples/hotpotqa/builder/./data/hotpotqa_sub_corpus.json
parallelQaAndEvaluate completing: 0%| | 0/2 [00:00
-
My configuration:
`ray job submit --address="http://127.0.0.1:8265" \
--runtime-env-json='{"working_dir": "/openrlhf", "pip": "/openrlhf/requirements.txt"}' \
-- python3 examples/train_ppo_…
-
### 🐛 Bug
Hi,
When I try to run TQC hyperparameter optimization with multiple jobs (n-jobs>1) with a GPU (this also happens with multiple CPU cores and n-jobs=1), it gives me this error:
```
…
-
### Bug description
This issue is apparent when attempting to run the following tutorial: https://lightning.ai/pages/community/tutorial/how-to-train-reinforcement-learning-model-to-play-game-using-…
-
When training a 13B model with PPO, memory usage is extremely high. How should I resolve this?
-
### ❓ Question
I'm using a `custom gym env` with multiple envs, and I want to write a customized callback function related to `StopTrainingOnRewardThreshold`. One difference is that **I shall use `"rollout/e…
-
Why do you only train Auto-UI? Auto-UI seems to me to be a traditional RL policy model, not an LLM/VLM agent.