OpenLLMAI OpenRLHF issues

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)

https://openrlhf.readthedocs.io/

Apache License 2.0

1.72k stars 161 forks

issues

AssertionError: Check batch related parameters. train_batch_size is not equal to micro_batch_per_gpu * gradient_acc_step * world_size 256 != 2 * 18 * 7

#346 hehebamei opened 20 hours ago
6
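The assertion in #346 fails because 256 is not divisible by 2 * 7 = 14. Truncating 256 / 14 gives 18 accumulation steps, and 2 * 18 * 7 = 252. The sketch below reproduces that consistency check. It assumes the step count comes from integer division, and `gradient_accumulation_steps` is a hypothetical helper, not an OpenRLHF or DeepSpeed API.

```python
def gradient_accumulation_steps(train_batch_size: int, micro_batch_per_gpu: int, world_size: int) -> int:
    """Hypothetical helper: derive accumulation steps, then verify the batch identity."""
    # Samples processed in one micro-step across all GPUs.
    samples_per_step = micro_batch_per_gpu * world_size
    # Integer division truncates, e.g. 256 // 14 == 18 (assumed derivation).
    steps = train_batch_size // samples_per_step
    # The identity the assertion checks: global batch == micro batch * steps * GPUs.
    if micro_batch_per_gpu * steps * world_size != train_batch_size:
        raise ValueError(f"{train_batch_size} != {micro_batch_per_gpu} * {steps} * {world_size}")
    return steps


# 8 GPUs: 256 == 2 * 16 * 8, so the check passes.
print(gradient_accumulation_steps(256, 2, 8))  # 16

# 7 GPUs: 256 // 14 == 18 and 2 * 18 * 7 == 252, so the check fails.
try:
    gradient_accumulation_steps(256, 2, 7)
except ValueError as err:
    print(err)  # 256 != 2 * 18 * 7
```

With 7 GPUs, choosing a train_batch_size that is a multiple of 14 (for example 252 or 280) satisfies the identity.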
reward is always 0 when training DPO

#345 UbeCc closed 20 hours ago
1
Feature: Define a set of default data formats for OpenRLHF to reduce the cost of using custom data for everyone.

#344 catqaq closed 2 days ago
1
Training a Qwen-32B RM with adam_offload & zero3 leads to Runtime Error

#343 victorShawFan opened 2 days ago
2
An error occurs when I try to build a Docker container.

#342 hehebamei closed 2 days ago
3
Support remote RM and reference model APIs for PPO

#341 catqaq opened 4 days ago
8
[pre-commit.ci] pre-commit suggestions

#340 pre-commit-ci[bot] closed 4 days ago
0
Status message: Unexpected error occurred: The actor 2c5251641e72297b4e3f4d7f01000000 is unavailable

#339 lusongshuo-mt closed 1 day ago
2
An error occurred during supervised fine-tuning.

#338 hehebamei opened 4 days ago
2
Multi-node training: Slurm vs. Slurm + Ray

#337 yannikkellerde closed 5 days ago
1
vLLM related: model's max seq len (8192) is larger than the maximum number of tokens that can be stored in KV cache (6048).

#336 mickelliu closed 1 week ago
2
Support LoRA + vLLM, especially for ZeRO-3.

#335 luo-li-ba-suo closed 1 day ago
4
train_rm apply custom tokenizer chat template

#334 mickelliu closed 1 week ago
0
Qwen2 PPO

#333 Yusifu closed 2 days ago
1
How much memory (RAM) is required to train a 70B Llama2 model with two 80G A800 nodes?

#332 luo-li-ba-suo opened 1 week ago
7
PPO hangs at bundle_reservation_check_func after loading the models

#331 lixsh6 opened 1 week ago
1
Easy-to-miss bug that results in min_new_tokens not working

#330 yannikkellerde closed 1 week ago
0
Qwen2 72B PPO OOM

#329 lixsh6 opened 2 weeks ago
5
Update requirements.txt

#328 Atry closed 1 week ago
3
Could you give an example of measuring DeepSpeed-Chat training time?

#327 youngyoung321 closed 2 weeks ago
7
KTO training loss is NaN on a Qwen2 model after SFT

#326 vincezengqiang opened 2 weeks ago
2
[rank3]: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:3 and cpu!

#325 xiechengmude closed 2 weeks ago
3
Generate function for distributed training

#324 louieworth opened 3 weeks ago
2
model.generate does not work with multi-GPU parallelism

#323 louieworth closed 3 weeks ago
2
/openrlhf must be an existing directory or a zip package

#322 harvinyou closed 4 weeks ago
1
How do I specify the number of GPUs when launching training?

#321 harvinyou closed 4 weeks ago
1
[Question] Is multi-nodes stage 3 model loading supported?

#320 mickelliu closed 4 weeks ago
2
Could you provide the best training and inference parameters for Mixtral 8x7B?

#319 harvinyou closed 4 weeks ago
1
train_ppo_llama_ray.sh: error when running on two H800 machines

#318 yangzhipeng1108 closed 4 weeks ago
3
In Ray multi-node training, is DeepSpeed ZeRO-3 still sharded across (number of nodes * 8) GPUs?

#317 lma-c4d closed 4 weeks ago
1
train_ppo_llama_ray_70b.sh: error when running on two H800 machines

#316 yangzhipeng1108 closed 4 weeks ago
1
Moving model between GPU and CPU

#315 kfertakis closed 4 weeks ago
3
Error when running train_ppo_llama_ray.sh

#314 yangzhipeng1108 closed 1 month ago
0
Failed to update weights to vLLM

#313 thirteenflt closed 1 month ago
3
ZeRO-3 training error

#312 karthik-nexusflow closed 4 weeks ago
1
Could you add support for SimPO?

#311 victorShawFan opened 1 month ago
2
wrong action_log_probs returned?

#310 thirteenflt closed 1 month ago
1
Has this codebase considered using "torch.compile"?

#309 eyuansu62 closed 1 month ago
2
Dummy token for prompts in HH datasets

#308 louieworth opened 1 month ago
2
Will 2x GPU setups be supported?

#307 llmlocal opened 1 month ago
1
Training DPO with Deepseek-lite fails with: expected mat1 and mat2 to have the same type, but got: float != c10::BFloat16

#306 victorShawFan opened 1 month ago
3
Critic model is killed unexpectedly

#305 Ricardokevins opened 1 month ago
5
Suggestion on the configurations

#304 Ricardokevins opened 1 month ago
1
Incompatibility with Qwen

#303 Ricardokevins closed 1 month ago
2
Support Llama-3 models

#302 wenlinyao closed 1 month ago
1
action_log_probs is computed redundantly

#301 cdm114514 closed 1 month ago
2
[Question] EOS in reward model dataset

#300 qwenzo opened 1 month ago
3
Claim your paper on HF

#299 adeenayakup closed 1 month ago
1
Added GPU memory specs and clarifications, fixed typo.

#298 KT313 closed 1 month ago
2
Avoid monkey patching vLLM

#297 Atry opened 1 month ago
1