OpenLLMAI/OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
https://openrlhf.readthedocs.io/
Apache License 2.0
1.72k stars, 161 forks
issues
AssertionError: Check batch related parameters. train_batch_size is not equal to micro_batch_per_gpu * gradient_acc_step * world_size 256 != 2 * 18 * 7
#346
hehebamei
opened
20 hours ago
6
reward is always 0 when training DPO
#345
UbeCc
closed
20 hours ago
1
Feature: Define a set of default data formats for OpenRLHF to reduce the cost of using custom data for everyone.
#344
catqaq
closed
2 days ago
1
Qwen-32B: training RM with adam_offload & zero3 leads to RuntimeError
#343
victorShawFan
opened
2 days ago
2
An error occurs when I'm trying to build a Docker container.
#342
hehebamei
closed
2 days ago
3
support remote rm and ref model api for ppo
#341
catqaq
opened
4 days ago
8
[pre-commit.ci] pre-commit suggestions
#340
pre-commit-ci[bot]
closed
4 days ago
0
Status message: Unexpected error occurred: The actor 2c5251641e72297b4e3f4d7f01000000 is unavailable
#339
lusongshuo-mt
closed
1 day ago
2
An error occurred during supervised fine-tuning.
#338
hehebamei
opened
4 days ago
2
Multi-node training. Slurm vs Slurm + Ray
#337
yannikkellerde
closed
5 days ago
1
vLLM related: model's max seq len (8192) is larger than the maximum number of tokens that can be stored in KV cache (6048).
#336
mickelliu
closed
1 week ago
2
Support LoRA + vLLM, especially for ZeRO-3.
#335
luo-li-ba-suo
closed
1 day ago
4
train_rm apply custom tokenizer chat template
#334
mickelliu
closed
1 week ago
0
Qwen2 ppo
#333
Yusifu
closed
2 days ago
1
How much memory (RAM) is required to train a 70B Llama2 model with two 80G A800 nodes?
#332
luo-li-ba-suo
opened
1 week ago
7
PPO hangs at bundle_reservation_check_func after the models finish loading
#331
lixsh6
opened
1 week ago
1
Easy to miss bug that results in min_new_tokens not working
#330
yannikkellerde
closed
1 week ago
0
Qwen2 72B PPO OOM
#329
lixsh6
opened
2 weeks ago
5
Update requirements.txt
#328
Atry
closed
1 week ago
3
Could you give an example of testing deepspeed-chat time?
#327
youngyoung321
closed
2 weeks ago
7
Loss is NaN when training a Qwen2 SFT model with KTO
#326
vincezengqiang
opened
2 weeks ago
2
[rank3]: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:3 and cpu!
#325
xiechengmude
closed
2 weeks ago
3
Generate function for distributional training
#324
louieworth
opened
3 weeks ago
2
model.generate does not work with multi-GPU parallelism
#323
louieworth
closed
3 weeks ago
2
/openrlhf must be an existing directory or a zip package
#322
harvinyou
closed
4 weeks ago
1
How do I specify the number of GPUs when launching training?
#321
harvinyou
closed
4 weeks ago
1
[Question] Is multi-nodes stage 3 model loading supported?
#320
mickelliu
closed
4 weeks ago
2
Could you provide the best training and inference parameters for Mixtral 8x7B?
#319
harvinyou
closed
4 weeks ago
1
Error running train_ppo_llama_ray.sh on two H800 machines
#318
yangzhipeng1108
closed
4 weeks ago
3
In Ray multi-node training, is DeepSpeed ZeRO-3 still sharded across (number of nodes * 8) GPUs?
#317
lma-c4d
closed
4 weeks ago
1
Error running train_ppo_llama_ray_70b.sh on two H800 machines
#316
yangzhipeng1108
closed
4 weeks ago
1
Moving model between GPU and CPU
#315
kfertakis
closed
4 weeks ago
3
Error running train_ppo_llama_ray.sh
#314
yangzhipeng1108
closed
1 month ago
0
Failed to update weights to vLLM
#313
thirteenflt
closed
1 month ago
3
zero3 training error
#312
karthik-nexusflow
closed
4 weeks ago
1
Could support for SimPO be added?
#311
victorShawFan
opened
1 month ago
2
wrong action_log_probs returned?
#310
thirteenflt
closed
1 month ago
1
Does this codebase consider using "torch.compile"?
#309
eyuansu62
closed
1 month ago
2
Dummy token for prompts in HH datasets
#308
louieworth
opened
1 month ago
2
Will 2 x GPU setups be supported?
#307
llmlocal
opened
1 month ago
1
Training DPO with Deepseek-lite reports: expected mat1 and mat2 to have the same type, but got: float != c10::BFloat16
#306
victorShawFan
opened
1 month ago
3
Strange Kill of Critic Model
#305
Ricardokevins
opened
1 month ago
5
Suggestion on the configurations
#304
Ricardokevins
opened
1 month ago
1
Incompatibility with Qwen
#303
Ricardokevins
closed
1 month ago
2
Support Llama-3 models
#302
wenlinyao
closed
1 month ago
1
action_log_probs is computed redundantly
#301
cdm114514
closed
1 month ago
2
[Question] EOS in reward model dataset
#300
qwenzo
opened
1 month ago
3
Claim your paper on HF
#299
adeenayakup
closed
1 month ago
1
Added GPU memory specs and clarifications, fixed typo.
#298
KT313
closed
1 month ago
2
Avoid monkey patching vLLM
#297
Atry
opened
1 month ago
1
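The assertion quoted in the title of #346 states an identity that can be checked by hand: the global train batch size must equal the per-GPU micro batch size times the gradient accumulation steps times the world size. The following is a minimal sketch of that arithmetic only. The function name `check_batch_params` is hypothetical and is not taken from OpenRLHF or DeepSpeed, and the second call uses an assumed 8-GPU setup purely for illustration.

```python
def check_batch_params(train_batch_size, micro_batch_per_gpu,
                       gradient_acc_step, world_size):
    """Return (is_consistent, expected_train_batch_size).

    Hypothetical helper mirroring the identity in the assertion message:
    train_batch_size == micro_batch_per_gpu * gradient_acc_step * world_size
    """
    expected = micro_batch_per_gpu * gradient_acc_step * world_size
    return train_batch_size == expected, expected


# Values from the issue title: 256 != 2 * 18 * 7
ok, expected = check_batch_params(256, 2, 18, 7)
print(ok, expected)

# Assumed alternative for illustration: 8 GPUs with 16 accumulation steps
ok_8gpu, expected_8gpu = check_batch_params(256, 2, 16, 8)
print(ok_8gpu, expected_8gpu)
```

With 7 GPUs and a micro batch of 2, no integer number of accumulation steps yields 256, since 256 is not divisible by 14; the identity can only hold if the batch size or the GPU count changes.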