OpenLLMAI/OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
https://openrlhf.readthedocs.io/
Apache License 2.0
1.72k stars
161 forks
issues
enable_ema causes runtime error when running train_ppo_llama.sh
#245
dshnightmare
opened
3 months ago
6
Update requirements.txt
#244
kfertakis
closed
3 months ago
0
Issues with building OpenRLHF locally
#243
kfertakis
closed
2 months ago
3
The tokenizer of reward model and policy model.
#242
eyuansu62
opened
3 months ago
2
Fix yi-34b tokenizer, use_fast=False
#241
hijkzzz
closed
3 months ago
0
When DPO Yi-34B Assertion `srcIndex < srcSelectDimSize` failed
#240
victorShawFan
closed
3 months ago
7
Why is generation with flash-attn slower?
#239
dshnightmare
opened
3 months ago
2
Forced EOS token in vllm generation?
#238
mgerstgrasser
opened
3 months ago
6
Fix #235 mask prompt logits in DPO
#237
hijkzzz
closed
4 months ago
0
adding length penalty to reward
#236
karthik-nexusflow
opened
4 months ago
1
DPO Loss
#235
paulcx
closed
4 months ago
14
cuda.is_available is False in LLMRayActor
#233
THINK2TRY
closed
3 months ago
9
Is left-padding in PPO strictly necessary?
#232
mgerstgrasser
opened
4 months ago
6
Use existing wandb login if available.
#231
mgerstgrasser
closed
4 months ago
1
Actor-Critic-Model
#230
mgerstgrasser
opened
4 months ago
5
fix: make vllm lazy import
#229
wuxibin89
closed
4 months ago
0
Compatibility between vllm and NGC
#228
THINK2TRY
closed
4 months ago
5
vllm requirement problem
#227
jiashenggu
closed
4 months ago
6
OpenRLHF/openrlhf/models/utils.py LlamaRotaryEmbedding is not compatible with transformers 4.38.1
#226
jiashenggu
closed
4 months ago
0
Fix tensor shapes in Experience class documentation
#225
Thecats-Jfm
closed
4 months ago
0
clarification on config std and mean calculation
#224
karthik-nexusflow
closed
4 months ago
3
update support matrix
#223
haicaihi
closed
4 months ago
1
vLLM in batch_inference.py
#222
CoeusMaze
closed
4 months ago
2
Citation or comparison to trlX and NeMo-align.
#221
LouisCastricato
opened
4 months ago
3
Support top models stage2
#220
catqaq
opened
4 months ago
0
use_right_pad
#219
hijkzzz
closed
4 months ago
1
#217 fix position_ids
#218
hijkzzz
closed
4 months ago
1
`position_ids` related PPO bug
#217
tianhao-nexusflow
closed
4 months ago
2
support input_key and output_key in datasets
#216
hijkzzz
closed
4 months ago
0
fix: adjust vllm monkey patch for vllm>=0.2.7
#215
wuxibin89
closed
4 months ago
0
fix: monkey patch vllm with different versions
#214
wuxibin89
closed
4 months ago
0
fix: ignore non-persistent named buffer when save model
#213
wuxibin89
closed
4 months ago
0
error with saving checkpoint with Mistral model
#212
karthik19967829
closed
4 months ago
10
vllm +zero2 hangs
#211
karthik19967829
opened
4 months ago
32
Loading a reward model causes ValueError: weight is on the meta device, we need a `value` to put in on 0
#209
NZ99
opened
4 months ago
19
Got stuck when using PyTorch extensions root during multi-slurm node SFT and cannot continue
#208
Dear-Sloth
closed
5 months ago
1
Fix get_strategy
#207
kajyuuen
closed
5 months ago
0
fix gradient_checkpointing_kwargs bug
#206
wwxFromTju
closed
5 months ago
0
Improve ease of use
#205
hijkzzz
opened
5 months ago
1
feat: support Input template
#203
hijkzzz
closed
5 months ago
0
Update dataset to support user input template
#202
rbao2018
closed
5 months ago
1
Implement KTO into OpenRLHF
#201
Dylancer1998
closed
5 months ago
1
Enable overlap_comm for better performance
#200
li-plus
closed
5 months ago
4
Change run mode so that it can be run directly in shell.
#199
jovany-wang
closed
5 months ago
0
Workers (tasks / actors) killed due to memory pressure (OOM)
#198
LSC527
closed
5 months ago
4
fix: ray actor and critic attribute error
#197
wuxibin89
closed
5 months ago
0
Why is dschf defined in function scope?
#196
kajyuuen
closed
5 months ago
1
Bug: 'ActorModelRayActor' object has no attribute 'actor'
#195
hmzo
closed
5 months ago
1
A question about a detail of reward data preparation
#194
tonylin52
closed
5 months ago
2
question about support matrix
#193
paulcx
closed
4 months ago
21