OpenLLMAI / OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
https://openrlhf.readthedocs.io/
Apache License 2.0 · 1.73k stars · 164 forks
Issues (sorted by: Newest)
- #196 Why is dschf defined in function scope? (kajyuuen, closed, 5 months ago, 1 comment)
- #195 Bug: 'ActorModelRayActor' object has no attribute 'actor' (hmzo, closed, 5 months ago, 1 comment)
- #194 A detail question about reward data preparation (tonylin52, closed, 5 months ago, 2 comments)
- #193 question about support matrix (paulcx, closed, 4 months ago, 21 comments)
- #192 Can the reward model and reference model also be split off into vLLM services? (HuangLK, closed, 5 months ago, 3 comments)
- #191 Enable deepspeed.zero.Init causes very strange spikes in PPO policy_loss (wuxibin89, closed, 4 months ago, 2 comments)
- #190 support Mixture-of-Experts (MoE) model training (ifromeast, closed, 5 months ago, 1 comment)
- #189 Support LoRA and QLoRA (nf4), fix mixtral loss (hijkzzz, closed, 6 months ago, 0 comments)
- #188 About using vLLM for generation (LSC527, open, 6 months ago, 5 comments)
- #187 Actor killed after training one episode (zhanghaoie, closed, 5 months ago, 9 comments)
- #186 fix: initialize value_head for ZeRO-3 reward model training (wuxibin89, closed, 6 months ago, 0 comments)
- #185 fix: assign GPU for LLMActor if vllm_tensor_parallel_size=1 (wuxibin89, closed, 6 months ago, 0 comments)
- #184 What version of vllm should we use? (LSC527, closed, 6 months ago, 3 comments)
- #183 support mixtral 8*7b balancing loss (hijkzzz, closed, 6 months ago, 0 comments)
- #182 fix: add unk_token when vllm generate eos_token only (wuxibin89, closed, 6 months ago, 0 comments)
- #181 Why not include eos_token in action_seq? This may cause mistakes (ZiyiLiubird, closed, 6 months ago, 16 comments)
- #180 Some weights of LLMForSequenceRegression were not initialized (eyuansu62, closed, 6 months ago, 4 comments)
- #179 add official doc (catqaq, closed, 6 months ago, 0 comments)
- #178 Support pipeline module such as LLaMA2Pipeline and InstructGPTPipeline (catqaq, open, 6 months ago, 0 comments)
- #177 [bug]: AttributeError: 'dict' object has no attribute 'mean' (eyuansu62, closed, 6 months ago, 1 comment)
- #176 Why not mask unk_token during actor generation? (ZiyiLiubird, closed, 6 months ago, 4 comments)
- #175 Bug? (ZiyiLiubird, closed, 6 months ago, 2 comments)
- #174 upgrade container for to_bettertransformer (hijkzzz, closed, 6 months ago, 0 comments)
- #173 [pre-commit.ci] pre-commit suggestions (pre-commit-ci[bot], closed, 6 months ago, 0 comments)
- #172 feat: dynamically construct transformer for sequence classification (wuxibin89, closed, 6 months ago, 0 comments)
- #171 High Memory Usage (zhanghaoie, closed, 6 months ago, 5 comments)
- #170 feat: enable transformers deepspeed ZeRO-3 integration (wuxibin89, closed, 6 months ago, 0 comments)
- #169 refactor ds config and fix flash_attn / model.config.pad_token_id (hijkzzz, closed, 6 months ago, 0 comments)
- #168 update Logo (hijkzzz, closed, 6 months ago, 0 comments)
- #167 Logo (hijkzzz, closed, 6 months ago, 0 comments)
- #166 Fix flash attention option (li-plus, closed, 6 months ago, 5 comments)
- #165 Optimize padding removal (li-plus, closed, 6 months ago, 2 comments)
- #163 remove pad token and embedding resize for llama (hijkzzz, closed, 6 months ago, 0 comments)
- #162 Optimize reward score gather/scatter (li-plus, closed, 6 months ago, 2 comments)
- #161 Support KTO (hijkzzz, closed, 5 months ago, 2 comments)
- #160 feat: add vLLM for text generation (wuxibin89, closed, 6 months ago, 0 comments)
- #159 baichuan2-13b-base as actor: RuntimeError: CUDA error: device-side assert triggered (netrookiecn, closed, 6 months ago, 5 comments)
- #158 Why is it as much as 4x faster than DeepSpeed-Chat? (tingshua-yts, closed, 6 months ago, 4 comments)
- #157 Loading RM ckpt bug: AttributeError: 'NoneType' object has no attribute 'load' (pikaqqqqqq, closed, 6 months ago, 3 comments)
- #156 How to convert the .pt checkpoint provided on Hugging Face into the common .bin format? (zjintheroom, closed, 6 months ago, 4 comments)
- #155 Add pipeline module to support more scientific comparative experiments and research (catqaq, open, 7 months ago, 0 comments)
- #154 opt local file support (catqaq, closed, 7 months ago, 0 comments)
- #153 Repair parquet file reading issue in utils.py (tsaoyu, closed, 7 months ago, 1 comment)
- #152 Optimize generation post processing (li-plus, closed, 6 months ago, 4 comments)
- #151 feature: add API support for hosting a reward model (ftmtk, open, 7 months ago, 5 comments)
- #150 Discussion on our 1st release (jovany-wang, closed, 3 months ago, 1 comment)
- #149 [Severity] High similarity with Colossal-AI (binmakeswell, closed, 7 months ago, 4 comments)
- #148 Inquiry regarding the feasibility of fine-tuning LLaMA2-7B with a single A100 (callanwu, closed, 7 months ago, 4 comments)
- #147 fix: ensure all experience has been sent to critic before training (wuxibin89, closed, 7 months ago, 0 comments)
- #146 support Qwen (pikaqqqqqq, closed, 7 months ago, 1 comment)