safe-rlhf Search Results

145 results for safe-rlhf

PKU-Alignment/safe-rlhf #164

[Question] Equation (31) in your paper

### Required prerequisites - [X] I have read the documentation. - [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/safe-rlhf/issues) and [Discussions](https://github.com/PKU-…

shuoyinn updated 4 months ago
2
jungwoo-ha/WeeklyArxivTalk #82

[20230507] Weekly AI ArXiv Chat Season 2 - Episode 16

veritas9872 updated 1 year ago
5
kurtzace/diary-2024 #11

Gen AI - LLM, RAG, Langchain

## [LangChain Development](https://app.pluralsight.com/library/courses/langchain-development/table-of-contents) by [Tom Taulli](https://app.pluralsight.com/profile/author/tom-taulli) founder : H…

kurtzace updated 1 week ago
8
irthomasthomas/undecidability #652

The Bitter Lesson

- [ ] [The Bitter Lesson](http://www.incompleteideas.net/IncIdeas/BitterLesson.html) # The Bitter Lesson **DESCRIPTION:** "The Bitter Lesson Rich Sutton March 13, 2019 The biggest lesson that …

irthomasthomas updated 9 months ago
1
OpenRLHF/OpenRLHF #209

Loading a reward model causes ValueError: weight is on the m…

Hi! Thanks for your work on OpenRLHF. I trained a 4-bit Qwen-based reward model with this config (see the defaults): ``` parser.add_argument("--pretrain", type=str, default="Qwen/Qwen1.5-7B") par…

NZ99 updated 4 weeks ago
19
ethz-spylab/rlhf-poisoning #2

Where is RegressionTrainer?

from safe_rlhf.values.cost import CostTrainer from safe_rlhf.values.reward import RewardTrainer # from safe_rlhf.values.regression import RegressionTrainer safe_rlhf.values has no regression

ZJPure updated 8 months ago
2
PKU-Alignment/safe-rlhf #118

Does DPO support Baichuan?

### Required prerequisites - [X] I have read the documentation. - [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/safe-rlhf/issues) and [Discussions](https://github.com/PKU-…

zhaobinNF updated 6 months ago
3
huggingface/peft #1443

size mismatch for base_model.model.model.layers.0.mlp.gate_p…

### System Info transformers version: 4.35.2 Platform: Linux-5.15.0-1050-aws-x86_64-with-glibc2.31 Python version: 3.10.12 Huggingface_hub version: 0.20.2 Safetensors versio…

tamanna-mostafa updated 4 months ago
16
weipu-zhang/STORM #2

Broken pipe

./train.sh Namespace(n='MsPacman-life_done-wm_2L512D8H-100k-seed1', seed=1, config_path='config_files/STORM.yaml', env_name='ALE/MsPacman-v5', trajectory_path='D_TRAJ/MsPacman.pkl') A.L.E: Arcade …

robotzheng updated 9 months ago
4
PKU-Alignment/safe-rlhf #159

[BUG] Train reward model initialized from the pretrain model…

### Required prerequisites - [X] I have read the documentation. - [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/safe-rlhf/issues) and [Discussions](https://github.com/P…

RyAkagiC updated 6 months ago
3
