-
### Required prerequisites
- [X] I have read the documentation.
- [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/safe-rlhf/issues) and [Discussions](https://github.com/PKU-…
-
-
## [LangChain Development](https://app.pluralsight.com/library/courses/langchain-development/table-of-contents)
by [Tom Taulli](https://app.pluralsight.com/profile/author/tom-taulli)
founder : H…
-
- [ ] [The Bitter Lesson](http://www.incompleteideas.net/IncIdeas/BitterLesson.html)
# The Bitter Lesson
**DESCRIPTION:**
"The Bitter Lesson
Rich Sutton
March 13, 2019
The biggest lesson that …
-
Hi! Thanks for your work on OpenRLHF. I trained a 4-bit Qwen-based reward model with this config (see the defaults):
```
parser.add_argument("--pretrain", type=str, default="Qwen/Qwen1.5-7B")
par…
```
-
from safe_rlhf.values.cost import CostTrainer
from safe_rlhf.values.reward import RewardTrainer
# from safe_rlhf.values.regression import RegressionTrainer
`safe_rlhf.values` has no `regression` module.
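To confirm a report like the one above before changing any imports, it can help to list which submodules the installed package actually exposes. The following is a minimal sketch using only the Python standard library; the helper names `list_submodules` and `has_submodule` are hypothetical and not part of safe-rlhf.

```python
import importlib
import importlib.util
import pkgutil


def list_submodules(package_name: str) -> list[str]:
    """Return the names of the direct submodules of an installed package."""
    package = importlib.import_module(package_name)
    # Only packages have __path__; plain modules have no submodules.
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return []
    return sorted(info.name for info in pkgutil.iter_modules(search_path))


def has_submodule(package_name: str, submodule: str) -> bool:
    """Check whether package_name.submodule can be located (the submodule itself is not imported)."""
    try:
        return importlib.util.find_spec(f"{package_name}.{submodule}") is not None
    except ModuleNotFoundError:
        # Raised when the parent is missing or is not a package.
        return False


if __name__ == "__main__":
    # Demonstrated on a standard-library package so the sketch runs anywhere.
    print(list_submodules("json"))
    print(has_submodule("json", "decoder"))
    # With safe-rlhf installed, the same check would be (assumption, not run here):
    # has_submodule("safe_rlhf.values", "regression")
```

If `has_submodule` returns `False` for the installed version, the import has to stay commented out or the package has to be updated to a version that ships that module.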
-
### Required prerequisites
- [X] I have read the documentation.
- [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/safe-rlhf/issues) and [Discussions](https://github.com/PKU-…
-
### System Info
transformers version: 4.35.2
Platform: Linux-5.15.0-1050-aws-x86_64-with-glibc2.31
Python version: 3.10.12
Huggingface_hub version: 0.20.2
Safetensors versio…
-
./train.sh
Namespace(n='MsPacman-life_done-wm_2L512D8H-100k-seed1', seed=1, config_path='config_files/STORM.yaml', env_name='ALE/MsPacman-v5', trajectory_path='D_TRAJ/MsPacman.pkl')
A.L.E: Arcade …
-
### Required prerequisites
- [X] I have read the documentation.
- [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/safe-rlhf/issues) and [Discussions](https://github.com/P…