-
First of all, many thanks for this library — it makes it possible to run RLHF on multiple 13B models with good results.
Issue: using an aligned model SFT-ed on internal multi-turn data on top of https://huggingface.co/baichuan-inc/Baichuan2-13B-Base, together with the rm-anthropic_hh-lmsys-oasst-webgpt.pt reward model provided by OpenRLHF, the following exception occurs:
(CriticModelRayActo…
-
These are sample commands from the documentation:
```
torchrun --nnodes 1 --nproc_per_node 8 examples/stack_llama/scripts/supervised_finetuning.py --model_path= --streaming --no_gradient_checkpointi…
```
-
Agreement Contract
## Description
mTree is an agent-based modeling tool for building and testing microeconomic systems. The bounty will build an mTree capability to use the existing mTree model to w…
-
Hat tip @simontegg
http://www.wsj.com/articles/the-future-of-the-internet-is-flow-1443796858
I maybe 10% believe it; I believe some of the comments more. But it certainly fits the theme here...
-
### System Info
- `transformers` version: 4.30.2
- Platform: Linux-5.19.0-46-generic-x86_64-with-glibc2.35
- Python version: 3.10.11
- Huggingface_hub version: 0.15.1
- Safetensors version: 0.3…
-
Hi author, I'd like to ask: does freeze mode support PPO training?
After finishing the RM stage in freeze mode, when I move on to PPO the system reports that only LoRA supports PPO training.
-
I was installing LLM Studio on Ubuntu 22.04 using the same steps I did for 20.04, except that I followed the below steps to install the NVIDIA driver, and I am experiencing the below err…
-
For fine-tuning my own model on top of llama2, do I also need to go through all five of the steps below in order? I'm only adding my own training data to fine-tune on top of llama2 and don't need full pre-training. Thanks.
Pre-Training
Supervised Fine-Tuning
Reward Modeling
PPO Training
DPO Training
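For context on the last stage in the list: DPO differs from the RM + PPO pair in that it collapses reward modeling and RL into a single pairwise loss over the policy and a frozen reference model. A minimal sketch of that loss in plain Python (function name and the sample numbers are illustrative, not from any specific repo):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the total log-probability of the chosen/rejected
    response under the trainable policy or the frozen reference model.
    """
    # Log-ratio of policy vs. reference for each response
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # -log(sigmoid(beta * margin)): small when the policy prefers the
    # chosen response more strongly than the reference model does
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy favors the chosen response relative to the reference -> low loss
print(dpo_loss(-10.0, -20.0, -12.0, -18.0, beta=0.1))
```

Because the reference log-probabilities can be precomputed, this stage needs only preference pairs and no separate reward model or rollout loop.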
-
When I run evaluate.py for llama, I find that the logits of llama's generation code (model.generate(xxx)) are different when I use different settings: eval_main_dolly.sh and eval_main_dolly_mp4.sh. Th…
-
### Required prerequisites
- [X] I have read the documentation.
- [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/safe-rlhf/issues) and [Discussions](https://github.com/PKU-…