-
### System Info
```Shell
- `Accelerate` version: 0.29.3
- Python version: 3.11.4
```
### Information
- [ ] The official example scripts
- [X] My own modified scripts
### Tasks
- [ ] One of the sc…
-
**System Info:**
Memory: 500G
GPU: 8 * A100 80G
Question:
**Why does initializing DeepSpeedRLHFEngine with multiple GPUs use much more memory than initializing it with a single GPU?**
**Reproduce:**
Copy mode…
-
![image](https://user-images.githubusercontent.com/13724286/232206508-a702748c-3537-43fc-9755-e73ed1131fa6.png)
![image](https://user-images.githubusercontent.com/13724286/232206537-24ffaccd-fb5a-495…
-
Hi! Is there a specific reason that the reward model is trained on absolute scores rather than on pairwise human preferences over the same prompts, as most other RLHF work does?
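To make the distinction in the question concrete, here is a minimal sketch contrasting the two objectives. The function names and the Bradley-Terry-style formulation are illustrative assumptions, not the repository's actual implementation:

```python
import math

def pairwise_loss(r_chosen, r_rejected):
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)).

    Depends only on the *difference* between the two scores, so the
    reward model never has to agree with annotators on an absolute scale.
    """
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

def absolute_loss(r_pred, r_target):
    """Regression to an absolute human-assigned score (MSE)."""
    return (r_pred - r_target) ** 2

# The pairwise loss is invariant to a shared offset in the scores,
# while the absolute loss is not:
print(pairwise_loss(2.0, 1.0))  # same value as pairwise_loss(5.0, 4.0)
print(absolute_loss(2.0, 1.0))
```

This shift-invariance is the usual argument for pairwise training: human raters are more consistent at ranking two responses than at assigning calibrated scalar scores.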
-
Thank you for the great work!
The KL rewards seem to be recomputed on every call to train_rlhf(). [[code](https://github.com/microsoft/DeepSpeedExamples/blob/8f8099a813f3b223d5df39e0c15c748de4eb1669/a…
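For context, the KL-penalized reward in PPO-style RLHF is typically assembled per token from the policy/reference log-probability gap, with the scalar reward-model score added at the final token. A minimal sketch under those assumptions (the function name and `kl_coef` parameter are illustrative, not the repository's API):

```python
def kl_penalized_rewards(policy_logprobs, ref_logprobs, score, kl_coef=0.1):
    """Build per-token rewards for one generated sequence.

    policy_logprobs / ref_logprobs: per-token log-probs of the generated
    tokens under the current policy and the frozen reference model.
    score: scalar reward-model score for the full response.
    """
    # Per-token KL penalty: -kl_coef * (log pi(a|s) - log pi_ref(a|s))
    rewards = [-kl_coef * (lp - rlp)
               for lp, rlp in zip(policy_logprobs, ref_logprobs)]
    # The reward-model score is credited only at the last token.
    rewards[-1] += score
    return rewards
```

Because the penalty depends on the current policy's log-probs, it changes as the policy is updated, which is why it has to be recomputed for each batch of rollouts.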
-
Hello, I am training a model with DeepSpeed-Chat and have now started Step 3 (RLHF training), but I don't know how many epochs it takes before my model parameters will be ready. One hour, or longer? Can you give me some advice?
…
-
[Errno 2] No such file or directory: '.cache/ec2-user/Anthropic___json/Anthropic--hh-rlhf-a9fdd36e8b50b8fa/0.0.0/bd2024624bf0cc9525bb882643bfedfb1437c404efd58d805d47af1dea815973/json-train-00000-00000…
-
# Description
Support for PEFT in chatllama models and training
# TODO
- [x] Add PEFT to enable parameter-efficient fine-tuning in the actor, reward, and critic models.
- [ ] Check RLHF stabil…
-
Select a series of models to be used in the project. They will be fine-tuned, architecturally modified (e.g., replacing the last layer for the reward model), and RLHF will be performed on all of them.