-
Hi, very nice repo.
May I ask whether you plan to reproduce ChatGPT/InstructGPT, or GPT with RLHF, based on JAX?
Best
-
I have put the `Dahoas/rm-static` dataset as well as the model `facebook/opt-1.3b` under the path
**DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_supervised_finetuning**
When r…
-
Hi Umar, what an awesome free lecture! I cannot thank you enough for your service to all of us developers.
Sorry to borrow this place for a question. In the "RLHF and PPO" slides, page 17…
-
[Errno 2] No such file or directory: '.cache/ec2-user/Anthropic___json/Anthropic--hh-rlhf-a9fdd36e8b50b8fa/0.0.0/bd2024624bf0cc9525bb882643bfedfb1437c404efd58d805d47af1dea815973/json-train-00000-00000…
-
Do you have data on the performance of DPO with models other than Qwen-VL-Chat? I found that it degrades both perception and cognition on MME when used with LLaVA-1.5.
-
Thank you for the great work!
The KL rewards seem to be computed on every call to train_rlhf(). [[code](https://github.com/microsoft/DeepSpeedExamples/blob/8f8099a813f3b223d5df39e0c15c748de4eb1669/a…
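For context, here is a minimal sketch (not DeepSpeed-Chat's actual code) of the standard KL-shaped reward used in InstructGPT-style RLHF: each token gets a penalty proportional to the divergence between the policy and reference log-probabilities, and the reward-model score is added only at the final token. The function name and inputs are illustrative assumptions.

```python
def kl_shaped_rewards(logprobs, ref_logprobs, rm_score, beta=0.1):
    """Per-token rewards: r_t = -beta * (log pi(a_t) - log pi_ref(a_t)),
    with the scalar reward-model score added at the last token."""
    rewards = [-beta * (lp - rlp) for lp, rlp in zip(logprobs, ref_logprobs)]
    rewards[-1] += rm_score  # reward-model score only at sequence end
    return rewards

# Example: two generated tokens, policy slightly off the reference on the first.
print(kl_shaped_rewards([-1.0, -2.0], [-1.5, -2.0], rm_score=1.0, beta=0.1))
# → [-0.05, 1.0]
```

Since these rewards depend only on the (fixed-within-an-experience-batch) log-probs and score, recomputing them on every train_rlhf() call is redundant work that could be cached.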
-
Impressive work; it's efficient and powerful. Here's a suggestion.
Search is the critical component: it is the bottleneck for answering every query, given that you already possess a robust corpus.
C…
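To make the point concrete, here is a minimal retrieval sketch (an assumed baseline, not this project's actual code): ranking corpus passages for a query by TF-IDF-weighted term overlap. The function name and scoring scheme are illustrative; the point is that whatever this stage returns bounds answer quality downstream.

```python
import math
from collections import Counter

def tfidf_rank(query, corpus):
    """Return corpus indices sorted by a simple TF-IDF overlap score."""
    docs = [Counter(d.lower().split()) for d in corpus]
    n = len(docs)
    terms = query.lower().split()
    # Inverse document frequency for each query term that appears in the corpus.
    idf = {t: math.log(n / sum(1 for d in docs if t in d))
           for t in terms if any(t in d for d in docs)}
    scores = [sum(d[t] * idf.get(t, 0.0) for t in terms) for d in docs]
    return sorted(range(n), key=lambda i: -scores[i])

corpus = ["retrieval augmented generation",
          "cats and dogs",
          "dense retrieval models"]
print(tfidf_rank("retrieval models", corpus))  # → [2, 0, 1]
```

Even a weak ranker like this makes the failure mode visible: if the right passage is not near the top here, no downstream model can recover it.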
-
`RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cpu! `
Several places in the code put the EMA model on a GPU device:
```
if args.enable_ema:
em…
-
Select a series of models to be used in the project. They will be fine-tuned, architecturally modified (i.e., replacing the last layer to form a reward model), and RLHF will be performed on all of them.
-
![image](https://user-images.githubusercontent.com/13724286/232206508-a702748c-3537-43fc-9755-e73ed1131fa6.png)
![image](https://user-images.githubusercontent.com/13724286/232206537-24ffaccd-fb5a-495…