-
Hi, very nice repo.
May I ask whether you plan to reproduce ChatGPT/InstructGPT, or GPT with RLHF, based on JAX?
Best
-
I have put the `Dahoas/rm-static` dataset as well as the model `facebook/opt-1.3b` under the path
**DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_supervised_finetuning**
When r…
-
Hi Umar, what an awesome free lecture! I cannot thank you enough for your service to all of us developers.
Sorry to borrow this place for a question. In the "RLHF and PPO" slides, page 17…
-
[Errno 2] No such file or directory: '.cache/ec2-user/Anthropic___json/Anthropic--hh-rlhf-a9fdd36e8b50b8fa/0.0.0/bd2024624bf0cc9525bb882643bfedfb1437c404efd58d805d47af1dea815973/json-train-00000-00000…
-
Do you have data on the performance of DPO with models other than Qwen-VL-Chat? I found that it degrades both perception and cognition on MME when used with LLaVA-1.5.
-
Thank you for the great work!
The KL rewards seem to be computed on every call to train_rlhf(). [[code](https://github.com/microsoft/DeepSpeedExamples/blob/8f8099a813f3b223d5df39e0c15c748de4eb1669/a…
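For context, here is a minimal sketch (not DeepSpeed-Chat's actual code) of the standard KL-shaped reward used in InstructGPT-style RLHF: each token gets a penalty proportional to the divergence between the policy and reference log-probabilities, and the reward-model score is added only at the final token. The function name and inputs are illustrative assumptions.

```python
def kl_shaped_rewards(logprobs, ref_logprobs, rm_score, beta=0.1):
    """Per-token rewards: r_t = -beta * (log pi(a_t) - log pi_ref(a_t)),
    with the scalar reward-model score added at the last token."""
    rewards = [-beta * (lp - rlp) for lp, rlp in zip(logprobs, ref_logprobs)]
    rewards[-1] += rm_score  # reward-model score only at sequence end
    return rewards

# Example: two generated tokens, policy slightly off the reference on the first.
print(kl_shaped_rewards([-1.0, -2.0], [-1.5, -2.0], rm_score=1.0, beta=0.1))
# → [-0.05, 1.0]
```

Since these rewards depend only on the (fixed-within-an-experience-batch) log-probs and score, recomputing them on every train_rlhf() call is redundant work that could be cached.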
-
Impressive work; it's efficient and powerful. Here's a suggestion.
Search is the critical component: it is the bottleneck for answering every query, given that you already possess a robust corpus.
C…
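To make the point concrete, here is a minimal retrieval sketch (an assumed baseline, not this project's actual code): ranking corpus passages for a query by TF-IDF-weighted term overlap. The function name and scoring scheme are illustrative; the point is that whatever this stage returns bounds answer quality downstream.

```python
import math
from collections import Counter

def tfidf_rank(query, corpus):
    """Return corpus indices sorted by a simple TF-IDF overlap score."""
    docs = [Counter(d.lower().split()) for d in corpus]
    n = len(docs)
    terms = query.lower().split()
    # Inverse document frequency for each query term that appears in the corpus.
    idf = {t: math.log(n / sum(1 for d in docs if t in d))
           for t in terms if any(t in d for d in docs)}
    scores = [sum(d[t] * idf.get(t, 0.0) for t in terms) for d in docs]
    return sorted(range(n), key=lambda i: -scores[i])

corpus = ["retrieval augmented generation",
          "cats and dogs",
          "dense retrieval models"]
print(tfidf_rank("retrieval models", corpus))  # → [2, 0, 1]
```

Even a weak ranker like this makes the failure mode visible: if the right passage is not near the top here, no downstream model can recover it.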
-
`RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cpu! `
Several places in the code put the EMA model on a GPU device:
```
if args.enable_ema:
em…
-
Select a series of models to be used in the project. They will be fine-tuned, architecturally modified (i.e., replacing the last layer to form a reward model), and RLHF will be performed on all of them.
-
![image](https://user-images.githubusercontent.com/13724286/232206508-a702748c-3537-43fc-9755-e73ed1131fa6.png)
![image](https://user-images.githubusercontent.com/13724286/232206537-24ffaccd-fb5a-495…