-
### System Info
```Shell
- `Accelerate` version: 0.29.3
- Python version: 3.11.4
```
### Information
- [ ] The official example scripts
- [X] My own modified scripts
### Tasks
- [ ] One of the sc…
-
**System Info:**
Memory: 500G
GPU: 8 * A100 80G
Question:
**Why does initializing DeepSpeedRLHFEngine with multiple GPUs use much more memory than initializing it with a single GPU?**
**Reproduce:**
Copy mode…
-
![image](https://user-images.githubusercontent.com/13724286/232206508-a702748c-3537-43fc-9755-e73ed1131fa6.png)
![image](https://user-images.githubusercontent.com/13724286/232206537-24ffaccd-fb5a-495…
-
Hi! Is there a specific reason that the reward model is trained on absolute scores rather than on pairwise human preferences over the same prompts, as most other RLHF work does?
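To make the distinction in the question concrete, here is a minimal sketch contrasting the two objectives. The function names and the Bradley-Terry-style formulation are illustrative assumptions, not the repository's actual implementation:

```python
import math

def pairwise_loss(r_chosen, r_rejected):
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)).

    Depends only on the *difference* between the two scores, so the
    reward model never has to agree with annotators on an absolute scale.
    """
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

def absolute_loss(r_pred, r_target):
    """Regression to an absolute human-assigned score (MSE)."""
    return (r_pred - r_target) ** 2

# The pairwise loss is invariant to a shared offset in the scores,
# while the absolute loss is not:
print(pairwise_loss(2.0, 1.0))  # same value as pairwise_loss(5.0, 4.0)
print(absolute_loss(2.0, 1.0))
```

This shift-invariance is the usual argument for pairwise training: human raters are more consistent at ranking two responses than at assigning calibrated scalar scores.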
-
Thank you for the great work!
The KL rewards seem to be recomputed on every call to train_rlhf(). [[code](https://github.com/microsoft/DeepSpeedExamples/blob/8f8099a813f3b223d5df39e0c15c748de4eb1669/a…
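For context, the KL-penalized reward in PPO-style RLHF is typically assembled per token from the policy/reference log-probability gap, with the scalar reward-model score added at the final token. A minimal sketch under those assumptions (the function name and `kl_coef` parameter are illustrative, not the repository's API):

```python
def kl_penalized_rewards(policy_logprobs, ref_logprobs, score, kl_coef=0.1):
    """Build per-token rewards for one generated sequence.

    policy_logprobs / ref_logprobs: per-token log-probs of the generated
    tokens under the current policy and the frozen reference model.
    score: scalar reward-model score for the full response.
    """
    # Per-token KL penalty: -kl_coef * (log pi(a|s) - log pi_ref(a|s))
    rewards = [-kl_coef * (lp - rlp)
               for lp, rlp in zip(policy_logprobs, ref_logprobs)]
    # The reward-model score is credited only at the last token.
    rewards[-1] += score
    return rewards
```

Because the penalty depends on the current policy's log-probs, it changes as the policy is updated, which is why it has to be recomputed for each batch of rollouts.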
-
Hello, I am training a model with DeepSpeed-Chat and have now started Step 3 (RLHF training), but I don't know how many epochs it takes before my model parameters will be ready. One hour, or longer? Can you give me some advice?
…
-
[Errno 2] No such file or directory: '.cache/ec2-user/Anthropic___json/Anthropic--hh-rlhf-a9fdd36e8b50b8fa/0.0.0/bd2024624bf0cc9525bb882643bfedfb1437c404efd58d805d47af1dea815973/json-train-00000-00000…
-
# Description
Support for PEFT in chatllama models and training
# TODO
- [x] Add PEFT to enable parameter-efficient fine-tuning in the actor, reward, and critic models.
- [ ] Check RLHF stabil…
-
Select a series of models to be used in the project. They will be fine-tuned, architecturally modified (e.g., replacing the last layer for the reward model), and RLHF will be performed on all of them.