-
Why does the reward model use `mean(values[:,:-1], dim=1)` as its output?
```python
values = self.value_head(last_hidden_states)[:, :-1]  # (B, T-1, 1): drop the last position
value = values.mean(dim=1).squeeze(1)                 # average over tokens -> shape (B,)
```
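As a shape check, here is a minimal sketch of the slicing and pooling in the snippet above, with made-up dimensions (`B`, `T`, `H` are illustrative, not the repo's actual configuration):

```python
import torch

# Hypothetical dimensions for illustration: batch B=2, sequence T=5, hidden H=8.
B, T, H = 2, 5, 8
last_hidden_states = torch.randn(B, T, H)
value_head = torch.nn.Linear(H, 1)  # maps each token's hidden state to a scalar

values = value_head(last_hidden_states)[:, :-1]  # (B, T-1, 1): drop the last position
value = values.mean(dim=1).squeeze(1)            # (B, 1) -> (B,): one scalar per sample

assert value.shape == (B,)
```

So the `mean(dim=1)` averages the per-token values into a single scalar reward per sequence, and `squeeze(1)` collapses the trailing singleton dimension left by the linear head.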
http…
-
In [qlora_dpo.py](https://github.com/lyogavin/Anima/blob/dc691b2958f50a6d73a239b0e13c341ce6b2d60f/rlhf/qlora_dpo.py), I see that chosen is tokenized with `max_length=self.source_max_len`, while rejected is tokenized with `max_length=self…`
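For context, preference-pair preprocessing for DPO commonly truncates both responses with the same budget, so that neither side of the pair is penalized purely for being cut shorter. A toy sketch (the one-character-per-token tokenizer is invented purely for illustration, not the repo's tokenizer):

```python
# Hypothetical toy tokenizer: one token per character, for illustration only.
def tokenize(text, max_length):
    return [ord(c) for c in text][:max_length]

chosen = "a helpful and detailed answer"
rejected = "a short answer"

# Symmetric truncation: both responses in the pair share the same budget.
max_len = 16
chosen_ids = tokenize(chosen, max_length=max_len)
rejected_ids = tokenize(rejected, max_length=max_len)

assert len(chosen_ids) <= max_len and len(rejected_ids) <= max_len
```

Using different `max_length` values for chosen and rejected would truncate the two sides asymmetrically, which is presumably what the question is probing.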
-
Description: In DeepSpeed-Chat step 3, a runtime error, `The size of tensor a (4) must match the size of tensor b (8) at non-singleton dimension 0`, is thrown when `inference_tp_size > 1` and hybrid engin…
-
I trained the PPO model using GPT: I changed the `model_name_or_path` option from opt to gpt2. Steps 1 and 2 completed successfully, but an error occurred in step 3. The error is as follows:
╭────────────…
-
https://github.com/microsoft/DeepSpeedExamples/blob/8f8099a813f3b223d5df39e0c15c748de4eb1669/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py#L76
When I try to reproduce bl…
-
**Describe the bug**
I am not able to run the multi-node script for the 6B actor and critic on 2 nodes of 8 V100 GPUs on Azure ML. I am running the following command:
deepspeed --master_port 29501 ma…
-
#### Is your feature request related to a problem?
In general, implementing this idea should simplify the use of the reading functions and reduce boilerplate code.
…
-
If anyone has any leads on this, please let me know. Also, if anyone wants to collaborate in this direction, please let me know.