-
Why does the reward model use `mean(values[:,:-1], dim=1)` as its output?
```python
values = self.value_head(last_hidden_states)[:, :-1]  # (B, T-1, 1): drop the last position
value = values.mean(dim=1).squeeze(1)                 # average over tokens -> shape (B,)
```
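As a shape check, here is a minimal sketch of the slicing and pooling in the snippet above, with made-up dimensions (`B`, `T`, `H` are illustrative, not the repo's actual configuration):

```python
import torch

# Hypothetical dimensions for illustration: batch B=2, sequence T=5, hidden H=8.
B, T, H = 2, 5, 8
last_hidden_states = torch.randn(B, T, H)
value_head = torch.nn.Linear(H, 1)  # maps each token's hidden state to a scalar

values = value_head(last_hidden_states)[:, :-1]  # (B, T-1, 1): drop the last position
value = values.mean(dim=1).squeeze(1)            # (B, 1) -> (B,): one scalar per sample

assert value.shape == (B,)
```

So the `mean(dim=1)` averages the per-token values into a single scalar reward per sequence, and `squeeze(1)` collapses the trailing singleton dimension left by the linear head.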
http…
-
In [qlora_dpo.py](https://github.com/lyogavin/Anima/blob/dc691b2958f50a6d73a239b0e13c341ce6b2d60f/rlhf/qlora_dpo.py), I see that chosen is tokenized with `max_length=self.source_max_len`, while rejected is tokenized with `max_length=self…`
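For context, preference-pair preprocessing for DPO commonly truncates both responses with the same budget, so that neither side of the pair is penalized purely for being cut shorter. A toy sketch (the one-character-per-token tokenizer is invented purely for illustration, not the repo's tokenizer):

```python
# Hypothetical toy tokenizer: one token per character, for illustration only.
def tokenize(text, max_length):
    return [ord(c) for c in text][:max_length]

chosen = "a helpful and detailed answer"
rejected = "a short answer"

# Symmetric truncation: both responses in the pair share the same budget.
max_len = 16
chosen_ids = tokenize(chosen, max_length=max_len)
rejected_ids = tokenize(rejected, max_length=max_len)

assert len(chosen_ids) <= max_len and len(rejected_ids) <= max_len
```

Using different `max_length` values for chosen and rejected would truncate the two sides asymmetrically, which is presumably what the question is probing.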
-
Description: In DeepSpeed-Chat step 3, a runtime error, `The size of tensor a (4) must match the size of tensor b (8) at non-singleton dimension 0`, is thrown when `inference_tp_size > 1` and hybrid engin…
-
I trained the PPO model using GPT: I changed the `model_name_or_path` option from opt to gpt2. Steps 1 and 2 completed successfully, but an error occurred in step 3. The error is as follows:
╭────────────…
-
https://github.com/microsoft/DeepSpeedExamples/blob/8f8099a813f3b223d5df39e0c15c748de4eb1669/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py#L76
When I try to reproduce bl…
-
**Describe the bug**
I am not able to run the multi-node script for the 6B actor and critic on 2 nodes of 8 V100 GPUs on Azure ML. I am running the following command:
deepspeed --master_port 29501 ma…
-
#### Is your feature request related to a problem?
In general, implementing this idea should simplify the use of the reading functions and reduce boilerplate code.
…
-
If anyone has any leads on this, please let me know. Also, if anyone wants to collaborate in this direction, please let me know.