-
I trained the PPO model using GPT. I changed the model_name_or_path option from opt to gpt2. Steps 1 and 2 passed, but an error occurred in step 3. The error is as follows:
╭────────────…
-
**Describe the bug**
I am not able to run the multi-node script for 6B actor and critic on 2 nodes of 8 V100 GPUs on Azure ML. I am running the following command:
deepspeed --master_port 29501 ma…
-
If anyone has any leads on this, please let me know. Also, if anyone wants to collaborate in this direction, please let me know.
-
trl/trlx: libraries for applying RLHF to Transformer-based LLMs
https://github.com/CarperAI/trlx
-
# Description
We currently support the following datasets:
- [Stanford Human Preferences Dataset (SHP)](https://huggingface.co/datasets/stanfordnlp/SHP)
- [Anthropic RLHF](https://huggingf…
-
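Hello, I have recently been training a reward model with visualglm. While reading and modifying the code, I noticed this line in modeling_chatglm.py: torch_image = torch_image.to(self.dtype).to(self.device). What exactly does self.dtype refer to? I could not find its definition anywhere in the code.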
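
One possible explanation, assuming the class containing that line inherits from Hugging Face transformers' `PreTrainedModel` (not verified against the visualglm source): there, `dtype` and `device` are read-only properties provided by a base class, so no `self.dtype = ...` assignment appears in the model file. The toy sketch below only illustrates that mechanism in plain Python; `FakeParam`, `DtypeMixin`, and `ToyModel` are hypothetical names, not part of visualglm or transformers.

```python
# Toy illustration only: FakeParam, DtypeMixin and ToyModel are hypothetical
# names, not part of visualglm or transformers.


class FakeParam:
    """Stands in for a tensor parameter that carries a dtype and a device."""

    def __init__(self, dtype, device):
        self.dtype = dtype
        self.device = device


class DtypeMixin:
    """Base class that exposes dtype/device as read-only properties."""

    def parameters(self):
        raise NotImplementedError

    @property
    def dtype(self):
        # Report the dtype of the first parameter.
        return next(iter(self.parameters())).dtype

    @property
    def device(self):
        # Report the device of the first parameter.
        return next(iter(self.parameters())).device


class ToyModel(DtypeMixin):
    # Note: there is no `self.dtype = ...` anywhere in this class.
    def __init__(self):
        self._params = [FakeParam("float16", "cuda:0")]

    def parameters(self):
        return self._params


model = ToyModel()
print(model.dtype, model.device)
```

If the assumption holds, `self.dtype` would simply be the dtype of the model's weights, which is why searching the model file for an assignment turns up nothing.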
-
Last meeting #3321
* spam, bots, and data quality for inference and RLHF
* found this old issue #914
-
root@b787722dc2e1:/workspace/workfile/Projects/chatllama# python artifacts/main.py artifacts/config/config.yaml --type ACTOR
Current device used :cuda
local_rank: -1 world_size: -1
Traceback (most …
-
Any chance you could implement this?
https://github.com/vinhkhuc/ddpo/tree/support_gpu
It's for RLHF-type work; [check the paper](https://rl-diffusion.github.io/).
could be really interesting fo…
-
#### Is your feature request related to a problem?
In general, implementing this idea should simplify the use of the reading functions and reduce boilerplate code.
…