-
## 🚀 Feature Request
Support DPO (Direct Preference Optimization) loss and data loader.
## Motivation
Many recent open LLMs have achieved promising results by using DPO instead of RL-style t…
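For reference, the requested DPO objective can be sketched in a few lines. This is a minimal illustration, assuming per-sequence log-probabilities for the chosen and rejected responses have already been computed under both the policy and a frozen reference model; the function name and `beta` default are illustrative, not part of any existing API in this repo.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    Each argument is a 1-D tensor of summed token log-probs per sequence.
    """
    # Log-ratios between the policy and the frozen reference model.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # The implicit reward margin, scaled by the temperature beta.
    logits = beta * (chosen_logratio - rejected_logratio)
    # Maximize the probability that the chosen response is preferred.
    return -F.logsigmoid(logits).mean()
```

A matching data loader would then yield batches of (prompt, chosen, rejected) triples, with the chosen/rejected log-probs gathered over the response tokens only.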
-
Hello,
I am trying to do some basic inference with your sft and policy models.
However, when I instantiate the model directly with LlamaForCausalLM, generation works well for the base pretrain…
-
[2023-08-12 01:22:11,409] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.10.0, git-hash=unknown, git-branch=unknown
Traceback (most recent call last):
File "/root/inpc_projects…
-
throws this error:
```
from chatllama.rlhf.trainer import RLTrainer
File "C:\Users\Admin\MY WORK\test llama\venv\lib\site-packages\chatllama\rlhf\trainer.py", line 12, in
from actor impor…
-
I am new to RLHF. Could you upload demo code for RLHF, or a slide deck about RLHF?
-
When I run the demo (step3_rlhf_finetuning/training_scripts/opt/single_node/run_1.3b.sh) without any changes, the reward does not increase. Is this normal? I would appreciate it if anyone can provide …
-
### Describe the Question
Could you provide an example that chains together training and inference for each stage, from pretraining to SFT to RLHF? For instance: after pretraining, run inference tests; once the results look OK, move on to SFT, then run inference tests again, and so on. That would make it easier for everyone to discuss th…
-
With the proliferation of models and model variants it becomes more important to track assessment dates and model versions.
So far we've been able to treat model families as one, because it rarely …
-
@aicrumb built a really cool RLHF-trained Stable Diffusion prompter on BLOOM: https://huggingface.co/crumb/bloom-560m-RLHF-SD2-prompter
I believe it's possible to convert it to ONNX and then run it…
-
Hello, after downloading the repo, I don't see any code in modeling_chatglm.py for RLHF, reward model (RM) training, or PPO-based RL training. The README clearly states that it uses the same technique as ChatGPT and supports RLHF. Could you explain what is going on?