-
Given recent changes in the space, we should probably look at adding this alongside our DPO recipes. Happy to take a stab at it if we're in agreement.
-
Do you have data on the performance of DPO with models other than Qwen-VL-Chat? I found that it degrades both perception and cognition in MME when used with LLaVA-1.5.
-
### 🐛 Describe the bug
I am following this blog https://medium.com/pytorch/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b to train a 6.7B paramete…
-
**Is your feature request related to a problem? Please describe.**
We have released a paper with code, [RRHF](https://github.com/GanjinZero/RRHF), which can achieve human alignment without RLHF. RRHF ne…
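For reference, a minimal sketch of the RRHF objective (ranking loss plus an SFT term) as described in the paper, assuming length-normalized log-probabilities per candidate response; the function and variable names here are illustrative, not taken from the RRHF repo:

```python
import torch

def rrhf_loss(logprobs, reward_scores, best_idx):
    """Sketch of the RRHF objective: ranking loss + SFT loss.

    logprobs:      (k,) length-normalized log-probability of each of the
                   k candidate responses under the policy model
    reward_scores: (k,) scalar reward scores for the same responses
    best_idx:      index of the highest-reward response, used as the SFT target
    """
    # Ranking term: for every pair (i, j) where the reward prefers j over i,
    # penalize assigning i a higher log-probability than j.
    k = logprobs.shape[0]
    rank_loss = 0.0
    for i in range(k):
        for j in range(k):
            if reward_scores[i] < reward_scores[j]:
                rank_loss = rank_loss + torch.clamp(logprobs[i] - logprobs[j], min=0.0)

    # SFT term: maximize the likelihood of the best-scored response.
    sft_loss = -logprobs[best_idx]

    return rank_loss + sft_loss
```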
-
OpenAI used **40 people** to annotate data when training their own ChatGPT, and the annotation process lasted for **3 months**.
It is difficult for our open-source community (GitHub) to reproduce the **Reinforcemen…
-
# URL
- https://arxiv.org/abs/2203.02155
# Affiliations
- Long Ouyang, N/A
- Jeff Wu, N/A
- Xu Jiang, N/A
- Diogo Almeida, N/A
- Carroll L. Wainwright, N/A
- Pamela Mishkin, N/A
- Chong …
-
new endpoints:
urls = [
"http://172.218.204.83:2701",
"http://37.27.2.44:60102",
"http://184.67.78.114:42098",
]
I also won't be normalizing the scores within the reward e…
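As a rough illustration of what querying those reward endpoints without normalizing scores might look like, here is a sketch; the `/score` route, payload, and response fields are assumptions, not the project's actual API:

```python
import requests

urls = [
    "http://172.218.204.83:2701",
    "http://37.27.2.44:60102",
    "http://184.67.78.114:42098",
]

def score_completion(prompt, completion, timeout=10):
    """Query each reward endpoint and return its raw (un-normalized) score.

    The "/score" route and the JSON fields used here are hypothetical;
    adjust them to whatever the actual reward servers expose.
    """
    scores = []
    for url in urls:
        resp = requests.post(
            f"{url}/score",
            json={"prompt": prompt, "completion": completion},
            timeout=timeout,
        )
        resp.raise_for_status()
        # Raw score is used as-is; no normalization across endpoints.
        scores.append(resp.json()["score"])
    return scores
```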
-
Should be quite easy to add for someone who knows the codebase. The biggest problem might be a new dataset format.
I don't expect I need to link this, but here's a pretty nice implementation of the loss:
…
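Since the link above is elided, here is a generic sketch of a DPO-style preference loss over (chosen, rejected) pairs; whether this matches the implementation being referenced is an assumption, and the dataset record shown is only illustrative of the new format that would be needed:

```python
import torch.nn.functional as F

# Hypothetical preference-pair record; the actual dataset format for the
# codebase would still need to be decided.
example = {
    "prompt": "Explain RLHF in one sentence.",
    "chosen": "RLHF fine-tunes a model against a learned reward ...",
    "rejected": "RLHF is when you ...",
}

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Each argument is the summed log-probability of the chosen/rejected
    response under the policy or the frozen reference model.
    """
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```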
-
I installed the quantized int4 version and tested a few exchanges; the results feel quite poor.
My guesses as to the reasons:
- no fine-tuning on conversational data
- no RLHF training
Is anyone willing to discuss this?