-
Given recent changes in the space, we should probably look at adding this alongside our DPO recipes. Happy to take a stab at it if we're in agreement.
-
Do you have data on the performance of DPO with models other than Qwen-VL-Chat? I found that it degrades both perception and cognition in MME when used with LLaVA-1.5.
-
### 🐛 Describe the bug
I am following this blog https://medium.com/pytorch/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b to train a 6.7B paramete…
-
**Is your feature request related to a problem? Please describe.**
We have released a paper with code, [RRHF](https://github.com/GanjinZero/RRHF), which can achieve human alignment without RLHF. RRHF ne…
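For reference, a minimal sketch of the RRHF objective (ranking loss plus an SFT term) as described in the paper, assuming length-normalized log-probabilities per candidate response; the function and variable names here are illustrative, not taken from the RRHF repo:

```python
import torch

def rrhf_loss(logprobs, reward_scores, best_idx):
    """Sketch of the RRHF objective: ranking loss + SFT loss.

    logprobs:      (k,) length-normalized log-probability of each of the
                   k candidate responses under the policy model
    reward_scores: (k,) scalar reward scores for the same responses
    best_idx:      index of the highest-reward response, used as the SFT target
    """
    # Ranking term: for every pair (i, j) where the reward prefers j over i,
    # penalize assigning i a higher log-probability than j.
    k = logprobs.shape[0]
    rank_loss = 0.0
    for i in range(k):
        for j in range(k):
            if reward_scores[i] < reward_scores[j]:
                rank_loss = rank_loss + torch.clamp(logprobs[i] - logprobs[j], min=0.0)

    # SFT term: maximize the likelihood of the best-scored response.
    sft_loss = -logprobs[best_idx]

    return rank_loss + sft_loss
```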
-
OpenAI used **40 people** to annotate data when training their own ChatGPT, and the annotation process lasted for **3 months**.
It is difficult for our open-source community (GitHub) to reproduce the **Reinforcemen…
-
# URL
- https://arxiv.org/abs/2203.02155
# Affiliations
- Long Ouyang, N/A
- Jeff Wu, N/A
- Xu Jiang, N/A
- Diogo Almeida, N/A
- Carroll L. Wainwright, N/A
- Pamela Mishkin, N/A
- Chong …
-
new endpoints:
urls = [
"http://172.218.204.83:2701",
"http://37.27.2.44:60102",
"http://184.67.78.114:42098",
]
I also won't be normalizing the scores within the reward e…
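As a rough illustration of what querying those reward endpoints without normalizing scores might look like, here is a sketch; the `/score` route, payload, and response fields are assumptions, not the project's actual API:

```python
import requests

urls = [
    "http://172.218.204.83:2701",
    "http://37.27.2.44:60102",
    "http://184.67.78.114:42098",
]

def score_completion(prompt, completion, timeout=10):
    """Query each reward endpoint and return its raw (un-normalized) score.

    The "/score" route and the JSON fields used here are hypothetical;
    adjust them to whatever the actual reward servers expose.
    """
    scores = []
    for url in urls:
        resp = requests.post(
            f"{url}/score",
            json={"prompt": prompt, "completion": completion},
            timeout=timeout,
        )
        resp.raise_for_status()
        # Raw score is used as-is; no normalization across endpoints.
        scores.append(resp.json()["score"])
    return scores
```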
-
Should be quite easy to add for someone who knows the codebase. The biggest problem might be a new dataset format.
I don't expect I need to link this, but here's a pretty nice implementation of the loss:
…
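Since the link above is elided, here is a generic sketch of a DPO-style preference loss over (chosen, rejected) pairs; whether this matches the implementation being referenced is an assumption, and the dataset record shown is only illustrative of the new format that would be needed:

```python
import torch.nn.functional as F

# Hypothetical preference-pair record; the actual dataset format for the
# codebase would still need to be decided.
example = {
    "prompt": "Explain RLHF in one sentence.",
    "chosen": "RLHF fine-tunes a model against a learned reward ...",
    "rejected": "RLHF is when you ...",
}

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Each argument is the summed log-probability of the chosen/rejected
    response under the policy or the frozen reference model.
    """
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```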
-
I installed the quantized int4 version and tested a few exchanges; the results feel quite poor.
My guesses as to the reasons:
- no fine-tuning on conversational data
- no RLHF training
Is anyone willing to discuss this?