-
The following error occurred while running cell 10 in **6. Tune language model using PPO with our preference model**.
After adding `__init__.py` to `/content/trlx/examples/summarize_rlhf/reward_model…
-
Very exciting to see your remarkable work on stablevicuna!
I read through your blog and noticed that all the datasets are open-sourced and available; however, considering the training code pa…
-
I only have 24 GB of VRAM; can I still fine-tune with RLHF?...
-
I have put the `Dahous/rm-static` dataset as well as the model `facebook/opt-1.3b` under the path
**DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_supervised_finetuning**
When r…
-
Hello, and thank you for your contribution and write-ups. Since I am fairly new to RLHF, I would like to ask a few questions and would appreciate your guidance:
1. If my base model is a different one, e.g. Baichuan2 or ChatGLM2, and I used custom training data for SFT, can I still use your released RLHF code in this setup?
2. If 1 is possible, that means I would need to retrain the RM and then run PPO; does your current code support that scenario?
3. If…
-
Hi,
is there any documentation on the data directory, and on how to load the model and run inference?
Where is the fine-tuning data stored, and can I replace it with my own data?
Also: where is the final fine-tuned model saved, and how do I load it and run inference?
-
Hello, I would like to ask whether the current moss-rlhf code supports base models trained from mistral-7b, rather than LLaMA-family models. Many thanks for your contribution and for taking the time to reply.
-
## Library and prompts layout
- [x] Implement Promptflow as orchestrator and Semantic Kernel as executor - see https://learn.microsoft.com/en-us/semantic-kernel/agents/planners/evaluate-and-deploy-pl…
-
Do you have data on the performance of DPO with models other than Qwen-VL-Chat? I found that it degrades both perception and cognition on MME when used with LLaVA-1.5.
-
Hello, I have a quick question.
I know most RLHF setups use a KL divergence penalty.
https://github.com/nebuly-ai/nebullvm/blob/aad1c09ce20946294df3ec83569bad9496f58d0e/apps/accelerate/chatllama/chatllam…
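For context, the per-token KL penalty in PPO-based RLHF typically enters the reward like this. This is a minimal sketch, not the code from the linked file; the function name, the `beta` coefficient, and the tensor shapes are illustrative assumptions:

```python
import torch

def kl_penalized_rewards(scores, logprobs, ref_logprobs, beta=0.1):
    """Sketch of the standard per-token KL penalty in PPO-based RLHF.

    The policy is discouraged from drifting away from the reference (SFT)
    model: every token's reward is -beta * (log pi(a|s) - log pi_ref(a|s)),
    an approximate KL term, and the scalar reward-model score is added to
    the final generated token.
    """
    kl = logprobs - ref_logprobs        # per-token approximate KL
    rewards = -beta * kl                # KL penalty at every position
    rewards[:, -1] += scores            # RM score only on the last token
    return rewards

# toy example: batch of 1 sequence with 4 generated tokens
logprobs = torch.tensor([[-1.0, -0.5, -0.8, -0.2]])
ref_logprobs = torch.tensor([[-1.2, -0.5, -1.0, -0.4]])
scores = torch.tensor([1.5])            # hypothetical reward-model output
out = kl_penalized_rewards(scores, logprobs, ref_logprobs, beta=0.1)
```

With `beta` tuned (or adapted on the fly, as some implementations do), this keeps the PPO-optimized policy close to the reference model while still maximizing the learned reward.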