-
The following error occurred while running cell 10 in **6. Tune language model using PPO with our preference model**.
After adding `__init__.py` to `/content/trlx/examples/summarize_rlhf/reward_model…
-
Very exciting to see your remarkable work on stablevicuna!
I read through your blog and noticed that all the datasets are open-sourced and available; however, considering the training code pa…
-
I only have 24 GB of VRAM; can I still fine-tune with RLHF?...
-
I have put the `Dahous/rm-static` dataset as well as the model `facebook/opt-1.3b` under the path
**DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_supervised_finetuning**
When r…
-
Hello, and thank you for your contribution and write-ups. Since I am fairly new to RLHF, I would like to ask a few questions and would appreciate your guidance:
1. If my base model is a different one, e.g. Baichuan2 or ChatGLM2, and I used custom training data for SFT, can I still use your released RLHF code in this setup?
2. If 1 is possible, that means I would need to retrain the RM and then run PPO; does your current code support that scenario?
3. If…
-
Hi,
is there any documentation on the data directory, and on how to load the model and run inference?
Where is the fine-tuning data stored, and can I replace it with my own data?
Also: where is the final fine-tuned model saved, and how do I load it and run inference?
-
Hello, I would like to ask whether the current moss-rlhf code supports base models trained from mistral-7b, rather than LLaMA-family models. Many thanks for your contribution and for taking the time to reply.
-
## Library and prompts layout
- [x] Implement Promptflow as orchestrator and Semantic Kernel as executor - see https://learn.microsoft.com/en-us/semantic-kernel/agents/planners/evaluate-and-deploy-pl…
-
Do you have data on the performance of DPO with models other than Qwen-VL-Chat? I found that it degrades both perception and cognition on MME when used with LLaVA-1.5.
-
Hello, I have a quick question.
I know most RLHF setups use a KL divergence penalty.
https://github.com/nebuly-ai/nebullvm/blob/aad1c09ce20946294df3ec83569bad9496f58d0e/apps/accelerate/chatllama/chatllam…
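For context, the per-token KL penalty in PPO-based RLHF typically enters the reward like this. This is a minimal sketch, not the code from the linked file; the function name, the `beta` coefficient, and the tensor shapes are illustrative assumptions:

```python
import torch

def kl_penalized_rewards(scores, logprobs, ref_logprobs, beta=0.1):
    """Sketch of the standard per-token KL penalty in PPO-based RLHF.

    The policy is discouraged from drifting away from the reference (SFT)
    model: every token's reward is -beta * (log pi(a|s) - log pi_ref(a|s)),
    an approximate KL term, and the scalar reward-model score is added to
    the final generated token.
    """
    kl = logprobs - ref_logprobs        # per-token approximate KL
    rewards = -beta * kl                # KL penalty at every position
    rewards[:, -1] += scores            # RM score only on the last token
    return rewards

# toy example: batch of 1 sequence with 4 generated tokens
logprobs = torch.tensor([[-1.0, -0.5, -0.8, -0.2]])
ref_logprobs = torch.tensor([[-1.2, -0.5, -1.0, -0.4]])
scores = torch.tensor([1.5])            # hypothetical reward-model output
out = kl_penalized_rewards(scores, logprobs, ref_logprobs, beta=0.1)
```

With `beta` tuned (or adapted on the fly, as some implementations do), this keeps the PPO-optimized policy close to the reference model while still maximizing the learned reward.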