Building the dataset for RM training

wuQi-666 commented 12 months ago

Is there an existing issue for this?

[X] I have searched the existing issues

Current Behavior

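Hello, and thank you very much for the tutorial. There is one thing I don't understand and would like to ask about: how is the dataset used to train the RM (reward model) built? From what I have read, it requires human annotation. Could you describe the process in detail? Thanks.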

Expected Behavior

Looking forward to your reply.

Steps To Reproduce

Thanks

Environment

- OS:Linux
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

Pillars-Creation commented 12 months ago

Human-annotated data is of course the best option. When no human annotation is available, a few rules can also get the result we want.

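For example, in my demo the goal is for the model to pick, from a given context, the articles that match the category named in the prompt. So from the outputs generated by the LoRA model we can select the articles that meet this requirement and the ones that do not, and use them as positive and negative samples. The methods are as follows: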

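Building RM samples:

Method 1. Positive: select a news item in the prompt that meets the condition. Negative: randomly select a news item that is not in the prompt.

Method 2. Positive: have the SFT model predict several outputs at once and pick the ones that meet the condition. Negative: from the same batch of SFT predictions, pick the ones that do not meet the condition.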
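The two methods can be sketched roughly as below. This is only an illustration, not the code from the repository: the `News` class, the function names, the `chosen`/`rejected` field names, and the toy headlines are all assumptions made up for the example, and the validity rule for Method 2 is whatever check fits your task.

```python
import random
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class News:
    # Hypothetical record type; the real data layout may differ.
    title: str
    category: str


def pairs_from_context(context, target_category, corpus, rng):
    """Method 1: the positive is a context article in the target category;
    the negative is a random article that does not appear in the context."""
    positives = [n for n in context if n.category == target_category]
    outside = [n for n in corpus if n not in context]
    if not positives or not outside:
        return []
    return [
        {"chosen": p.title, "rejected": rng.choice(outside).title}
        for p in positives
    ]


def pairs_from_samples(samples, is_valid):
    """Method 2: split several SFT generations with a rule, then pair
    every valid generation with every invalid one."""
    good = [s for s in samples if is_valid(s)]
    bad = [s for s in samples if not is_valid(s)]
    return [{"chosen": g, "rejected": b} for g, b in product(good, bad)]


# Toy data, invented for the example.
corpus = [
    News("Team wins the cup final", "sports"),
    News("Central bank holds rates", "finance"),
    News("New phone chip announced", "tech"),
    News("Marathon record broken", "sports"),
]
context = corpus[:3]  # the articles placed in the prompt
rng = random.Random(0)

m1 = pairs_from_context(context, "sports", corpus, rng)

# Stand-ins for several outputs sampled from the SFT model for one prompt.
sft_samples = [
    "Team wins the cup final",
    "Central bank holds rates",
    "A headline that is not in the context",
]
valid_titles = {n.title for n in context if n.category == "sports"}
m2 = pairs_from_samples(sft_samples, lambda s: s in valid_titles)
```

Each resulting record holds one preferred and one dispreferred answer for the same prompt, which is the shape a pairwise reward-model trainer typically expects. In practice you would also store the prompt text alongside each pair.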

Hope this gives you some ideas.

------------------ Original message ------------------ From: "Pillars-Creation/ChatGLM-RLHF-LoRA-RM-PPO"; Sent: Monday, October 30, 2023, 8:33 PM; Subject: [Pillars-Creation/ChatGLM-RLHF-LoRA-RM-PPO] Building the dataset for RM training (Issue #3)


wuQi-666 commented 12 months ago

Thank you very much.

Pillars-Creation / ChatGLM-RLHF-LoRA-RM-PPO