Aligner2024 / aligner

Achieving Efficient Alignment through Learned Correction
https://aligner2024.github.io/

Warm-up training #7

Open YunYuFei opened 2 months ago

YunYuFei commented 2 months ago

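Is the warm-up training also done as full-parameter fine-tuning with SFT? If so, are the training hyperparameters the same as those used in the subsequent training?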

YunYuFei commented 2 months ago

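Dear authors, sorry to bother you. I also have some questions about testing on the E-Dialogue and DialogSum datasets: did you apply any special processing to them? My current data format is as follows. E-Dialogue: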

Eating food is hard. some guys shot my neighbour and ran into the woods One time I stayed at a friend's house for the weekend. When I got home I discovered that my brothers had destroyed my bedroom I just found a bunch of random toys in the catch of the sink of my kids bathroom. There going to have some explaining to do when they get home. my husband called me a punk today for nothing

DialogSum:

Please summerize the following dialogue: #Person1#: Hello, how are you doing today?\n#Person2#: I ' Ve been having trouble breathing lately.\n#Person1#: Have you had any type of cold lately?\n#Person2#: No, I haven ' t had a cold. I just have a heavy feeling in my chest when I try to breathe.\n#Person1#: Do you have any allergies that you know of?\n#Person2#: No, I don ' t have any allergies that I know of.\n#Person1#: Does this happen all the time or mostly when you are active?\n#Person2#: It happens a lot when I work out.\n#Person1#: I am going to send you to a pulmonary specialist who can run tests on you for asthma.\n#Person2#: Thank you for your help, doctor.

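I feed these questions directly to the upstream LLM to generate the original answers, which are then corrected by Aligner. However, while reproducing the results, I found that in more than 80% of cases the answer stays almost unchanged. If convenient, could you tell me how you processed this data, or whether there are any materials I can refer to? Also, which evaluation metrics did you use?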

In any case, thank you very much for your hard work!
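
For reference, one way to put a number on the "almost unchanged" observation is a character-level similarity check between each upstream answer and its corrected version. The sketch below uses Python's standard `difflib`; the function names `similarity` and `unchanged_fraction` and the 0.95 threshold are my own assumptions for illustration, not anything taken from the Aligner code or paper.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1], ignoring whitespace differences."""
    norm_a = " ".join(a.split())
    norm_b = " ".join(b.split())
    # autojunk=False avoids the "popular character" heuristic on long answers.
    return SequenceMatcher(None, norm_a, norm_b, autojunk=False).ratio()


def unchanged_fraction(originals, corrections, threshold=0.95):
    """Fraction of pairs whose similarity is at least `threshold`.

    The threshold is an arbitrary assumption; tune it for your own data.
    """
    if len(originals) != len(corrections):
        raise ValueError("originals and corrections must have the same length")
    if not originals:
        return 0.0
    unchanged = sum(
        similarity(o, c) >= threshold for o, c in zip(originals, corrections)
    )
    return unchanged / len(originals)


if __name__ == "__main__":
    upstream = ["I am sorry to hear that.", "The patient has trouble breathing."]
    corrected = ["I am sorry to hear that.", "A doctor refers a patient for asthma tests."]
    print(unchanged_fraction(upstream, corrected))
```

A ratio close to 1.0 means the corrected answer is nearly a copy of the upstream answer, so reporting this fraction alongside the threshold makes a figure like "over 80%" reproducible.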

lxqpku commented 1 month ago

I have the same questions, and I would also like to know how to test Aligner on E-Dialogue.