lyklly opened 1 day ago
Table 1 test
The experimental setup for Mistral-7B-CPO-Harmful is to append the preference token "Harmlessness:5" before each instruction in MT-Bench, HaluEval 2.0, and HackaPrompt, and then evaluate the responses with GPT-4.
Figure 4 test
Figure 4(a) appends <Helpfulness: 5> before each instruction on MT-Bench and evaluates the resulting responses along three dimensions (helpfulness, honesty, and harmlessness) using GPT-4.
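Similarly, Figure 4(b) appends <Honesty: 5> before each instruction on HaluEval 2.0, and Figure 4(c) appends <Harmlessness: 5> before each instruction on HackaPrompt.
Because Table 1 and Figure 4 append different preference tokens (Table 1 prepends "Harmlessness:5" on all three benchmarks, whereas Figure 4 uses a benchmark-specific token), the reported performance differs accordingly.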
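The difference between the two setups can be sketched as a per-benchmark lookup of which preference token is prepended to each instruction. This is a minimal illustration only: the dictionary names, the single-space separator between token and instruction, the helper functions, and the sample instruction are hypothetical and not taken from the CPO repository. Only the token-to-benchmark pairs come from the reply above (Table 1 refers to Mistral-7B-CPO-Harmful).

```python
# Hypothetical sketch: names, separator, and sample instruction are assumptions,
# not the CPO evaluation code. Token/benchmark pairs follow the reply above.

# Table 1 (Mistral-7B-CPO-Harmful): the same token on every benchmark.
TABLE1_TOKENS = {
    "MT-Bench": "<Harmlessness: 5>",
    "HaluEval 2.0": "<Harmlessness: 5>",
    "HackaPrompt": "<Harmlessness: 5>",
}

# Figure 4: a benchmark-specific token for each panel.
FIGURE4_TOKENS = {
    "MT-Bench": "<Helpfulness: 5>",      # Figure 4(a)
    "HaluEval 2.0": "<Honesty: 5>",      # Figure 4(b)
    "HackaPrompt": "<Harmlessness: 5>",  # Figure 4(c)
}


def build_prompt(token: str, instruction: str) -> str:
    """Prepend a preference token to an instruction (space separator is assumed)."""
    return f"{token} {instruction}"


def differing_benchmarks(a: dict, b: dict) -> list:
    """Return, sorted, the benchmarks whose prepended token differs between setups."""
    return sorted(name for name in a if a[name] != b.get(name))


if __name__ == "__main__":
    instruction = "Explain why the sky is blue."  # hypothetical sample instruction
    for name in TABLE1_TOKENS:
        print(name)
        print("  Table 1 :", build_prompt(TABLE1_TOKENS[name], instruction))
        print("  Figure 4:", build_prompt(FIGURE4_TOKENS[name], instruction))
    print("Prompts differ on:", differing_benchmarks(TABLE1_TOKENS, FIGURE4_TOKENS))
```

Under these assumptions, the prompts differ on MT-Bench and HaluEval 2.0, while HackaPrompt receives the same token in both setups. GPT-4 is the judge in both cases, so the prepended token is the variable this sketch isolates.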
OK, thanks, I get it. Sorry for asking such an easy question.
---Original--- From: Yiju Date: Tue, Nov 26, 2024 10:24 AM Subject: Re: [OpenBMB/CPO] experiment difference in paper (Issue #6)
Figure 4 test: Figure 4(a) appends <Helpfulness: 5> before each instruction on MT-Bench and evaluates the resulting responses along three dimensions (helpfulness, honesty, and harmlessness) using GPT-4. Similarly, Figure 4(b) appends <Honesty: 5> before each instruction on HaluEval 2.0, and Figure 4(c) appends <Harmlessness: 5> before each instruction on HackaPrompt.
What is the difference between the experiments in Section 3.3 (Multi-Objective Alignment Evaluation) and Section 4.1 (Performance Trade-off Evaluation)? Both evaluate on MT-Bench, HaluEval 2.0, and HackaPrompt, both include a preference token (e.g., Harmlessness:5) in the prompt, and both are scored by GPT-4. Why, then, does the performance of CPO in Figure 4 differ from that in Table 1?
Table 1
Figure 4