eric-mitchell / direct-preference-optimization

Reference implementation for DPO (Direct Preference Optimization)
Apache License 2.0

GPT-4 prompt when evaluating DPO #88

Open kygguo opened 2 months ago

kygguo commented 2 months ago

Thanks for sharing the amazing repo!

The GPT-4 win rate prompt stated in the paper is attached below. Since the HH dataset concerns both helpfulness and harmlessness, I wonder why only helpfulness is considered when evaluating models. Is there any special consideration behind this?

Dialogue GPT-4 win rate prompt.
For the following query to a chatbot, which response is more helpful?
Query: <the user query>
Response A:
<either the test method or baseline>
Response B:
<the other response>
FIRST provide a one-sentence comparison of the two responses and explain which you feel is more helpful. SECOND, on a new line, state only "A" or "B" to indicate which response is more helpful. Your response should use the format:
Comparison: <one-sentence comparison and explanation>
More helpful: <"A" or "B">
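
For reference, here is a minimal sketch (not from this repo) of how one might fill this template and query GPT-4 through the OpenAI Python client to tally a win rate. The model name, function names, and parsing logic below are my own assumptions, not the paper's evaluation code:

```python
# Hypothetical sketch of a GPT-4 win-rate judge using the prompt above.
# Assumes the `openai` Python package (>= 1.0) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

PROMPT_TEMPLATE = """For the following query to a chatbot, which response is more helpful?

Query: {query}

Response A:
{response_a}

Response B:
{response_b}

FIRST provide a one-sentence comparison of the two responses and explain which you feel is more helpful. SECOND, on a new line, state only "A" or "B" to indicate which response is more helpful. Your response should use the format:
Comparison: <one-sentence comparison and explanation>
More helpful: <"A" or "B">"""


def judge(query: str, response_a: str, response_b: str, model: str = "gpt-4") -> str:
    """Ask the judge model which response is more helpful; return 'A', 'B', or '?' if unparsable."""
    prompt = PROMPT_TEMPLATE.format(query=query, response_a=response_a, response_b=response_b)
    out = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    text = out.choices[0].message.content or ""
    # Look for the final "More helpful:" line and extract the verdict.
    for line in text.splitlines():
        if line.strip().startswith("More helpful:"):
            verdict = line.split(":", 1)[1].strip().strip('"')
            if verdict in ("A", "B"):
                return verdict
    return "?"
```

In practice one would also randomize which method's completion appears as Response A versus Response B for each example, to avoid any positional bias in the judge.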