boson-ai / RPBench-Auto

An automated pipeline for evaluating LLMs for role-playing.
Apache License 2.0

Impact of Message Order and Role on LLM API Response Quality #7

Open yang-su2000 opened 3 weeks ago

yang-su2000 commented 3 weeks ago

Great work on open-sourcing the benchmark!

In the code snippet provided in eval_models_pairwise(...), the message order sent to the LLM API is as follows:
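(Roughly this shape; I'm sketching it with the OpenAI Python client and placeholder content, so the variable names, strings, and model name below are illustrative rather than the repo's actual code.)

```python
from openai import OpenAI  # any OpenAI-compatible chat-completions client

client = OpenAI()

# Illustrative reconstruction of the ordering I mean: the system prompt comes
# first, the character's greeting is sent as the first *assistant* turn, and
# the first *user* turn only appears after it.
messages = [
    {"role": "system", "content": "You are Alice, a cheerful tavern keeper. <role-play instructions>"},
    {"role": "assistant", "content": "Welcome, traveler! What brings you to my tavern?"},  # greeting
    {"role": "user", "content": "I'm looking for a room for the night."},
]

response = client.chat.completions.create(model="your-model", messages=messages)  # placeholder model name
print(response.choices[0].message.content)
```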

I am concerned that many LLMs are trained to handle user input first and may expect a user message before the assistant's greeting. In my testing, this ordering can affect the model's response or introduce unintended behavior.

In particular, I have observed that changing the system message to the user role and putting the "background" description in the system message often improves response quality for some LLMs (see the sketch below). Could you check whether this message order makes more sense, or whether there is a recommended approach for structuring messages to ensure optimal model performance?
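Concretely, this is the kind of variant I have been experimenting with (again with illustrative placeholder content, not a patch against the repo):

```python
# Proposed variant: the "background" description becomes the system message,
# and the content that was previously the system message is sent with the
# user role, so the model sees a user turn before it has to respond.
messages = [
    {"role": "system", "content": "<background description of the scene and characters>"},
    {"role": "user", "content": "<role-play instructions that were previously the system message>"},
    {"role": "assistant", "content": "Welcome, traveler! What brings you to my tavern?"},  # greeting
    {"role": "user", "content": "I'm looking for a room for the night."},
]
# Sent exactly as before, e.g.:
# client.chat.completions.create(model="your-model", messages=messages)
```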

Appreciate any feedback and suggestions!

sxjscience commented 3 weeks ago

Typically, users set the greeting message to control the bot's response style. Specifying it as the first assistant message in the API request or including it in the system message both seem reasonable to me. Ideally, a well-trained model should provide similar responses for both prompting techniques.
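For example, both of the following (illustrative placeholders, not the benchmark's exact prompts) should ideally elicit similar behavior from a well-trained model:

```python
# Variant A: greeting supplied as the first assistant message.
variant_a = [
    {"role": "system", "content": "You are Alice, a cheerful tavern keeper."},
    {"role": "assistant", "content": "Welcome, traveler! What brings you to my tavern?"},
    {"role": "user", "content": "I'm looking for a room for the night."},
]

# Variant B: greeting folded into the system message instead.
variant_b = [
    {
        "role": "system",
        "content": (
            "You are Alice, a cheerful tavern keeper.\n"
            "Open the conversation with: 'Welcome, traveler! What brings you to my tavern?'"
        ),
    },
    {"role": "user", "content": "I'm looking for a room for the night."},
]
```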