codelion / optillm

Optimizing inference proxy for LLMs
Apache License 2.0

Remove hard-coded User/Assistant prefixes from conversation parsing #112

Open ElioDonato opened 3 days ago

ElioDonato commented 3 days ago

The current implementation of parse_conversation() prepends hard-coded "User:" and "Assistant:" prefixes to messages when flattening a conversation. This creates formatting issues when integrating with LLMs that have their own prompt templates and conversation formats.

In parse_conversation(), messages are formatted as:

if role == 'user':
    conversation.append(f"User: {text_content}")
elif role == 'assistant':
    conversation.append(f"Assistant: {text_content}")

This forces a specific conversation format regardless of the LLM's requirements or the user's intended prompt template.
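
For illustration, here is roughly what that looks like end to end (the input below is a hypothetical example; the flattened output follows from the snippet above):

messages = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "4."},
    {"role": "user", "content": "And 3 + 3?"},
]

# After the prefixing above, the conversation is flattened to:
#   User: What is 2 + 2?
#   Assistant: 4.
#   User: And 3 + 3?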

In my opinion, the proxy should pass message content through as-is, without adding role prefixes, allowing each model's own prompt template and the user's intended formatting to apply.
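
A minimal sketch of one way to do that, with an optional prefix map; flatten_conversation and role_prefixes are hypothetical names, not existing optillm code:

def flatten_conversation(messages, role_prefixes=None):
    # role_prefixes=None passes content through unchanged;
    # {"user": "User: ", "assistant": "Assistant: "} would restore
    # the current hard-coded behavior for callers that want it.
    parts = []
    for msg in messages:
        prefix = (role_prefixes or {}).get(msg["role"], "")
        parts.append(prefix + msg["content"])
    return "\n".join(parts)

This would make pass-through the default while still letting callers opt into role prefixes.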

codelion commented 3 days ago

This is done only for multi-turn conversations, because several of the implemented approaches run a multi-turn conversation internally. It doesn't change the format of the messages: the response is always an OpenAI-compatible messages object. The only thing that happens is that the initial multi-turn conversation is converted into a single-turn message, which is intentional so that the implemented approaches can work.
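
A rough sketch of that behavior as described (illustrative names, not optillm's actual internals): the multi-turn history is collapsed into a single user turn before an approach runs, while the response returned to the client keeps the standard OpenAI-compatible shape:

def collapse_to_single_turn(messages):
    # Fold the full history into one prompt string so approaches that
    # run their own internal multi-turn loops receive a single turn.
    prompt = "\n".join(
        f"{msg['role'].capitalize()}: {msg['content']}" for msg in messages
    )
    return [{"role": "user", "content": prompt}]

# The client-facing response still looks like:
# {"choices": [{"message": {"role": "assistant", "content": ...}}]}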