continuedev / continue

⏩ Continue is the leading open-source AI code assistant. You can connect any models and any context to build custom autocomplete and chat experiences inside VS Code and JetBrains
https://docs.continue.dev/
Apache License 2.0

Compatibility: LLama Message Order #488

Closed · adrianliechti closed this issue 1 year ago

adrianliechti commented 1 year ago

Llama only supports messages in this order: 'system', 'user' and 'assistant' roles, starting with 'system', then 'user', and alternating (u/a/u/a/u...).

continue.dev sends two user-role messages in a row (one with the text, one with the code snippet). Could you consider merging these client-side? Or is there already an option to merge them?
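
For reference, a minimal TypeScript sketch of the kind of client-side merge being requested (the `ChatMessage` type and `mergeSameRoleMessages` helper are hypothetical illustrations, not Continue's actual API):

```typescript
// Hypothetical message shape; Continue's real types may differ.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Collapse consecutive messages from the same role into one,
// so the sequence alternates the way Llama expects (system?, u/a/u/a...).
function mergeSameRoleMessages(messages: ChatMessage[]): ChatMessage[] {
  const merged: ChatMessage[] = [];
  for (const msg of messages) {
    const last = merged[merged.length - 1];
    if (last && last.role === msg.role) {
      // Join same-role messages with a blank line between them.
      last.content += "\n\n" + msg.content;
    } else {
      merged.push({ ...msg });
    }
  }
  return merged;
}
```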

sestinj commented 1 year ago

There's no option as of now, but I've been thinking about finally merging these into a single user prompt.

Curious: does this seem to be the cause of low-quality responses?

adrianliechti commented 1 year ago

As far as I understand, a model's fine-tuning targets a specific conversation flow, and I could imagine that ignoring it leads to suboptimal results.

In the case of Llama, they have a check for this in their implementation: https://github.com/facebookresearch/codellama/blob/d2b38acd3a9c55051de1f21d9132f61de7d1a630/llama/generation.py#L309
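
That check rejects any dialog that doesn't alternate strictly after an optional leading system message. A TypeScript rendering of the same invariant (the original at the link above is Python; this helper is illustrative only, and the error text is the one quoted earlier in this issue):

```typescript
type Role = "system" | "user" | "assistant";

// Throws if the dialog does not follow Llama's expected ordering:
// an optional leading "system" message, then strictly alternating
// "user" / "assistant", starting with "user".
function assertLlamaOrdering(messages: { role: Role; content: string }[]): void {
  const dialog =
    messages[0]?.role === "system" ? messages.slice(1) : messages;
  for (let i = 0; i < dialog.length; i++) {
    const expected: Role = i % 2 === 0 ? "user" : "assistant";
    if (dialog[i].role !== expected) {
      throw new Error(
        "model only supports 'system', 'user' and 'assistant' roles, " +
          "starting with 'system', then 'user' and alternating (u/a/u/a/u...)"
      );
    }
  }
}
```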

sestinj commented 1 year ago

@adrianliechti I've played around with this formatting and found that there doesn't seem to be any downgrade in capabilities from using the different ordering. But I wanted to check with you as well before closing this issue - have you noticed any poor responses coming from Continue?

adrianliechti commented 1 year ago

Feel free to close this! And thanks a ton.

I also implemented a "flattening" of the order in my proxy by combining consecutive same-role messages: https://github.com/adrianliechti/llama/blob/main/llama-openai/provider/llama/llama.go#L271

sestinj commented 1 year ago

Cool! Did you notice a quality difference after doing this?

sestinj commented 1 year ago

OK, going to close this because (really funny timing) someone just brought up that pplx-api enforces the chat message ordering! So I made the change, inspired by your flattening function, and I'm going to keep it in pre-release for a bit.