What language are you using?
Dotnet (OOP)
Expected Behavior
A chat session can run indefinitely without hitting a token length limit error. I would expect the binding to trim the chat history so that each request stays under the context limit supported by the LLM deployment model.
Actual Behavior
Exception while executing function: Functions.chatQuery
This model's maximum context length is 4096 tokens. However, your messages resulted in 4109 tokens (4046 in the messages, 63 in the functions). Please reduce the length of the messages or functions.
Status: 400 (model_error)
ErrorCode: context_length_exceeded
Content:
{
  "error": {
    "message": "This model's maximum context length is 4096 tokens. However, your messages resulted in 4109 tokens (4046 in the messages, 63 in the functions). Please reduce the length of the messages or functions.",
    "type": "invalid_request_error",
    "param": "messages",
    "code": "context_length_exceeded"
  }
}
Host.json
No response
Steps to Reproduce
Create a long chat session. After enough exchanges, the request fails with the token limit error above.
It appears the binding retrieves the entire chat history and sends all of it to OpenAI for conversation context. The amount of history sent needs to be limited so the request stays below the model's context limit; a rough sketch of one approach follows.
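A minimal C# sketch of the kind of trimming the binding could do before calling OpenAI. The `ChatMessage` record, the `TrimHistory` helper, and the ~4-characters-per-token estimate are all assumptions for illustration, not the extension's actual API; a real implementation would use a proper tokenizer (e.g. SharpToken) and also budget for the function definitions' tokens (the 63 tokens in the error above) and the completion.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical message shape for illustration; not the extension's actual type.
public record ChatMessage(string Role, string Content);

public static class ChatHistoryTrimmer
{
    // Rough heuristic: ~4 characters per token for English text, plus a
    // small per-message overhead. A real implementation should tokenize.
    private static int EstimateTokens(ChatMessage m) =>
        (m.Role.Length + m.Content.Length) / 4 + 4;

    // Keep the newest messages whose estimated total fits in the budget,
    // always preserving the system message (if any) at the front.
    public static IReadOnlyList<ChatMessage> TrimHistory(
        IReadOnlyList<ChatMessage> history, int maxTokens)
    {
        var kept = new LinkedList<ChatMessage>();
        int budget = maxTokens;

        ChatMessage? system = history.FirstOrDefault(m => m.Role == "system");
        if (system is not null)
            budget -= EstimateTokens(system);

        // Walk backwards from the most recent message, stopping once the
        // next-oldest message would exceed the remaining budget.
        foreach (var message in history.Reverse())
        {
            if (ReferenceEquals(message, system)) continue;
            int cost = EstimateTokens(message);
            if (cost > budget) break;
            kept.AddFirst(message);
            budget -= cost;
        }

        if (system is not null)
            kept.AddFirst(system);

        return kept.ToList();
    }
}
```

Dropping the oldest turns is the simplest policy; summarizing them into a single message would preserve more context, but either way the cap would need to be configurable per deployment model since context limits vary (4k here, larger on other models).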
Relevant code being tried
No response
Relevant log output
No response
Where are you facing this problem?
Local - Core Tools
Additional Information
No response