What language are you using?
Dotnet (OOP)
Expected Behavior
A chat session can run indefinitely without hitting a token length limit error. I would expect the binding to trim the chat history so that each request stays under the context limit supported by the LLM deployment model.
Actual Behavior
Exception while executing function: Functions.chatQuery
This model's maximum context length is 4096 tokens. However, your messages resulted in 4109 tokens (4046 in the messages, 63 in the functions). Please reduce the length of the messages or functions.
Status: 400 (model_error)
ErrorCode: context_length_exceeded
Content:
{
  "error": {
    "message": "This model's maximum context length is 4096 tokens. However, your messages resulted in 4109 tokens (4046 in the messages, 63 in the functions). Please reduce the length of the messages or functions.",
    "type": "invalid_request_error",
    "param": "messages",
    "code": "context_length_exceeded"
  }
}
Host.json
No response
Steps to Reproduce
Create a long chat session. After enough exchanges, the request fails with the token limit error above.
It appears the binding retrieves the entire chat history and sends all of it to OpenAI for conversation context. The amount of history sent needs to be limited so the request stays below the model's context limit; a rough sketch of one approach follows.
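A minimal C# sketch of the kind of trimming the binding could do before calling OpenAI. The `ChatMessage` record, the `TrimHistory` helper, and the ~4-characters-per-token estimate are all assumptions for illustration, not the extension's actual API; a real implementation would use a proper tokenizer (e.g. SharpToken) and also budget for the function definitions' tokens (the 63 tokens in the error above) and the completion.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical message shape for illustration; not the extension's actual type.
public record ChatMessage(string Role, string Content);

public static class ChatHistoryTrimmer
{
    // Rough heuristic: ~4 characters per token for English text, plus a
    // small per-message overhead. A real implementation should tokenize.
    private static int EstimateTokens(ChatMessage m) =>
        (m.Role.Length + m.Content.Length) / 4 + 4;

    // Keep the newest messages whose estimated total fits in the budget,
    // always preserving the system message (if any) at the front.
    public static IReadOnlyList<ChatMessage> TrimHistory(
        IReadOnlyList<ChatMessage> history, int maxTokens)
    {
        var kept = new LinkedList<ChatMessage>();
        int budget = maxTokens;

        ChatMessage? system = history.FirstOrDefault(m => m.Role == "system");
        if (system is not null)
            budget -= EstimateTokens(system);

        // Walk backwards from the most recent message, stopping once the
        // next-oldest message would exceed the remaining budget.
        foreach (var message in history.Reverse())
        {
            if (ReferenceEquals(message, system)) continue;
            int cost = EstimateTokens(message);
            if (cost > budget) break;
            kept.AddFirst(message);
            budget -= cost;
        }

        if (system is not null)
            kept.AddFirst(system);

        return kept.ToList();
    }
}
```

Dropping the oldest turns is the simplest policy; summarizing them into a single message would preserve more context, but either way the cap would need to be configurable per deployment model since context limits vary (4k here, larger on other models).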
Relevant code being tried
No response
Relevant log output
No response
Where are you facing this problem?
Local - Core Tools
Additional Information
No response