Sinaptik-AI / pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
https://pandas-ai.com

OpenAI should automatically switch to the model with more context #573

Closed gventuri closed 1 month ago

gventuri commented 10 months ago

🚀 The feature

Currently, when users hit errors because their context exceeds the model's window, they must manually switch to a larger model, which is cumbersome and can disrupt the workflow. This feature would automate that switch, ensuring users get the best possible performance without manual intervention.

Proposed Solution

The proposed solution is to implement an automatic model-switching mechanism that checks the context size and switches to a larger model when needed. Here's how it could work (a sketch follows the list):

  1. When a user sends a request with a specific model (e.g., "gpt-4"), the system checks the context window size.

  2. If the context window size exceeds the capacity of the selected model (e.g., "gpt-4"), the system automatically switches to the corresponding larger model (e.g., "gpt-4-32k").

  3. The system processes the request using the larger model to ensure the context fits within its capacity.

  4. The user receives the response seamlessly, without having to manually change the model selection.
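A minimal sketch of steps 1–3, assuming tiktoken is available for token counting; the limit table, the upgrade map, and the `choose_model` helper are illustrative, not existing PandasAI or OpenAI API surface:

```python
import tiktoken

# Illustrative context limits (tokens) for the models discussed in this issue.
CONTEXT_LIMITS = {
    "gpt-3.5-turbo": 4096,
    "gpt-3.5-turbo-16k": 16384,
    "gpt-4": 8192,
    "gpt-4-32k": 32768,
}
# Which larger-context variant to fall back to.
LARGER_VARIANT = {"gpt-3.5-turbo": "gpt-3.5-turbo-16k", "gpt-4": "gpt-4-32k"}


def choose_model(prompt: str, model: str) -> str:
    """Return `model`, or its larger-context variant if the prompt doesn't fit."""
    n_tokens = len(tiktoken.encoding_for_model(model).encode(prompt))
    if n_tokens <= CONTEXT_LIMITS[model]:
        return model
    larger = LARGER_VARIANT.get(model)
    if larger is None or n_tokens > CONTEXT_LIMITS[larger]:
        raise ValueError(
            f"Prompt is {n_tokens} tokens, exceeding every variant of {model}"
        )
    return larger
```

With this in place, step 4 is just calling the API with the returned model name.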

Implementation Considerations

To implement this feature, OpenAI may need to develop a mechanism for context size detection and automatic model switching within the API infrastructure.

Example Scenario

A user sends a request to the API using "gpt-4" but accidentally provides a very long context that exceeds the model's capacity. Instead of receiving an error, the system automatically switches to "gpt-4-32k" to accommodate the larger context, and the user receives a timely and accurate response.

Motivation, pitch

  1. Improved user experience: Users won't have to manually switch models when encountering context-related errors, making the interaction with OpenAI models more seamless.

  2. Error prevention: Automatic context model switching can help prevent errors caused by users inadvertently exceeding the context window of their selected model.

  3. Efficient use of resources: By automatically selecting the appropriate model based on context size, the system can make efficient use of computational resources.

Alternatives

No response

Additional context

This feature could be particularly useful for applications that involve dynamic and variable-length input contexts, such as chatbots, language translation, and content generation.

mspronesti commented 10 months ago

My two cents: as the user chooses which model to use according to their subscription and budget, I wouldn't go for an auto-switch.

Also, the proposed "auto-switch" doesn't scale: what if we switch to the 32k model and the user saturates that one too?

My proposal would be to simply use tiktoken to count the tokens and, in case of context overflow, raise an exception to the user without sending the request to OpenAI. Alternatively, we could adopt a sliding-window approach: warn the user that the context is saturated and delete the first N pairs of user/assistant messages until we make enough room (rough sketch below).
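A rough sketch of both options, assuming the chat history is a plain list of role/content dicts (the function names are mine, not PandasAI internals):

```python
import tiktoken


def count_tokens(messages: list[dict], model: str = "gpt-4") -> int:
    """Approximate token count of a chat history (ignores per-message overhead)."""
    enc = tiktoken.encoding_for_model(model)
    return sum(len(enc.encode(m["content"])) for m in messages)


def fit_context(messages: list[dict], max_tokens: int, model: str = "gpt-4") -> list[dict]:
    """Sliding window: drop the oldest user/assistant pair until the history fits.

    Raises (the first option above) if even the trimmed history is too long,
    instead of sending a doomed request to OpenAI.
    """
    messages = list(messages)
    while count_tokens(messages, model) > max_tokens:
        if len(messages) <= 2:
            raise ValueError("Context exceeds the model's window even after trimming")
        del messages[:2]  # oldest user/assistant pair
    return messages
```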

What do you think? :)

gventuri commented 10 months ago

@mspronesti good catch! The problem is that in some production environments you might not want to use the bigger context window unless it's strictly necessary.

Maybe we could have a setting (default False) that, when True, automatically scales to the larger-context model if the context exceeds the window (that model is more expensive and shouldn't be used by default).
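Something along these lines; the flag name `auto_scale_context` and the `call_openai` stand-in are hypothetical, and the limit tables are the same illustrative ones as in the sketch above:

```python
import tiktoken

LIMITS = {"gpt-4": 8192, "gpt-4-32k": 32768}  # illustrative
UPGRADE = {"gpt-4": "gpt-4-32k"}


def complete(prompt: str, model: str, auto_scale_context: bool = False) -> str:
    """Scale up to the larger-context model only when the user opted in."""
    n_tokens = len(tiktoken.encoding_for_model(model).encode(prompt))
    if n_tokens > LIMITS[model]:
        if not auto_scale_context or model not in UPGRADE:
            raise ValueError(f"Prompt ({n_tokens} tokens) exceeds {model}'s window")
        model = UPGRADE[model]  # more expensive, hence opt-in only
    return call_openai(prompt, model=model)  # stand-in for the real API call
```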

I also like the sliding window approach.

So to summarize:

  1. Add an opt-in setting (default False) that switches to the larger-context model on overflow.

  2. Keep the sliding-window approach as an alternative.

What do you think?

mspronesti commented 10 months ago

@gventuri In principle I'm not in favor of the autoscale parameter, because it would be hard to use with Azure and would raise maintenance problems if a model is discontinued. The sliding window sounds good to me :)

gventuri commented 10 months ago

@mspronesti it might be a model-specific param; no need to support it in every model. What kind of issues do you envision using it with Azure?

mspronesti commented 10 months ago

Azure works with deployments, not directly with models. This leads to two sources of problems: deployment names are user-chosen, so there is no fixed mapping from a model to a deployment, and the larger-context model may not be deployed at all.

Therefore, to "scale up" one would need to set the flag to True and provide the name of their new deployment (and what if they have it in another resource, for instance?).

In principle I don't like the idea of changing the model on the fly. I'd prefer to design an explicit strategy instead (e.g. sliding window, summary of the conversation, etc.).
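To make that concrete, one illustrative shape for it (all names are mine): the user picks an explicit overflow-handling strategy, rather than any implicit model swap:

```python
from abc import ABC, abstractmethod
from typing import Callable

Message = dict  # {"role": "...", "content": "..."}


class OverflowStrategy(ABC):
    """What to do when the prompt no longer fits the model's window."""

    @abstractmethod
    def shrink(self, messages: list[Message],
               fits: Callable[[list[Message]], bool]) -> list[Message]:
        ...


class RaiseError(OverflowStrategy):
    def shrink(self, messages, fits):
        if not fits(messages):
            raise ValueError("Context exceeds the model's window")
        return messages


class SlidingWindow(OverflowStrategy):
    def shrink(self, messages, fits):
        messages = list(messages)
        while not fits(messages) and len(messages) > 1:
            del messages[0]  # drop the oldest turn
        return messages
```

A summarization strategy would slot into the same interface, replacing old turns with an LLM-generated summary.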

But again, these are just my 2 cents. Happy to contribute to whatever we decide :)

strangercacaus commented 8 months ago

I agree with @mspronesti that the risk of budget impact is significant with this strategy. Is there a way we can measure (or estimate) how often contexts actually overflow the selected model?

Bertimaz commented 7 months ago

Hey, I wanted to tackle this issue, but I'm not sure what to implement based on the discussion. Should I implement just the sliding window?

gventuri commented 7 months ago

@Bertimaz the sliding window probably wouldn't work, because the key context (i.e. the dataframes) sits at the beginning of the prompt. The solution could be to switch to a larger model when the context exceeds what the selected model allows, but I'm not entirely sure this should be the expected behavior, as it could have a budget impact, as @mspronesti and @strangercacaus mentioned.
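For reference, if the sliding window were adapted rather than dropped, one hedged variant (not something this thread settled on) would pin the leading dataframe context and trim only the conversation turns:

```python
def trim_preserving_head(messages: list[dict], fits, n_pinned: int = 1) -> list[dict]:
    """Keep the first `n_pinned` messages (e.g. the dataframe context) and drop
    the oldest remaining turns until the prompt fits.

    `fits` is a predicate over the message list, e.g. a tiktoken-based check.
    """
    head, tail = messages[:n_pinned], list(messages[n_pinned:])
    while not fits(head + tail):
        if not tail:
            raise ValueError("Even the pinned context exceeds the model's window")
        tail.pop(0)  # oldest non-pinned message
    return head + tail
```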