When attempting to use `gpt-4-turbo-preview` with `llms.OpenAI()`, I get the following error:

```
[OpenAI (gpt-4-turbo-preview)] Retrying datadreamer.llms.openai.OpenAI.retry_wrapper.<locals>._retry_wrapper in 3.0 seconds as it raised BadRequestError: Error code: 400 - {'error': {'message': 'max_tokens is too large: 8153. This model supports at most 4096 completion tokens, whereas you provided 8153
```
This error can be fixed by passing `max_new_tokens=4096`. I'm attempting to fork DataDreamer and fix `get_max_context_length()` in `src/llms/openai.py`, but I think there's general confusion between GPT-4's advertised "context length" and its maximum number of completion/output tokens.
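In the meantime, here's the workaround spelled out (a minimal sketch; it assumes DataDreamer's usual `run()` entry point, and the exact keyword arguments may differ by version):

```python
# Workaround sketch -- assumes the standard datadreamer.llms.OpenAI interface;
# exact run() arguments may differ by DataDreamer version.
from datadreamer.llms import OpenAI

llm = OpenAI(model_name="gpt-4-turbo-preview")

# Explicitly capping max_new_tokens at the model's 4,096 completion-token
# limit avoids the 400 BadRequestError above.
outputs = llm.run(
    prompts=["Explain the difference between a context window and a completion limit."],
    max_new_tokens=4096,
)
```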
From `src/llms/openai.py`:

```python
def get_max_context_length(self, max_new_tokens: int) -> int:  # pragma: no cover
    """Gets the maximum context length for the model. When ``max_new_tokens`` is
    greater than 0, the maximum number of tokens that can be used for the prompt
    context is returned.

    Args:
        max_new_tokens: The maximum number of tokens that can be generated.

    Returns:
        The maximum context length.
    """  # pragma: no cover
    model_name = _normalize_model_name(self.model_name)
    format_tokens = 0
    if _is_chat_model(model_name):
        # Each message is up to 4 tokens and there are 3 messages
        # (system prompt, user prompt, assistant response)
        # and then we have to account for the system prompt
        format_tokens = 4 * 3 + self.count_tokens(cast(str, self.system_prompt))
    if "-preview" in model_name:
        max_context_length = 128000
```
This code is clearly trying to derive GPT-4's context length from the given model name. But the error above comes from conflating that context length with the completion/output token limit, and requesting more completion tokens than the model allows (in this case, 4,096).
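To make the two limits concrete (numbers from OpenAI's published specs for the `-preview` models):

```python
# gpt-4-turbo-preview limits, per OpenAI's model documentation:
context_window = 128_000  # prompt tokens + completion tokens, combined
max_completion = 4_096    # hard cap on completion tokens alone

# A request is only valid if both constraints hold:
#   prompt_tokens + max_tokens <= context_window
#   max_tokens <= max_completion
# The request in the traceback above fails the second check: 8153 > 4096.
```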
Just wanted to document the issue as I see it before I change the entire function around (in a PR) to reduce confusion.
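For what it's worth, here's a minimal sketch of the direction I have in mind. The table, helper name, and fallback behavior are illustrative only, not DataDreamer's actual internals; the point is that the completion cap needs its own lookup, separate from the context window:

```python
# Hypothetical sketch -- names and structure are mine, not DataDreamer's.
# Completion-token caps must be tracked separately from context windows:
# the GPT-4 Turbo previews have a 128k context window but cap completions
# at 4,096 tokens.
MAX_OUTPUT_TOKENS: dict[str, int] = {
    "gpt-4-turbo-preview": 4096,
    "gpt-4-0125-preview": 4096,
    "gpt-4-1106-preview": 4096,
}
CONTEXT_WINDOWS: dict[str, int] = {
    "gpt-4-turbo-preview": 128_000,
    "gpt-4-0125-preview": 128_000,
    "gpt-4-1106-preview": 128_000,
}

def clamp_max_new_tokens(model_name: str, requested: int) -> int:
    """Clamp the requested completion tokens to the model's output cap,
    falling back to the context window when no separate cap is known."""
    cap = MAX_OUTPUT_TOKENS.get(
        model_name, CONTEXT_WINDOWS.get(model_name, requested)
    )
    return min(requested, cap)

print(clamp_max_new_tokens("gpt-4-turbo-preview", 8153))  # -> 4096
```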