datadreamer-dev / DataDreamer

DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models.   🤖💤
https://datadreamer.dev
MIT License

gpt-4-turbo-preview max_tokens error #12

Closed: zxkevn closed this issue 6 months ago

zxkevn commented 6 months ago

When attempting to use gpt-4-turbo-preview with llms.OpenAI() I get the following error:

    [OpenAI (gpt-4-turbo-preview)] Retrying datadreamer.llms.openai.OpenAI.retry_wrapper.<locals>._retry_wrapper in 3.0 seconds as it raised BadRequestError: Error code: 400 - {'error': {'message': 'max_tokens is too large: 8153. This model supports at most 4096 completion tokens, whereas you provided 8153

This error can be worked around by passing max_new_tokens=4096. I'm attempting to fork DataDreamer and fix get_max_context_length() in src/llms/openai.py, but I think there's general confusion between GPT-4's advertised "context length" and its maximum number of completion/output tokens.
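In the meantime, for anyone else hitting this, the workaround is just to cap the request yourself. A minimal sketch of what I'm doing (assuming the standard DataDreamer session setup; the exact run() arguments here are from memory of the docs, so treat them as approximate):

    from datadreamer import DataDreamer
    from datadreamer.llms import OpenAI

    with DataDreamer("./output"):
        llm = OpenAI(model_name="gpt-4-turbo-preview")
        # Explicitly capping max_new_tokens at the model's 4,096 completion-token
        # limit avoids the 400 BadRequestError shown above.
        generations = llm.run(
            ["Write one sentence about synthetic data."],
            max_new_tokens=4096,
            temperature=1.0,
            top_p=1.0,
            n=1,
        )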

From src/llms/openai.py:

    def get_max_context_length(self, max_new_tokens: int) -> int:  # pragma: no cover
        """Gets the maximum context length for the model. When ``max_new_tokens`` is
        greater than 0, the maximum number of tokens that can be used for the prompt
        context is returned.

        Args:
            max_new_tokens: The maximum number of tokens that can be generated.

        Returns:
            The maximum context length.
        """  # pragma: no cover
        model_name = _normalize_model_name(self.model_name)
        format_tokens = 0
        if _is_chat_model(model_name):
            # Each message is up to 4 tokens and there are 3 messages
            # (system prompt, user prompt, assistant response)
            # and then we have to account for the system prompt
            format_tokens = 4 * 3 + self.count_tokens(cast(str, self.system_prompt))
        if "-preview" in model_name:
            max_context_length = 128000

This code is clearly trying to derive GPT-4's context length from the model name. The error above, however, comes from conflating that context length with the model's completion/output token limit and requesting more completion tokens than the model allows (in this case, 4,096):

(screenshot attached, taken 2024-03-07 at 2:49 PM)

Just wanted to document the issue as I see it before I change the entire function around (in a PR) to reduce confusion.
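For concreteness, here is the shape of the distinction I mean. This is a sketch only: the helper and constants below are hypothetical, not DataDreamer's API, and 4,096 is the documented completion cap for the -preview models:

    # Sketch only: hypothetical helper illustrating the two separate limits.
    PREVIEW_CONTEXT_WINDOW = 128_000          # tokens the model can *read*
    PREVIEW_MAX_COMPLETION_TOKENS = 4_096     # tokens the model will *generate*


    def clamp_max_new_tokens(model_name: str, requested: int) -> int:
        """Clamp a requested max_new_tokens to the model's completion-token cap."""
        if "-preview" in model_name:
            # The 128k figure is the context window; completions are still
            # capped at 4,096 tokens, so never request more than that.
            return min(requested, PREVIEW_MAX_COMPLETION_TOKENS)
        return requested


    # The 8,153 tokens from the error above would be clamped to 4,096:
    assert clamp_max_new_tokens("gpt-4-turbo-preview", 8153) == 4096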

AjayP13 commented 6 months ago

Thanks for reporting and looking into this @zeroshotkevin, this should now be fixed in 0.24.0, which was just released on PyPI.