danswer-ai / danswer

Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
https://docs.danswer.dev/

Failed to get max tokens for LLM with name XXX (Ollama provider) #1342

Open CosmicMac opened 5 months ago

CosmicMac commented 5 months ago

Hi, I'm facing the following issue when trying to chat with Ollama:

04/17/2024 01:13:07 PM             utils.py 273 : Failed to get max tokens for LLM with name gemma. Defaulting to 4096.
Traceback (most recent call last):
  File "/app/danswer/llm/utils.py", line 263, in get_llm_max_tokens
    model_obj = model_map[f"{model_provider}/{model_name}"]
                ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'ollama_chat/gemma'

Then Danswer answers :)

Error in input stream
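
For anyone debugging this, a quick way to see why the lookup misses is to inspect the map it reads from. This is a hedged diagnostic sketch, assuming model_map is backed by litellm's model_cost map (run it inside the api_server container):

import litellm

# Check whether the exact key the lookup wants exists, and list related keys.
print("ollama_chat/gemma" in litellm.model_cost)
print(sorted(k for k in litellm.model_cost if "gemma" in k))

If the first print is False and no usable key shows up in the second, the code falls through to the 4096 default shown in the traceback.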

My .env:

GEN_AI_MODEL_PROVIDER=ollama_chat
GEN_AI_MODEL_VERSION=gemma
GEN_AI_API_ENDPOINT=http://host.docker.internal:11434

QA_TIMEOUT=120
DISABLE_LLM_CHOOSE_SEARCH=True
DISABLE_LLM_CHUNK_FILTER=True
DISABLE_LLM_QUERY_REPHRASE=True
DISABLE_LLM_FILTER_EXTRACTION=True
# QA_PROMPT_OVERRIDE=weak

Ollama is up and running; I tested it from inside the danswer-stack-api_server container with curl http://host.docker.internal:11434/api/tags (obviously I had to install curl first).

BTW, Danswer seems to retry the request once when an error occurs.

gargmukku07 commented 5 months ago

Hi,

I am getting the same error with the latest build. I am using llama2.

gargmukku07 commented 5 months ago

This is fixed by setting a value for GEN_AI_MAX_TOKENS.
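
For example, in .env (4096 matches llama2's default context window; the exact number is an assumption, so check the model you actually run):

GEN_AI_MAX_TOKENS=4096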

al-lac commented 5 months ago

@gargmukku07 which value did you set for llama2?

LittleBennos commented 4 months ago

Just wanted to add my findings.

I was getting this error:

05/28/2024 11:34:41 PM             utils.py 328 : Failed to get max tokens for LLM with name azuregpt35turbo. Defaulting to 4096.
Traceback (most recent call last):
  File "/app/danswer/llm/utils.py", line 318, in get_llm_max_tokens
    model_obj = model_map[model_name]
KeyError: 'azuregpt35turbo'
05/28/2024 11:34:46 PM            timing.py  74 : stream_chat_message took 7.445417404174805 seconds

It turns out you need to set the GEN_AI_MAX_TOKENS variable.

This is due to this section of code in backend/danswer/llm/utils.py:

def get_llm_max_tokens(
    model_map: dict,
    model_name: str,
    model_provider: str,
) -> int:
    """Best effort attempt to get the max tokens for the LLM"""
    if GEN_AI_MAX_TOKENS:
        # This is an override, so always return this
        return GEN_AI_MAX_TOKENS

    try:
        # Look up "<provider>/<model>" first, then fall back to the bare
        # model name; if both miss, the KeyError lands in the except below.
        model_obj = model_map.get(f"{model_provider}/{model_name}")
        if not model_obj:
            model_obj = model_map[model_name]

        if "max_input_tokens" in model_obj:
            return model_obj["max_input_tokens"]

        if "max_tokens" in model_obj:
            return model_obj["max_tokens"]

        raise RuntimeError("No max tokens found for LLM")
    except Exception:
        logger.exception(
            f"Failed to get max tokens for LLM with name {model_name}. Defaulting to 4096."
        )
        return 4096
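
To make the failure mode concrete, here is a minimal sketch of that lookup with hypothetical model_map contents (entries and numbers are illustrative only, not the real map):

# Minimal sketch; model_map contents are hypothetical.
model_map = {"ollama/gemma": {"max_tokens": 8192}}
model_provider, model_name = "ollama_chat", "gemma"

try:
    model_obj = model_map.get(f"{model_provider}/{model_name}")  # None: no "ollama_chat/gemma"
    if not model_obj:
        model_obj = model_map[model_name]  # KeyError: bare "gemma" is missing too
except KeyError:
    model_obj = {"max_tokens": 4096}  # mirrors the 4096 fallback above
print(model_obj["max_tokens"])  # 4096, unless GEN_AI_MAX_TOKENS short-circuits first
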
Hurricane31337 commented 1 month ago

Every model has a different context size, so I propose adding a context length option to each model (regardless of provider):

[Screenshot: DanswerContextLength2, showing the proposed per-model context length option]
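
For illustration, a hypothetical sketch of such a per-model override (names and numbers are made up, not actual Danswer config):

# Hypothetical per-model context length overrides, keyed by "provider/model".
MODEL_CONTEXT_OVERRIDES: dict[str, int] = {
    "ollama_chat/gemma": 8192,        # illustrative values only
    "azure/azuregpt35turbo": 16385,
}

def max_tokens_for(provider: str, name: str, default: int = 4096) -> int:
    """Return the configured context length, or a conservative default."""
    return MODEL_CONTEXT_OVERRIDES.get(f"{provider}/{name}", default)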