Open · CosmicMac opened 5 months ago
Hi,
I am getting the same error with the latest build. I am using llama2.
This is fixed by setting the value of GEN_AI_MAX_TOKENS
@gargmukku07 which value did you set for llama2?
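For reference, Llama 2's context window is 4096 tokens, so a plausible `.env` entry (my assumption, not necessarily the value gargmukku07 used) would be:

```
# .env — hard override for the max-token detection (Llama 2 context = 4096)
GEN_AI_MAX_TOKENS=4096
```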
Just wanted to add my findings. I was getting this error:
```
05/28/2024 11:34:41 PM utils.py 328 : Failed to get max tokens for LLM with name azuregpt35turbo. Defaulting to 4096.
Traceback (most recent call last):
  File "/app/danswer/llm/utils.py", line 318, in get_llm_max_tokens
    model_obj = model_map[model_name]
KeyError: 'azuregpt35turbo'
05/28/2024 11:34:46 PM timing.py 74 : stream_chat_message took 7.445417404174805 seconds
```
It turns out you need to set the GEN_AI_MAX_TOKENS variable. This is due to this section of code in backend/danswer/llm/utils.py:
```python
def get_llm_max_tokens(
    model_map: dict,
    model_name: str,
    model_provider: str,
) -> int:
    """Best effort attempt to get the max tokens for the LLM"""
    if GEN_AI_MAX_TOKENS:
        # This is an override, so always return this
        return GEN_AI_MAX_TOKENS

    try:
        # Prefer the provider-qualified key, e.g. "azure/gpt-35-turbo"
        model_obj = model_map.get(f"{model_provider}/{model_name}")
        if not model_obj:
            # Fall back to a bare model-name lookup; raises KeyError for unknown names
            model_obj = model_map[model_name]

        if "max_input_tokens" in model_obj:
            return model_obj["max_input_tokens"]

        if "max_tokens" in model_obj:
            return model_obj["max_tokens"]

        raise RuntimeError("No max tokens found for LLM")
    except Exception:
        logger.exception(
            f"Failed to get max tokens for LLM with name {model_name}. Defaulting to 4096."
        )
        return 4096
```
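In other words, the lookup only succeeds if the model map knows your model name. A minimal sketch of the failure, assuming `model_map` is built from litellm's model-cost map (my assumption about the call site); `azuregpt35turbo` is a custom Azure deployment name that litellm has never heard of:

```python
# Sketch only: assumes danswer feeds litellm's model-cost map into get_llm_max_tokens.
import litellm

model_map = litellm.model_cost  # e.g. {"gpt-3.5-turbo": {"max_input_tokens": ..., ...}, ...}

# A custom Azure deployment name matches neither key form:
print(model_map.get("azure/azuregpt35turbo"))  # None
print(model_map.get("azuregpt35turbo"))        # None -> model_map[...] raises KeyError

# Whereas a canonical model name resolves fine:
print(model_map["gpt-3.5-turbo"]["max_input_tokens"])
```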
Every model has a different context size, so I propose that we add a context-length option to each model (regardless of provider).
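Something along these lines (purely illustrative names and values, not danswer's actual config) would let users declare the context length per configured model and fall back to the current detection only when unset:

```python
# Hypothetical per-model override; all names/values here are illustrative.
MODEL_CONTEXT_LENGTHS: dict[str, int] = {
    "azuregpt35turbo": 16385,  # e.g. a custom Azure deployment of gpt-3.5-turbo-16k
    "llama2": 4096,
}

def get_max_tokens_with_override(model_name: str, fallback: int = 4096) -> int:
    # An explicit user-declared context length wins; otherwise keep today's behavior.
    return MODEL_CONTEXT_LENGTHS.get(model_name, fallback)
```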
Hi, I'm facing the following issue when trying to chat with Ollama:

Then Danswer answers :)

My `.env`:

Ollama is up and running, tested from inside the `danswer-stack-api_server` container with `curl http://host.docker.internal:11434/api/tags` (obviously I had to install curl first).

BTW, Danswer seems to replay the request once in case of error.
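For anyone wanting to repeat that check, roughly (the `apt-get` step is my assumption about how curl got installed):

```sh
# Run from the Docker host; container name as in the report above.
docker exec -it danswer-stack-api_server bash -c \
  'apt-get update && apt-get install -y curl && curl http://host.docker.internal:11434/api/tags'
# A healthy Ollama responds with a JSON list of the pulled models.
```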