danswer-ai / danswer

Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
https://docs.danswer.dev/

How to use Huggingface Inference? #959

Open AndersGiovanni opened 7 months ago

AndersGiovanni commented 7 months ago

I'm trying to replace OpenAI with Llama from Hugging Face via Inference Pro (following the instructions at https://docs.danswer.dev/gen_ai_configs/huggingface). The API key is specified in the .env file. However, when trying to chat, I get the following error:

[screenshot of the error]

Whenever I switch the model back to OpenAI and provide an API key from them, everything works like a charm.

Any suggestions on how to fix this?

This is the log from the api-server when using Hugging Face:

2024-01-17 21:52:01 01/17/2024 08:52:01 PM      chat_backend.py 170 : Received new chat message: Hello
2024-01-17 21:52:01 INFO:     172.18.0.7:51974 - "POST /chat/send-message HTTP/1.1" 200 OK
2024-01-17 21:52:01 01/17/2024 08:52:01 PM   process_message.py 432 : HuggingfaceException - Traceback (most recent call last):
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/litellm/llms/huggingface_restapi.py", line 423, in completion
2024-01-17 21:52:01     return self.convert_to_model_response_object(
2024-01-17 21:52:01            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/litellm/llms/huggingface_restapi.py", line 173, in convert_to_model_response_object
2024-01-17 21:52:01     if len(completion_response[0]["generated_text"]) > 0:
2024-01-17 21:52:01           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:01 KeyError: 'generated_text'
2024-01-17 21:52:01 Traceback (most recent call last):
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/litellm/llms/huggingface_restapi.py", line 423, in completion
2024-01-17 21:52:01     return self.convert_to_model_response_object(
2024-01-17 21:52:01            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/litellm/llms/huggingface_restapi.py", line 173, in convert_to_model_response_object
2024-01-17 21:52:01     if len(completion_response[0]["generated_text"]) > 0:
2024-01-17 21:52:01           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:01 KeyError: 'generated_text'
2024-01-17 21:52:01 
2024-01-17 21:52:01 During handling of the above exception, another exception occurred:
2024-01-17 21:52:01 
2024-01-17 21:52:01 Traceback (most recent call last):
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/litellm/main.py", line 889, in completion
2024-01-17 21:52:01     model_response = huggingface.completion(
2024-01-17 21:52:01                      ^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/litellm/llms/huggingface_restapi.py", line 440, in completion
2024-01-17 21:52:01     raise HuggingfaceError(status_code=500, message=traceback.format_exc())
2024-01-17 21:52:01 litellm.llms.huggingface_restapi.HuggingfaceError: Traceback (most recent call last):
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/litellm/llms/huggingface_restapi.py", line 423, in completion
2024-01-17 21:52:01     return self.convert_to_model_response_object(
2024-01-17 21:52:01            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/litellm/llms/huggingface_restapi.py", line 173, in convert_to_model_response_object
2024-01-17 21:52:01     if len(completion_response[0]["generated_text"]) > 0:
2024-01-17 21:52:01           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:01 KeyError: 'generated_text'
2024-01-17 21:52:01 
2024-01-17 21:52:01 
2024-01-17 21:52:01 During handling of the above exception, another exception occurred:
2024-01-17 21:52:01 
2024-01-17 21:52:01 Traceback (most recent call last):
2024-01-17 21:52:01   File "/app/danswer/chat/process_message.py", line 242, in stream_chat_message
2024-01-17 21:52:01     run_search = check_if_need_search(
2024-01-17 21:52:01                  ^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:01   File "/app/danswer/secondary_llm_flows/choose_search.py", line 87, in check_if_need_search
2024-01-17 21:52:01     require_search_output = llm.invoke(filled_llm_prompt)
2024-01-17 21:52:01                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:01   File "/app/danswer/llm/chat_llm.py", line 56, in invoke
2024-01-17 21:52:01     model_raw = self.llm.invoke(prompt).content
2024-01-17 21:52:01                 ^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/langchain/chat_models/base.py", line 142, in invoke
2024-01-17 21:52:01     self.generate_prompt(
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/langchain/chat_models/base.py", line 459, in generate_prompt
2024-01-17 21:52:01     return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
2024-01-17 21:52:01            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/langchain/chat_models/base.py", line 349, in generate
2024-01-17 21:52:01     raise e
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/langchain/chat_models/base.py", line 339, in generate
2024-01-17 21:52:01     self._generate_with_cache(
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/langchain/chat_models/base.py", line 492, in _generate_with_cache
2024-01-17 21:52:01     return self._generate(
2024-01-17 21:52:01            ^^^^^^^^^^^^^^^
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/langchain/chat_models/litellm.py", line 306, in _generate
2024-01-17 21:52:01     response = self.completion_with_retry(
2024-01-17 21:52:01                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/langchain/chat_models/litellm.py", line 239, in completion_with_retry
2024-01-17 21:52:01     return _completion_with_retry(**kwargs)
2024-01-17 21:52:01            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 289, in wrapped_f
2024-01-17 21:52:01     return self(f, *args, **kw)
2024-01-17 21:52:01            ^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 379, in __call__
2024-01-17 21:52:01     do = self.iter(retry_state=retry_state)
2024-01-17 21:52:01          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 325, in iter
2024-01-17 21:52:01     raise retry_exc.reraise()
2024-01-17 21:52:01           ^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 158, in reraise
2024-01-17 21:52:01     raise self.last_attempt.result()
2024-01-17 21:52:01           ^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in result
2024-01-17 21:52:01     return self.__get_result()
2024-01-17 21:52:01            ^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
2024-01-17 21:52:01     raise self._exception
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 382, in __call__
2024-01-17 21:52:01     result = fn(*args, **kwargs)
2024-01-17 21:52:01              ^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/langchain/chat_models/litellm.py", line 237, in _completion_with_retry
2024-01-17 21:52:01     return self.client.completion(**kwargs)
2024-01-17 21:52:01            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/litellm/utils.py", line 1358, in wrapper
2024-01-17 21:52:01     raise e
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/litellm/utils.py", line 1289, in wrapper
2024-01-17 21:52:01     result = original_function(*args, **kwargs)
2024-01-17 21:52:01              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/litellm/main.py", line 1392, in completion
2024-01-17 21:52:01     raise exception_type(
2024-01-17 21:52:01           ^^^^^^^^^^^^^^^
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/litellm/utils.py", line 4545, in exception_type
2024-01-17 21:52:01     raise e
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/litellm/utils.py", line 4118, in exception_type
2024-01-17 21:52:01     raise APIError(
2024-01-17 21:52:01 litellm.exceptions.APIError: HuggingfaceException - Traceback (most recent call last):
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/litellm/llms/huggingface_restapi.py", line 423, in completion
2024-01-17 21:52:01     return self.convert_to_model_response_object(
2024-01-17 21:52:01            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:01   File "/usr/local/lib/python3.11/site-packages/litellm/llms/huggingface_restapi.py", line 173, in convert_to_model_response_object
2024-01-17 21:52:01     if len(completion_response[0]["generated_text"]) > 0:
2024-01-17 21:52:01           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:01 KeyError: 'generated_text'
2024-01-17 21:52:01 
2024-01-17 21:52:01 01/17/2024 08:52:01 PM            timing.py  63 : stream_chat_message took 0.44702649116516113 seconds
2024-01-17 21:52:01 01/17/2024 08:52:01 PM      chat_backend.py 129 : Received rename request for chat session: 11
2024-01-17 21:52:02 
2024-01-17 21:52:02 Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
2024-01-17 21:52:02 LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.
2024-01-17 21:52:02 
2024-01-17 21:52:02 
2024-01-17 21:52:02 Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
2024-01-17 21:52:02 LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.
2024-01-17 21:52:02 
2024-01-17 21:52:02 INFO:     172.18.0.7:51976 - "PUT /chat/rename-chat-session HTTP/1.1" 500 Internal Server Error
2024-01-17 21:52:02 ERROR:    Exception in ASGI application
2024-01-17 21:52:02 Traceback (most recent call last):
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/litellm/llms/huggingface_restapi.py", line 423, in completion
2024-01-17 21:52:02     return self.convert_to_model_response_object(
2024-01-17 21:52:02            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/litellm/llms/huggingface_restapi.py", line 173, in convert_to_model_response_object
2024-01-17 21:52:02     if len(completion_response[0]["generated_text"]) > 0:
2024-01-17 21:52:02           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02 KeyError: 'generated_text'
2024-01-17 21:52:02 
2024-01-17 21:52:02 During handling of the above exception, another exception occurred:
2024-01-17 21:52:02 
2024-01-17 21:52:02 Traceback (most recent call last):
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/litellm/main.py", line 889, in completion
2024-01-17 21:52:02     model_response = huggingface.completion(
2024-01-17 21:52:02                      ^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/litellm/llms/huggingface_restapi.py", line 440, in completion
2024-01-17 21:52:02     raise HuggingfaceError(status_code=500, message=traceback.format_exc())
2024-01-17 21:52:02 litellm.llms.huggingface_restapi.HuggingfaceError: Traceback (most recent call last):
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/litellm/llms/huggingface_restapi.py", line 423, in completion
2024-01-17 21:52:02     return self.convert_to_model_response_object(
2024-01-17 21:52:02            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/litellm/llms/huggingface_restapi.py", line 173, in convert_to_model_response_object
2024-01-17 21:52:02     if len(completion_response[0]["generated_text"]) > 0:
2024-01-17 21:52:02           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02 KeyError: 'generated_text'
2024-01-17 21:52:02 
2024-01-17 21:52:02 
2024-01-17 21:52:02 During handling of the above exception, another exception occurred:
2024-01-17 21:52:02 
2024-01-17 21:52:02 Traceback (most recent call last):
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 429, in run_asgi
2024-01-17 21:52:02     result = await app(  # type: ignore[func-returns-value]
2024-01-17 21:52:02              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
2024-01-17 21:52:02     return await self.app(scope, receive, send)
2024-01-17 21:52:02            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/fastapi/applications.py", line 292, in __call__
2024-01-17 21:52:02     await super().__call__(scope, receive, send)
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/starlette/applications.py", line 122, in __call__
2024-01-17 21:52:02     await self.middleware_stack(scope, receive, send)
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 184, in __call__
2024-01-17 21:52:02     raise exc
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in __call__
2024-01-17 21:52:02     await self.app(scope, receive, _send)
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/cors.py", line 91, in __call__
2024-01-17 21:52:02     await self.simple_response(scope, receive, send, request_headers=headers)
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/cors.py", line 146, in simple_response
2024-01-17 21:52:02     await self.app(scope, receive, send)
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
2024-01-17 21:52:02     raise exc
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
2024-01-17 21:52:02     await self.app(scope, receive, sender)
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
2024-01-17 21:52:02     raise e
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
2024-01-17 21:52:02     await self.app(scope, receive, send)
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
2024-01-17 21:52:02     await route.handle(scope, receive, send)
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
2024-01-17 21:52:02     await self.app(scope, receive, send)
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
2024-01-17 21:52:02     response = await func(request)
2024-01-17 21:52:02                ^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 273, in app
2024-01-17 21:52:02     raw_response = await run_endpoint_function(
2024-01-17 21:52:02                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 192, in run_endpoint_function
2024-01-17 21:52:02     return await run_in_threadpool(dependant.call, **values)
2024-01-17 21:52:02            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
2024-01-17 21:52:02     return await anyio.to_thread.run_sync(func, *args)
2024-01-17 21:52:02            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/anyio/to_thread.py", line 33, in run_sync
2024-01-17 21:52:02     return await get_asynclib().run_sync_in_worker_thread(
2024-01-17 21:52:02            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
2024-01-17 21:52:02     return await future
2024-01-17 21:52:02            ^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 807, in run
2024-01-17 21:52:02     result = context.run(func, *args)
2024-01-17 21:52:02              ^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/app/danswer/server/query_and_chat/chat_backend.py", line 140, in rename_chat_session
2024-01-17 21:52:02     new_name = get_renamed_conversation_name(full_history=full_history)
2024-01-17 21:52:02                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/app/danswer/secondary_llm_flows/chat_session_naming.py", line 39, in get_renamed_conversation_name
2024-01-17 21:52:02     new_name_raw = llm.invoke(filled_llm_prompt)
2024-01-17 21:52:02                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/app/danswer/llm/chat_llm.py", line 56, in invoke
2024-01-17 21:52:02     model_raw = self.llm.invoke(prompt).content
2024-01-17 21:52:02                 ^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/langchain/chat_models/base.py", line 142, in invoke
2024-01-17 21:52:02     self.generate_prompt(
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/langchain/chat_models/base.py", line 459, in generate_prompt
2024-01-17 21:52:02     return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
2024-01-17 21:52:02            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/langchain/chat_models/base.py", line 349, in generate
2024-01-17 21:52:02     raise e
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/langchain/chat_models/base.py", line 339, in generate
2024-01-17 21:52:02     self._generate_with_cache(
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/langchain/chat_models/base.py", line 492, in _generate_with_cache
2024-01-17 21:52:02     return self._generate(
2024-01-17 21:52:02            ^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/langchain/chat_models/litellm.py", line 306, in _generate
2024-01-17 21:52:02     response = self.completion_with_retry(
2024-01-17 21:52:02                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/langchain/chat_models/litellm.py", line 239, in completion_with_retry
2024-01-17 21:52:02     return _completion_with_retry(**kwargs)
2024-01-17 21:52:02            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 289, in wrapped_f
2024-01-17 21:52:02     return self(f, *args, **kw)
2024-01-17 21:52:02            ^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 379, in __call__
2024-01-17 21:52:02     do = self.iter(retry_state=retry_state)
2024-01-17 21:52:02          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 325, in iter
2024-01-17 21:52:02     raise retry_exc.reraise()
2024-01-17 21:52:02           ^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 158, in reraise
2024-01-17 21:52:02     raise self.last_attempt.result()
2024-01-17 21:52:02           ^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in result
2024-01-17 21:52:02     return self.__get_result()
2024-01-17 21:52:02            ^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
2024-01-17 21:52:02     raise self._exception
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 382, in __call__
2024-01-17 21:52:02     result = fn(*args, **kwargs)
2024-01-17 21:52:02              ^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/langchain/chat_models/litellm.py", line 237, in _completion_with_retry
2024-01-17 21:52:02     return self.client.completion(**kwargs)
2024-01-17 21:52:02            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/litellm/utils.py", line 1358, in wrapper
2024-01-17 21:52:02     raise e
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/litellm/utils.py", line 1289, in wrapper
2024-01-17 21:52:02     result = original_function(*args, **kwargs)
2024-01-17 21:52:02              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/litellm/main.py", line 1392, in completion
2024-01-17 21:52:02     raise exception_type(
2024-01-17 21:52:02           ^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/litellm/utils.py", line 4545, in exception_type
2024-01-17 21:52:02     raise e
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/litellm/utils.py", line 4118, in exception_type
2024-01-17 21:52:02     raise APIError(
2024-01-17 21:52:02 litellm.exceptions.APIError: HuggingfaceException - Traceback (most recent call last):
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/litellm/llms/huggingface_restapi.py", line 423, in completion
2024-01-17 21:52:02     return self.convert_to_model_response_object(
2024-01-17 21:52:02            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02   File "/usr/local/lib/python3.11/site-packages/litellm/llms/huggingface_restapi.py", line 173, in convert_to_model_response_object
2024-01-17 21:52:02     if len(completion_response[0]["generated_text"]) > 0:
2024-01-17 21:52:02           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
2024-01-17 21:52:02 KeyError: 'generated_text'
2024-01-17 21:52:02 
2024-01-17 21:52:02 INFO:     172.18.0.6:60990 - "GET /auth/type HTTP/1.1" 200 OK
2024-01-17 21:52:02 INFO:     172.18.0.6:60882 - "GET /manage/me HTTP/1.1" 200 OK
2024-01-17 21:52:02 INFO:     172.18.0.6:60930 - "GET /query/valid-tags HTTP/1.1" 200 OK
2024-01-17 21:52:02 INFO:     172.18.0.6:60902 - "GET /persona?include_default=true HTTP/1.1" 200 OK
2024-01-17 21:52:02 INFO:     172.18.0.6:32800 - "GET /chat/get-user-chat-sessions HTTP/1.1" 200 OK
2024-01-17 21:52:02 INFO:     172.18.0.6:32774 - "GET /manage/connector HTTP/1.1" 200 OK
2024-01-17 21:52:02 INFO:     172.18.0.6:32786 - "GET /manage/document-set HTTP/1.1" 200 OK
2024-01-17 21:52:02 INFO:     172.18.0.7:51992 - "GET /chat/get-chat-session/11 HTTP/1.1" 200 OK
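The root cause in the traceback is `completion_response[0]["generated_text"]` raising `KeyError` inside litellm: the body returned by the Inference API evidently is not the `[{"generated_text": ...}]` list that litellm's text-generation parsing expects (error payloads from the API, e.g. for a bad token, a gated model, or a model still loading, have a different shape). A minimal sketch of a defensive parse that surfaces the real problem instead of a bare `KeyError` (the helper name is hypothetical, not part of litellm or Danswer):

```python
# Hypothetical helper illustrating why litellm raises KeyError: 'generated_text'.

def extract_generated_text(completion_response):
    """Return the generated text from an HF Inference API response,
    or raise a descriptive error instead of a bare KeyError.

    A successful text-generation call returns [{"generated_text": "..."}];
    error responses typically carry an "error" field instead.
    """
    # Top-level error object, e.g. {"error": "Model ... is currently loading"}
    if isinstance(completion_response, dict) and "error" in completion_response:
        raise RuntimeError(f"HF Inference API error: {completion_response['error']}")
    first = completion_response[0]
    # List element without the expected key -- the case seen in the log above
    if not isinstance(first, dict) or "generated_text" not in first:
        raise RuntimeError(f"Unexpected response shape: {first!r}")
    return first["generated_text"]
```

Running the raw response through a check like this (or simply printing it) usually reveals whether the endpoint rejected the token, is still loading the model, or served a task type other than text-generation.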
scriptator commented 7 months ago

Which exact settings of the GEN_AI_ variables did you try? For me, the following works with a self-hosted Hugging Face TGI:

GEN_AI_MODEL_VERSION="<huggingface-model-id>"
GEN_AI_MODEL_PROVIDER="huggingface"
HUGGINGFACE_API_BASE="https://xyz"
GEN_AI_API_ENDPOINT="https://xyz"

Disclaimer: I am unsure whether you need to set both HUGGINGFACE_API_BASE and GEN_AI_API_ENDPOINT, but it works for me that way.

AndersGiovanni commented 7 months ago

I'm using the following settings:

GEN_AI_MODEL_PROVIDER=huggingface
GEN_AI_MODEL_VERSION=mistralai/Mistral-7B-Instruct-v0.2
GEN_AI_API_KEY=some_key
HUGGINGFACE_API_BASE=https://api-inference.huggingface.co/models/
GEN_AI_API_ENDPOINT=https://api-inference.huggingface.co/models/

I also tried other models, such as meta-llama/Llama-2-70b-chat-hf and meta-llama/Llama-2-7b-chat-hf.
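One way to narrow this down is to hit the Inference API directly, outside Danswer and litellm, and inspect what the endpoint actually returns for your token and model. A stdlib-only sketch (the request shape follows the standard api-inference URL scheme; the token placeholder is an assumption):

```python
import json
import urllib.request

API_BASE = "https://api-inference.huggingface.co/models/"

def build_request(model_id: str, prompt: str, token: str) -> urllib.request.Request:
    """Build a text-generation POST request against the HF Inference API."""
    return urllib.request.Request(
        API_BASE + model_id,
        data=json.dumps({"inputs": prompt}).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

# Usage (requires network access and a valid token):
# req = build_request("mistralai/Mistral-7B-Instruct-v0.2", "Hello", "hf_...")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
# A healthy endpoint prints [{"generated_text": ...}]; an error body such as
# {"error": "..."} would explain the KeyError in the Danswer log.
```

If the direct call returns an error (e.g. the model is gated and your token lacks access, which applies to the meta-llama models), that would explain the failure independently of the Danswer configuration.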