Closed: kulbinderdio closed this issue 9 months ago.
Hey @kulbinderdio, are you saying you started seeing issues with ghcr.io/berriai/litellm:main-v1.10.3, or only after you upgraded to ghcr.io/berriai/litellm:main-v1.17.0?
The issue indicates an error in hf parsing, but we haven't made any changes to it.
@krrishdholakia I really don't understand it either, as I have been using the same file for ages. I did a complete clean-up of Docker images recently, but again, since the image is tagged I wouldn't have expected any changes. I do notice the following during startup; I don't know if it's related:
bionic-llm-api-1 | Requirement already satisfied: async_generator in /usr/local/lib/python3.9/site-packages (1.10)
bionic-tgi-1 | 2024-01-10T17:01:40.736545Z INFO text_generation_launcher: Args { model_id: "TheBloke/zephyr-7B-beta-AWQ", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: Some(Awq), dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 2048, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "c6c7fc372236", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
bionic-tgi-1 | 2024-01-10T17:01:40.736608Z INFO download: text_generation_launcher: Starting download process.
bionic-llm-api-1 | WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
bionic-llm-api-1 |
bionic-llm-api-1 | [notice] A new release of pip is available: 23.0.1 -> 23.3.2
bionic-llm-api-1 | [notice] To update, run: pip install --upgrade pip
bionic-llm-api-1 | /usr/local/lib/python3.9/site-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_name" has conflict with protected namespace "model_".
bionic-llm-api-1 |
bionic-llm-api-1 | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
bionic-llm-api-1 | warnings.warn(
bionic-llm-api-1 | /usr/local/lib/python3.9/site-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_info" has conflict with protected namespace "model_".
bionic-llm-api-1 |
bionic-llm-api-1 | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
bionic-llm-api-1 | warnings.warn(
bionic-llm-api-1 | INFO: Started server process [10]
bionic-llm-api-1 | INFO: Waiting for application startup.
bionic-llm-api-1 | INFO: Application startup complete.
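As an aside, the Pydantic "protected namespace" warnings above don't appear related to the error; the warning text itself shows the fix. A rough sketch of what it means, using an illustrative class rather than LiteLLM's actual code:

from pydantic import BaseModel

class ModelEntry(BaseModel):
    # Pydantic v2 reserves the "model_" prefix for its own attributes;
    # clearing protected_namespaces silences the UserWarning for fields
    # such as model_name / model_info.
    model_config = {"protected_namespaces": ()}

    model_name: str
    model_info: dict = {}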
From the logs:
bionic-llm-api-1 | data:{"token":{"id":2,"text":"</s>","logprob":-1.0351562,"special":true},"generated_text":"<|assistant|>\nI am not a physical entity and do not have a language model (LM) associated with me. I am a virtual assistant powered by a pre-trained transformer-based LM, which allows me to understand and generate human-like responses to your queries. The specific LM used to train me is called GPT-3 (Generative Pre-trained Transformer 3), which is one of the most advanced and powerful LMs available today.","details":{"finish_reason":"eos_token","generated_tokens":100,"seed":null}}
bionic-llm-api-1 |
bionic-llm-api-1 |
bionic-llm-api-1 |
bionic-llm-api-1 | During handling of the above exception, another exception occurred:
bionic-llm-api-1 |
bionic-llm-api-1 | Traceback (most recent call last):
bionic-llm-api-1 | File "/usr/local/lib/python3.9/site-packages/litellm/proxy/proxy_server.py", line 1464, in chat_completion
bionic-llm-api-1 | response = await litellm.acompletion(**data)
bionic-llm-api-1 | File "/usr/local/lib/python3.9/site-packages/litellm/utils.py", line 2366, in wrapper_async
bionic-llm-api-1 | raise e
bionic-llm-api-1 | File "/usr/local/lib/python3.9/site-packages/litellm/utils.py", line 2258, in wrapper_async
bionic-llm-api-1 | result = await original_function(*args, **kwargs)
bionic-llm-api-1 | File "/usr/local/lib/python3.9/site-packages/litellm/main.py", line 227, in acompletion
bionic-llm-api-1 | raise exception_type(
bionic-llm-api-1 | File "/usr/local/lib/python3.9/site-packages/litellm/utils.py", line 6628, in exception_type
bionic-llm-api-1 | raise e
bionic-llm-api-1 | File "/usr/local/lib/python3.9/site-packages/litellm/utils.py", line 6111, in exception_type
bionic-llm-api-1 | raise APIError(
bionic-llm-api-1 | litellm.exceptions.APIError: HuggingfaceException - Expecting value: line 1 column 1 (char 0)
I don't know if this adds anything extra:
ionic-tgi-1 | 2024-01-11T11:27:50.154869Z INFO generate_stream{parameters=GenerateParameters { best_of: None, temperature: None, repetition_penalty: None, top_k: None, top_p: None, typical_p: None, do_sample: false, max_new_tokens: None, return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: true, decoder_input_details: false, seed: None, top_n_tokens: None } total_time="1.739647484s" validation_time="237.677µs" queue_time="24.533µs" inference_time="1.73938538s" time_per_token="17.393853ms" seed="None"}: text_generation_router::server: router/src/server.rs:457: Success
bionic-llm-api-1 | receiving data: {'model': 'zephyr-7B-beta-AWQ', 'messages': [{'role': 'user', 'content': 'what llm are you'}]}
bionic-llm-api-1 | litellm.cache: None
bionic-llm-api-1 | kwargs[caching]: False; litellm.cache: None
bionic-llm-api-1 | litellm.caching: False; litellm.caching_with_models: False; litellm.cache: None
bionic-llm-api-1 | kwargs[caching]: False; litellm.cache: None
bionic-llm-api-1 |
bionic-llm-api-1 | LiteLLM completion() model= TheBloke/zephyr-7B-beta-AWQ; provider = huggingface
bionic-llm-api-1 |
bionic-llm-api-1 | LiteLLM: Params passed to completion() {'functions': [], 'function_call': '', 'temperature': None, 'top_p': None, 'stream': None, 'max_tokens': None, 'presence_penalty': None, 'frequency_penalty': None, 'logit_bias': None, 'user': '', 'response_format': None, 'seed': None, 'tools': None, 'tool_choice': None, 'max_retries': None, 'custom_llm_provider': 'huggingface', 'model': 'TheBloke/zephyr-7B-beta-AWQ', 'n': None, 'stop': None}
bionic-llm-api-1 |
bionic-llm-api-1 | LiteLLM: Non-Default params passed to completion() {}
bionic-llm-api-1 | self.optional_params: {}
bionic-llm-api-1 | TheBloke/zephyr-7B-beta-AWQ, text-generation-inference
bionic-llm-api-1 | PRE-API-CALL ADDITIONAL ARGS: {'complete_input_dict': {'inputs': '\n\n<|user|>\nwhat llm are you</s>\n\n\n', 'parameters': {'details': True, 'return_full_text': False}, 'stream': False}, 'task': 'text-generation-inference', 'headers': {'content-type': 'application/json'}, 'api_base': 'http://tgi/generate_stream', 'acompletion': True}
bionic-llm-api-1 |
bionic-llm-api-1 |
bionic-llm-api-1 | POST Request Sent from LiteLLM:
bionic-llm-api-1 | curl -X POST \
bionic-llm-api-1 | http://tgi/generate_stream \
bionic-llm-api-1 | -H 'content-type: application/json' \
bionic-llm-api-1 | -d '{'inputs': '\n\n<|user|>\nwhat llm are you</s>\n\n\n', 'parameters': {'details': True, 'return_full_text': False}, 'stream': False}'
bionic-llm-api-1 |
bionic-llm-api-1 |
bionic-llm-api-1 | Logging Details: logger_fn - None | callable(logger_fn) - False
bionic-llm-api-1 | Logging Details LiteLLM-Failure Call
bionic-llm-api-1 | An error occurred: HuggingfaceException - Expecting value: line 1 column 1 (char 0)
@kulbinderdio if you're using a tagged Docker image, then it's not our code that changed, as that tag is built from a fixed point in the source.
Looking at your call again, it seems you're pointing the proxy at the /generate_stream endpoint but making a non-streaming call.
Can you point api_base to http://tgi/ and see if that fixes things, or try making a streaming call and check whether that works?
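To illustrate what's going on: /generate_stream replies with server-sent-event lines (the data:{...} output in your logs), and JSON-parsing one of those lines fails with exactly the "Expecting value: line 1 column 1 (char 0)" message. A rough sketch below; the direct litellm.completion() call is only for illustration, and the equivalent proxy-config change is pointing api_base at the TGI base URL rather than a specific route.

import json

import litellm

# A server-sent-events line like the one TGI returned in the logs above.
# json.loads() rejects the leading "data:" prefix with the same error
# reported by the proxy.
sse_line = 'data:{"token": {"id": 2, "text": "</s>"}}'
try:
    json.loads(sse_line)
except json.JSONDecodeError as err:
    print(err)  # Expecting value: line 1 column 1 (char 0)

# Pointing api_base at the TGI base URL (not /generate_stream) lets the
# server dispatch to the streaming or non-streaming route based on the
# "stream" field of the request.
response = litellm.completion(
    model="huggingface/TheBloke/zephyr-7B-beta-AWQ",
    api_base="http://tgi",
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)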
@krrishdholakia thanks for this, I had completely missed it. Thanks a lot for your help. All working now.
Hi
I have been using the following Docker file for startup without issue, but have just started getting problems.
I make the following curl call
and get the following response. While the question does get answered, the end of the response also throws an error: "detail":"HuggingfaceException - Expecting value: line 1 column 1 (char 0)"
I have tried different models and still get the same error. I appreciate this is not the latest Docker image, but it has been working for us, which is why we kept it pinned.