BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Bug]: Proxy doesn't include required dependency for Vertex AI: google-cloud-aiplatform #2469

Closed: tylerbrandt closed this issue 7 months ago

tylerbrandt commented 7 months ago

What happened?

I configured the proxy server with the following config:

model_list:
  - model_name: gemini-pro
    litellm_params:
      model: vertex_ai/gemini-pro
      vertex_project: "os.environ/GCP_PROJECT"
      vertex_location: "os.environ/GCP_LOCATION"

litellm_settings:
  drop_params: True
  max_budget: 100 
  budget_duration: 30d
  num_retries: 0
  request_timeout: 600
general_settings: 
  master_key: sk-1234 # [OPTIONAL] Only use this if you want to require all calls to contain this key (Authorization: Bearer sk-1234)
  proxy_budget_rescheduler_min_time: 60
  proxy_budget_rescheduler_max_time: 64
  # database_url: "postgresql://<user>:<password>@<host>:<port>/<dbname>" # [OPTIONAL] use for token-based auth to proxy

environment_variables:
  # settings for using redis caching
  # REDIS_HOST: redis-16337.c322.us-east-1-2.ec2.cloud.redislabs.com
  # REDIS_PORT: "16337"
  # REDIS_PASSWORD: 
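
For reference, the os.environ/ prefix in the config tells the proxy to read the value from an environment variable, so GCP_PROJECT and GCP_LOCATION have to be exported in the proxy's environment before launch, e.g. (the values here are placeholders):

export GCP_PROJECT="my-gcp-project"
export GCP_LOCATION="us-central1"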

Then I built a virtualenv and ran the proxy:

virtualenv .venv
. .venv/bin/activate
pip install -r requirements.txt
litellm -c proxy_server_config.yaml

and attempted to call it with this snippet:

curl http://0.0.0.0:4000/chat/completions --header 'Content-Type: application/json' --data '{"model": "gemini-pro", "messages": [{"role": "user", "content": "what llm are you"}]}' --header "Authorization: Bearer sk-1234"

I got an error response back:

{"error":{"message":"VertexAIException - VertexAIException - vertexai import failed please run `pip install google-cloud-aiplatform`","type":null,"param":null,"code":429}}

The full debug log is included below. I was able to resolve the issue by adding a dependency on google-cloud-aiplatform to requirements.txt (as noted on the Vertex AI docs page) and rebuilding:

google-cloud-aiplatform==1.43.0 # for vertex ai calls

Requiring the additional package is fine when using litellm as a library, but when running the proxy server via e.g. docker-compose or Kubernetes, I haven't found a simple way to add the dependency without rebuilding the image. Should the google-cloud-aiplatform dependency be installed by default? Is there a reason it isn't, given that many third-party dependencies for other model providers are already installed, including google-generativeai?
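
For now the simplest approach I've found is still building a derived image with the extra dependency baked in, e.g. (untested sketch; the base image tag is an assumption):

# Build a derived image that adds the Vertex AI dependency, then point
# docker-compose / kubernetes at the resulting tag instead of the stock one.
docker build -t litellm-vertex - <<'EOF'
FROM ghcr.io/berriai/litellm:main-latest
RUN pip install google-cloud-aiplatform==1.43.0
EOF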

Relevant log output

09:21:27 - LiteLLM Proxy:DEBUG: Request Headers: Headers({'host': '0.0.0.0:4000', 'user-agent': 'curl/8.4.0', 'accept': '*/*', 'content-type': 'application/json', 'authorization': 'Bearer sk-1234', 'content-length': '86'})
09:21:27 - LiteLLM Proxy:DEBUG: receiving data: {'model': 'gemini-pro', 'messages': [{'role': 'user', 'content': 'what llm are you'}], 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'user-agent': 'curl/8.4.0', 'accept': '*/*', 'content-type': 'application/json', 'authorization': 'Bearer sk-1234', 'content-length': '86'}, 'body': {'model': 'gemini-pro', 'messages': [{'role': 'user', 'content': 'what llm are you'}]}}}
09:21:27 - LiteLLM Proxy:DEBUG: Inside Proxy Logging Pre-call hook!
09:21:27 - LiteLLM Proxy:DEBUG: Inside Max Parallel Request Pre-Call Hook
09:21:27 - LiteLLM Proxy:DEBUG: current: None
09:21:27 - LiteLLM Proxy:DEBUG: final data being sent to completion call: {'model': 'gemini-pro', 'messages': [{'role': 'user', 'content': 'what llm are you'}], 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'user-agent': 'curl/8.4.0', 'accept': '*/*', 'content-type': 'application/json', 'authorization': 'Bearer sk-1234', 'content-length': '86'}, 'body': {'model': 'gemini-pro', 'messages': [{'role': 'user', 'content': 'what llm are you'}]}}, 'user': 'default_user_id', 'metadata': {'user_api_key': 'sk-1234', 'user_api_key_alias': None, 'user_api_key_user_id': 'default_user_id', 'user_api_key_team_id': None, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'user-agent': 'curl/8.4.0', 'accept': '*/*', 'content-type': 'application/json', 'content-length': '86'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions'}, 'request_timeout': 600}
09:21:27 - LiteLLM Router:DEBUG: Inside async function with retries: args - (); kwargs - {'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'user-agent': 'curl/8.4.0', 'accept': '*/*', 'content-type': 'application/json', 'authorization': 'Bearer sk-1234', 'content-length': '86'}, 'body': {'model': 'gemini-pro', 'messages': [{'role': 'user', 'content': 'what llm are you'}]}}, 'user': 'default_user_id', 'metadata': {'user_api_key': 'sk-1234', 'user_api_key_alias': None, 'user_api_key_user_id': 'default_user_id', 'user_api_key_team_id': None, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'user-agent': 'curl/8.4.0', 'accept': '*/*', 'content-type': 'application/json', 'content-length': '86'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'model_group': 'gemini-pro'}, 'request_timeout': 600, 'model': 'gemini-pro', 'messages': [{'role': 'user', 'content': 'what llm are you'}], 'original_function': <bound method Router._acompletion of <litellm.router.Router object at 0x1207e7af0>>, 'num_retries': 0}
09:21:27 - LiteLLM Router:DEBUG: async function w/ retries: original_function - <bound method Router._acompletion of <litellm.router.Router object at 0x1207e7af0>>
09:21:27 - LiteLLM Router:DEBUG: Inside _acompletion()- model: gemini-pro; kwargs: {'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'user-agent': 'curl/8.4.0', 'accept': '*/*', 'content-type': 'application/json', 'authorization': 'Bearer sk-1234', 'content-length': '86'}, 'body': {'model': 'gemini-pro', 'messages': [{'role': 'user', 'content': 'what llm are you'}]}}, 'user': 'default_user_id', 'metadata': {'user_api_key': 'sk-1234', 'user_api_key_alias': None, 'user_api_key_user_id': 'default_user_id', 'user_api_key_team_id': None, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'user-agent': 'curl/8.4.0', 'accept': '*/*', 'content-type': 'application/json', 'content-length': '86'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'model_group': 'gemini-pro'}, 'request_timeout': 600}
09:21:27 - LiteLLM Router:DEBUG: initial list of deployments: [{'model_name': 'gemini-pro', 'litellm_params': {'model': 'vertex_ai/gemini-pro', 'vertex_project': None, 'vertex_location': None}, 'model_info': {'id': 'db48d6ba-ebfc-4f71-9a13-61fe2dd3f7ae'}}]
09:21:27 - LiteLLM Router:DEBUG: retrieve cooldown models: []
09:21:27 - LiteLLM Router:DEBUG: cooldown deployments: []
09:21:27 - LiteLLM Router:DEBUG: healthy deployments: length 1 [{'model_name': 'gemini-pro', 'litellm_params': {'model': 'vertex_ai/gemini-pro', 'vertex_project': None, 'vertex_location': None}, 'model_info': {'id': 'db48d6ba-ebfc-4f71-9a13-61fe2dd3f7ae'}}]
09:21:27 - LiteLLM Router:DEBUG: Attempting to add db48d6ba-ebfc-4f71-9a13-61fe2dd3f7ae to cooldown list. updated_fails: 1; self.allowed_fails: 0
09:21:27 - LiteLLM Router:DEBUG: adding db48d6ba-ebfc-4f71-9a13-61fe2dd3f7ae to cooldown models
09:21:27 - LiteLLM Router:DEBUG: Attempting to add db48d6ba-ebfc-4f71-9a13-61fe2dd3f7ae to cooldown list. updated_fails: 1; self.allowed_fails: 0
09:21:27 - LiteLLM Router:DEBUG: adding db48d6ba-ebfc-4f71-9a13-61fe2dd3f7ae to cooldown models
09:21:27 - LiteLLM Router:DEBUG: Attempting to add db48d6ba-ebfc-4f71-9a13-61fe2dd3f7ae to cooldown list. updated_fails: 1; self.allowed_fails: 0
09:21:27 - LiteLLM Router:DEBUG: adding db48d6ba-ebfc-4f71-9a13-61fe2dd3f7ae to cooldown models
09:21:27 - LiteLLM Router:DEBUG: Attempting to add db48d6ba-ebfc-4f71-9a13-61fe2dd3f7ae to cooldown list. updated_fails: 1; self.allowed_fails: 0
09:21:27 - LiteLLM Router:DEBUG: adding db48d6ba-ebfc-4f71-9a13-61fe2dd3f7ae to cooldown models
09:21:27 - LiteLLM Proxy:DEBUG: Inside Max Parallel Request Failure Hook
09:21:27 - LiteLLM Proxy:DEBUG: updated_value in failure call: {'current_requests': 0, 'current_tpm': 0, 'current_rpm': 0}
09:21:27 - LiteLLM Router:INFO: litellm.acompletion(model=vertex_ai/gemini-pro) Exception VertexAIException - VertexAIException - vertexai import failed please run `pip install google-cloud-aiplatform`
09:21:27 - LiteLLM Router:DEBUG: TracebackTraceback (most recent call last):
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/llms/vertex_ai.py", line 266, in completion
    import vertexai
ModuleNotFoundError: No module named 'vertexai'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/main.py", line 1514, in completion
    model_response = vertex_ai.completion(
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/llms/vertex_ai.py", line 268, in completion
    raise VertexAIError(
litellm.llms.vertex_ai.VertexAIError: vertexai import failed please run `pip install google-cloud-aiplatform`

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/main.py", line 273, in acompletion
    init_response = await loop.run_in_executor(None, func_with_context)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/utils.py", line 2727, in wrapper
    raise e
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/utils.py", line 2628, in wrapper
    result = original_function(*args, **kwargs)
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/main.py", line 1941, in completion
    raise exception_type(
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/utils.py", line 8100, in exception_type
    raise e
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/utils.py", line 7319, in exception_type
    raise RateLimitError(
litellm.exceptions.RateLimitError: VertexAIException - vertexai import failed please run `pip install google-cloud-aiplatform`

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/router.py", line 1140, in async_function_with_fallbacks
    response = await self.async_function_with_retries(*args, **kwargs)
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/router.py", line 1327, in async_function_with_retries
    raise original_exception
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/router.py", line 1234, in async_function_with_retries
    response = await original_function(*args, **kwargs)
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/router.py", line 482, in _acompletion
    raise e
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/router.py", line 458, in _acompletion
    response = await litellm.acompletion(
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/utils.py", line 3181, in wrapper_async
    raise e
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/utils.py", line 3017, in wrapper_async
    result = await original_function(*args, **kwargs)
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/main.py", line 292, in acompletion
    raise exception_type(
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/utils.py", line 8100, in exception_type
    raise e
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/utils.py", line 7319, in exception_type
    raise RateLimitError(
litellm.exceptions.RateLimitError: VertexAIException - VertexAIException - vertexai import failed please run `pip install google-cloud-aiplatform`

09:21:27 - LiteLLM Router:DEBUG: Trying to fallback b/w models
Traceback (most recent call last):
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/llms/vertex_ai.py", line 266, in completion
    import vertexai
ModuleNotFoundError: No module named 'vertexai'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/main.py", line 1514, in completion
    model_response = vertex_ai.completion(
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/llms/vertex_ai.py", line 268, in completion
    raise VertexAIError(
litellm.llms.vertex_ai.VertexAIError: vertexai import failed please run `pip install google-cloud-aiplatform`

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/main.py", line 273, in acompletion
    init_response = await loop.run_in_executor(None, func_with_context)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/utils.py", line 2727, in wrapper
    raise e
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/utils.py", line 2628, in wrapper
    result = original_function(*args, **kwargs)
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/main.py", line 1941, in completion
    raise exception_type(
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/utils.py", line 8100, in exception_type
    raise e
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/utils.py", line 7319, in exception_type
    raise RateLimitError(
litellm.exceptions.RateLimitError: VertexAIException - vertexai import failed please run `pip install google-cloud-aiplatform`

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/proxy/proxy_server.py", line 2822, in chat_completion
    responses = await asyncio.gather(
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/router.py", line 399, in acompletion
    raise e
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/router.py", line 395, in acompletion
    response = await self.async_function_with_fallbacks(**kwargs)
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/router.py", line 1217, in async_function_with_fallbacks
    raise original_exception
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/router.py", line 1140, in async_function_with_fallbacks
    response = await self.async_function_with_retries(*args, **kwargs)
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/router.py", line 1327, in async_function_with_retries
    raise original_exception
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/router.py", line 1234, in async_function_with_retries
    response = await original_function(*args, **kwargs)
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/router.py", line 482, in _acompletion
    raise e
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/router.py", line 458, in _acompletion
    response = await litellm.acompletion(
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/utils.py", line 3181, in wrapper_async
    raise e
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/utils.py", line 3017, in wrapper_async
    result = await original_function(*args, **kwargs)
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/main.py", line 292, in acompletion
    raise exception_type(
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/utils.py", line 8100, in exception_type
    raise e
  File "/Users/tyler.brandt/Documents/Projects/thirdparty/litellm/litellm/utils.py", line 7319, in exception_type
    raise RateLimitError(
litellm.exceptions.RateLimitError: VertexAIException - VertexAIException - vertexai import failed please run `pip install google-cloud-aiplatform`
09:21:27 - LiteLLM Proxy:DEBUG: An error occurred: VertexAIException - VertexAIException - vertexai import failed please run `pip install google-cloud-aiplatform`

 Debug this by setting `--debug`, e.g. `litellm --model gpt-3.5-turbo --debug`
09:21:27 - LiteLLM Proxy:DEBUG: Results from router
09:21:27 - LiteLLM Proxy:DEBUG: 
Router stats
09:21:27 - LiteLLM Proxy:DEBUG: 
Total Calls made
09:21:27 - LiteLLM Proxy:DEBUG: vertex_ai/gemini-pro: 1
09:21:27 - LiteLLM Proxy:DEBUG: 
Success Calls made
09:21:27 - LiteLLM Proxy:DEBUG: 
Fail Calls made
09:21:27 - LiteLLM Proxy:DEBUG: vertex_ai/gemini-pro: 1
INFO:     127.0.0.1:57332 - "POST /chat/completions HTTP/1.1" 429 Too Many Requests


krrishdholakia commented 7 months ago

Hi @tylerbrandt - will update this ticket when the PR is merged - https://github.com/BerriAI/litellm/pull/2433

DM'ed on LinkedIn to create a direct support channel. Let me know if another channel (e.g. Discord/Slack) works better.