BerriAI / litellm

Python SDK, Proxy Server to call 100+ LLM APIs using the OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Bug]: Graceful rejection of token input for AWS Embeddings API #1681

Closed rmann-nflx closed 7 months ago

rmann-nflx commented 7 months ago

What happened?

Hi Team,

Love the project! First time contributing.

I found a minor inconsistency in how input is handled between the OpenAI and AWS Embeddings APIs, which results in an incorrect APIConnectionError being thrown instead of a BadRequestError.

In this report I provide some context on the APIs and their usage, a stack trace, code to reproduce the bug, and my idea for a fix. With the maintainers' approval I can implement my suggested fix.

Background

The OpenAI Embeddings API accepts as input: str | List[str] | List[int] | List[List[int]].

However, the Bedrock Embeddings API for amazon.titan-embed-text-v1 accepts only str | List[str].
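To make the divergence concrete, here is a rough type-level sketch of the two accepted input shapes (the aliases are just illustrative names, not identifiers from either SDK):

from typing import List, Union

# OpenAI /v1/embeddings accepts text or pre-tokenized input:
OpenAIEmbeddingInput = Union[str, List[str], List[int], List[List[int]]]

# Bedrock's amazon.titan-embed-text-v1 accepts text only:
TitanEmbeddingInput = Union[str, List[str]]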

Problem

Passing input=List[int] into embedding(model='$OPENAI_MODEL') is valid, but when passed into embedding(model='amazon.titan-embed-text-v1') results in:

> raise APIConnectionError(
      message=f"{str(original_exception)}",
      llm_provider=custom_llm_provider,
      model=model,
      request=httpx.Request(
          method="POST", url="https://api.openai.com/v1/"
      ),  # stub the request
  )
E litellm.exceptions.APIConnectionError: 'int' object has no attribute 'replace'

Passing input=List[List[int]] into embedding(model='$OPENAI_MODEL') is valid, but when passed into embedding(model='amazon.titan-embed-text-v1') results in:

> raise APIConnectionError(
      message=f"{str(original_exception)}",
      llm_provider=custom_llm_provider,
      model=model,
      request=httpx.Request(
          method="POST", url="https://api.openai.com/v1/"
      ),  # stub the request
  )
E litellm.exceptions.APIConnectionError: 'list' object has no attribute 'replace'

(Stack trace included in relevant log output).

Following the stack trace shows that the base exception comes from input preprocessing at llms/bedrock.py#L716. This line assumes that the input for Amazon LLMs is of type str.

model = 'amazon.titan-embed-text-v1', input = [1]
client = <botocore.client.BedrockRuntime object at 0x1201f67a0>
optional_params = {}, encoding = None
logging_obj = <litellm.utils.Logging object at 0x117c0d450>

    def _embedding_func_single(
        model: str,
        input: str,
        client: Any,
        optional_params=None,
        encoding=None,
        logging_obj=None,
    ):
        # logic for parsing in - calling - parsing out model embedding calls
        ## FORMAT EMBEDDING INPUT ##
        provider = model.split(".")[0]
        inference_params = copy.deepcopy(optional_params)
        inference_params.pop(
            "user", None
        )  # make sure user is not passed in for bedrock call
        modelId = (
            optional_params.pop("model_id", None) or model
        )  # default to model if not passed
        if provider == "amazon":
>           input = input.replace(os.linesep, " ")
E           AttributeError: 'list' object has no attribute 'replace'

I think this should be a BadRequestError instead of an APIConnectionError.

Suggested Resolution: Update Bedrock Embedding API to fail-fast on invalid input type

My proposed change would be to add a line around litellm/llms/bedrock.py#L810 to verify that it is iterating over a List[str]; if the iterated member is not an instance of str, raise a BadRequestError with the message 'Bedrock Embedding API input must be type str | List[str]'.
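For illustration only, a minimal sketch of the kind of fail-fast check I have in mind (the helper name is hypothetical, and the real change would raise litellm's BadRequestError, whose constructor may need extra arguments such as a response object):

from typing import List, Union

def _validate_titan_embedding_input(input: Union[str, List[str]]) -> None:
    # Bedrock text-embedding models accept only str | List[str]
    if isinstance(input, str):
        return
    if isinstance(input, list) and all(isinstance(item, str) for item in input):
        return
    # In litellm this would be a BadRequestError rather than a ValueError.
    raise ValueError("Bedrock Embedding API input must be type str | List[str]")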

This change will work for the three models that LiteLLM documents as supported: Titan Embeddings - G1, Cohere Embeddings - English, and Cohere Embeddings - Multilingual. Based on the example requests in the AWS Console, all three of these models accept only text input.

First thing to consider: Bedrock also offers a multimodal model named Titan Multimodal Embeddings Generation 1, which takes as input:

{
  "modelId": "amazon.titan-embed-image-v1",
  "contentType": "application/json",
  "accept": "application/json",
  "body": {
    "inputText": "this is where you place your input text",
    "inputImage": "<base64_image_string>"
  }
}

This API isn't supported by LiteLLM, and OpenAI doesn't offer multimodal embeddings, but if OpenAI offers multimodal embeddings in the future then the type check added above would need to be revisited.

I think this is an acceptable risk, as the type check can be updated to confirm the iterated item conforms to the new required type signature.
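If that ever happens, the check could presumably be widened rather than removed. A purely hypothetical sketch, assuming a multimodal item looked like the Titan Multimodal request body above:

from typing import Any

def _is_valid_titan_embedding_item(item: Any) -> bool:
    # Plain text stays valid; a hypothetical multimodal item would be a dict
    # carrying "inputText" and/or "inputImage", as in the body shown above.
    if isinstance(item, str):
        return True
    return isinstance(item, dict) and any(
        key in item for key in ("inputText", "inputImage")
    )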

Second thing to consider: Could there be a wider type-checking system in place to handle future API divergences? I think this is a good question, but it's a decision that can come later, as the above code does not create any one-way doors.

If you agree with this approach, please confirm and I can implement the fix and send a PR.

Reproducible unit test

Running litellm-1.17.16

import pytest
import litellm

# Placeholder values -- substitute your own credentials/identifiers before running
AWS_REGION_NAME = "us-east-1"
AWS_ROLE_NAME = "arn:aws:iam::111122223333:role/your-bedrock-role"
AWS_SESSION_NAME = "litellm-embedding-test"
OPENAI_TOKEN = "sk-your-openai-key"

def test_demo_tokens_as_input_to_embeddings_fails_for_titan():
    # Works as expected
    response_for_input_as_list_str = litellm.embedding(
        model='amazon.titan-embed-text-v1',
        input=['Hello world'],
        aws_region_name=AWS_REGION_NAME,
        aws_role_name=AWS_ROLE_NAME,
        aws_session_name=AWS_SESSION_NAME
    )

    assert len(response_for_input_as_list_str.data) == 1

    # Works as expected
    response_for_input_as_tokens_openai = litellm.embedding(
        model='text-embedding-ada-002',
        input=[1, 2, 3],
        api_key=OPENAI_TOKEN
    )

    assert len(response_for_input_as_tokens_openai.data) == 1

    # Works as expected
    response_for_input_as_list_of_tokens_openai = litellm.embedding(
        model='text-embedding-ada-002',
        input=[[1]],
        api_key=OPENAI_TOKEN
    )

    assert len(response_for_input_as_list_of_tokens_openai.data) == 1

    with pytest.raises(litellm.APIConnectionError, match="'list' object has no attribute 'replace'"):
        litellm.embedding(
            model='amazon.titan-embed-text-v1',
            input=[[1]],
            aws_region_name=AWS_REGION_NAME,
            aws_role_name=AWS_ROLE_NAME,
            aws_session_name=AWS_SESSION_NAME
        )

    with pytest.raises(litellm.APIConnectionError, match="'int' object has no attribute 'replace'"):
        litellm.embedding(
            model='amazon.titan-embed-text-v1',
            input=[1],
            aws_region_name=AWS_REGION_NAME,
            aws_role_name=AWS_ROLE_NAME,
            aws_session_name=AWS_SESSION_NAME
        )
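
Once the fail-fast check is in place, I would expect the invalid-input cases to raise BadRequestError instead; roughly (hypothetical follow-up test, same placeholder credentials as above):

def test_tokens_as_input_to_embeddings_raises_bad_request_after_fix():
    # Hypothetical post-fix expectation: invalid input types fail fast.
    with pytest.raises(litellm.BadRequestError):
        litellm.embedding(
            model='amazon.titan-embed-text-v1',
            input=[[1]],
            aws_region_name=AWS_REGION_NAME,
            aws_role_name=AWS_ROLE_NAME,
            aws_session_name=AWS_SESSION_NAME
        )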

Relevant log output

No response


rmann-nflx commented 7 months ago

Relevant log output

test_embeddings.py:31 (test_tokens_as_input_to_embeddings)

model = 'amazon.titan-embed-text-v1', input = [[1]], timeout = 600
api_base = None, api_version = None, api_key = None, api_type = None
caching = False, user = None, custom_llm_provider = 'bedrock'
litellm_call_id = '08cdd320-f973-41d4-9f56-c7b0da5d486d'
kwargs = {'aws_region_name': '...', 'aws_role_name': '...', 'aws_session_name': '...'}

    @client
    def embedding(
        model,
        input=[],
        ...
    ):
        ...
        elif custom_llm_provider == "bedrock":
>           response = bedrock.embedding(
                model=model,
                input=input,
                encoding=encoding,
                logging_obj=logging,
                optional_params=optional_params,
                model_response=EmbeddingResponse(),
            )

../../../../Library/Caches/pypoetry/virtualenvs/PROJECT-py3.10/lib/python3.10/site-packages/litellm/main.py:2424:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

model = 'amazon.titan-embed-text-v1', input = [[1]], api_key = None
model_response = EmbeddingResponse(model=None, data=None, object='list', usage=Usage())
optional_params = {}, encoding = None

    def embedding(
        model: str,
        input: Union[list, str],
        ...
    ):
        ...
        else:
            ## Embedding Call
>           embeddings = [
                _embedding_func_single(
                    model,
                    i,
                    optional_params=optional_params,
                    client=client,
                    logging_obj=logging_obj,
                )
                for i in input
            ]  # [TODO]: make these parallel calls

../../../../Library/Caches/pypoetry/virtualenvs/PROJECT-py3.10/lib/python3.10/site-packages/litellm/llms/bedrock.py:801:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

model = 'amazon.titan-embed-text-v1', input = [1]
client = <botocore.client.BedrockRuntime object>
optional_params = {}, encoding = None

    def _embedding_func_single(
        model: str,
        input: str,
        client: Any,
        optional_params=None,
        encoding=None,
        logging_obj=None,
    ):
        # logic for parsing in - calling - parsing out model embedding calls
        ## FORMAT EMBEDDING INPUT ##
        provider = model.split(".")[0]
        inference_params = copy.deepcopy(optional_params)
        inference_params.pop(
            "user", None
        )  # make sure user is not passed in for bedrock call
        modelId = (
            optional_params.pop("model_id", None) or model
        )  # default to model if not passed
        if provider == "amazon":
>           input = input.replace(os.linesep, " ")
E           AttributeError: 'list' object has no attribute 'replace'

../../../../Library/Caches/pypoetry/virtualenvs/PROJECT-py3.10/lib/python3.10/site-packages/litellm/llms/bedrock.py:707: AttributeError

During handling of the above exception, another exception occurred:

    def test_tokens_as_input_to_embeddings():
        os.environ["AWS_CONTAINER_CREDENTIALS_FULL_URI"] = "..."

        # Works as expected
        response_for_input_as_list_str = embedding(
            model='amazon.titan-embed-text-v1',
            input=['Hello world'],
            aws_region_name='...',
            aws_role_name='...',
            aws_session_name='...'
        )
        assert len(response_for_input_as_list_str.data) == 1

>       response_for_input_as_list_tokens = embedding(
            model='amazon.titan-embed-text-v1',
            input=[[1]],
            aws_region_name='...',
            aws_role_name='...',
            aws_session_name='...'
        )

test_embeddings.py:47:
../../../../Library/Caches/pypoetry/virtualenvs/PROJECT-py3.10/lib/python3.10/site-packages/litellm/utils.py:2180: in wrapper
    raise e
../../../../Library/Caches/pypoetry/virtualenvs/PROJECT-py3.10/lib/python3.10/site-packages/litellm/utils.py:2087: in wrapper
    result = original_function(*args, **kwargs)
../../../../Library/Caches/pypoetry/virtualenvs/PROJECT-py3.10/lib/python3.10/site-packages/litellm/main.py:2529: in embedding
    raise exception_type(
../../../../Library/Caches/pypoetry/virtualenvs/PROJECT-py3.10/lib/python3.10/site-packages/litellm/utils.py:6749: in exception_type
    raise e
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

model = 'amazon.titan-embed-text-v1'
original_exception = AttributeError("'list' object has no attribute 'replace'")
custom_llm_provider = 'bedrock', completion_kwargs = {}

    def exception_type(
        model,
        original_exception,
        custom_llm_provider,
        completion_kwargs={},
    ):
        ...
        else:  # ensure generic errors always return APIConnectionError
            exception_mapping_worked = True
            if hasattr(original_exception, "request"):
                raise APIConnectionError(
                    message=f"{str(original_exception)}",
                    llm_provider=custom_llm_provider,
                    model=model,
                    request=original_exception.request,
                )
            else:
>               raise APIConnectionError(
                    message=f"{str(original_exception)}",
                    llm_provider=custom_llm_provider,
                    model=model,
                    request=httpx.Request(
                        method="POST", url="https://api.openai.com/v1/"
                    ),  # stub the request
                )
E               litellm.exceptions.APIConnectionError: 'list' object has no attribute 'replace'

../../../../Library/Caches/pypoetry/virtualenvs/PROJECT-py3.10/lib/python3.10/site-packages/litellm/utils.py:6724: APIConnectionError
ishaan-jaff commented 7 months ago

Thanks for this detailed issue, really appreciate the unit test too! Will work on this today.

ishaan-jaff commented 7 months ago

PR https://github.com/BerriAI/litellm/pull/1685

rmann-nflx commented 7 months ago

Impressive turnaround! Thanks for the very quick response.

ishaan-jaff commented 7 months ago

@rmann-nflx are you looking into using Titan Multimodal embeddings ?

rmann-nflx commented 7 months ago

It's not on our roadmap at the moment, as there have been no internal requests to support third-party image embeddings. However, I have been careful internally not to make any one-way-door decisions that would block supporting it in the future.