Closed rmann-nflx closed 10 months ago
Thanks for this detailed issue, really appreciate the unit test too! will work on this today
Impressive turn around! Thanks for the very quick response.
@rmann-nflx are you looking into using Titan Multimodal embeddings ?
It's not on our road map at the moment, as there have been no internal requests to support third-party image embeddings. Internally, however, I have been careful not to make any one-way-door decisions that would block supporting it in the future.
What happened?
Hi Team,
Love the project! First time contributing.
I found a minor inconsistency in how input is handled between the OpenAI and AWS Bedrock Embeddings APIs, which results in an APIConnectionError being incorrectly thrown instead of a BadRequestError.
In this report I provide some context on the API and its usage, a stack trace, code to reproduce the bug, and my idea for a fix. With the maintainers' approval I can implement my suggested fix.
Background
The OpenAI Embeddings API accepts as input: str | List[str] | List[int] | List[List[int]].
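To make the four accepted shapes concrete, here is a small illustrative helper (a hypothetical `classify_input` function, not part of LiteLLM or the OpenAI SDK) that distinguishes them:

```python
from typing import List, Union

# The four input shapes the OpenAI Embeddings API accepts.
EmbeddingInput = Union[str, List[str], List[int], List[List[int]]]

def classify_input(value: EmbeddingInput) -> str:
    """Classify an embedding input into one of the four OpenAI-accepted shapes."""
    if isinstance(value, str):
        return "str"
    if isinstance(value, list) and value:
        if all(isinstance(v, str) for v in value):
            return "List[str]"          # a batch of text prompts
        if all(isinstance(v, int) for v in value):
            return "List[int]"          # a single pre-tokenized prompt
        if all(isinstance(v, list) and all(isinstance(t, int) for t in v) for v in value):
            return "List[List[int]]"    # a batch of pre-tokenized prompts
    raise TypeError("unsupported embedding input shape")
```

The token-list forms (`List[int]`, `List[List[int]]`) are the ones Bedrock's text models cannot accept, which is the source of the divergence below.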
However, the Bedrock Embeddings API for amazon.titan-embed-text-v1 accepts only str | List[str].
Problem
Passing input=List[int] into embedding(model='$OPENAI_MODEL') is valid, but when passed into embedding(model='amazon.titan-embed-text-v1') it results in an APIConnectionError.
Passing input=List[List[int]] into embedding(model='$OPENAI_MODEL') is valid, but when passed into embedding(model='amazon.titan-embed-text-v1') it results in the same error.
(Stack trace included in relevant log output.)
Following the stack trace shows that the base exception comes from input preprocessing at llms/bedrock.py#L716. This line assumes that the input for Amazon models is of type str.
I think this should be a BadRequestError instead of an APIConnectionError.
Suggested Resolution: Update Bedrock Embedding API to fail-fast on invalid input type
My proposed change is to add a check around litellm/llms/bedrock.py#L810 to verify that it is iterating over a List[str]; if an iterated member is not an instance of str, raise a BadRequestError with the message 'Bedrock Embedding API input must be type str | List[str]'.
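A minimal sketch of the proposed fail-fast check, under the assumption it runs before the per-item preprocessing loop. The real code would raise litellm's BadRequestError; a stand-in exception class is defined here only to keep the sketch self-contained:

```python
class BadRequestError(Exception):
    """Stand-in for litellm.exceptions.BadRequestError, for a self-contained sketch."""

def validate_bedrock_embedding_input(input):
    """Fail fast if input is not str | List[str], normalizing str to a one-element list."""
    if isinstance(input, str):
        return [input]
    if isinstance(input, list) and all(isinstance(i, str) for i in input):
        return input
    # List[int] and List[List[int]] (valid for OpenAI) land here and fail fast.
    raise BadRequestError("Bedrock Embedding API input must be type str | List[str]")
```

With this in place, the token-list inputs that are valid for OpenAI are rejected with a BadRequestError before any request is sent, rather than surfacing as an APIConnectionError from deep inside preprocessing.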
This change will work for the three models that LiteLLM documents as supported: Titan Embeddings - G1, Cohere Embeddings - English, and Cohere Embeddings - Multilingual. Based on the examples provided in the AWS Console, all three of these models support only text input.
First thing to consider: Bedrock also offers a multimodal model named Titan Multimodal Embeddings Generation 1, which accepts image input in addition to text. This API isn't supported by LiteLLM, and OpenAI doesn't offer multimodal embeddings, but if OpenAI does offer multimodal embeddings in the future then the type check added above would need to be removed.
I think this is an acceptable risk, as the type check can be updated to confirm the iterated item conforms to the new required type signature.
Second thing to consider: could there be a wider type-checking system in place to handle future API divergences? I think this is a good question, but it's a decision that can come later, as the above code does not create any one-way doors.
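For discussion, such a wider system might look like a per-provider registry of input predicates. This is purely a design sketch with hypothetical names (`INPUT_VALIDATORS`, `check_input`), not existing LiteLLM code:

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping a provider to a predicate over embedding input.
INPUT_VALIDATORS: Dict[str, Callable[[Any], bool]] = {
    # OpenAI: str | List[str] | List[int] | List[List[int]] (loosely checked here)
    "openai": lambda v: isinstance(v, str)
        or (isinstance(v, list) and all(isinstance(i, (str, int, list)) for i in v)),
    # Bedrock text embeddings: str | List[str] only
    "bedrock": lambda v: isinstance(v, str)
        or (isinstance(v, list) and all(isinstance(i, str) for i in v)),
}

def check_input(provider: str, value: Any) -> None:
    """Raise if the input shape is known to be invalid for the given provider."""
    ok = INPUT_VALIDATORS.get(provider, lambda _: True)(value)
    if not ok:
        raise ValueError(f"{provider} embeddings do not accept this input shape")
```

The single isinstance check proposed above can later be folded into a table like this without any breaking change, which is why it does not create a one-way door.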
If you agree with this approach, please confirm and I can implement the fix and send a PR.
Reproducible unit test
Running
litellm-1.17.16
Relevant log output
No response
Twitter / LinkedIn details
No response