BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Feature]: Allow Redis Semantic caching with custom Embedding models #4001

Open anandsriraman opened 4 months ago

anandsriraman commented 4 months ago

The Feature

Currently, I notice that the schema for the Redis semantic cache enforces a vector dimension of 1536. This works well with OpenAI's text-embedding-ada-002 model but fails for any other embedding model.

    schema = {
        "index": {
            "name": "litellm_semantic_cache_index",
            "prefix": "litellm",
            "storage_type": "hash",
        },
        "fields": {
            # note: the duplicated "text" key means only the "prompt" entry survives
            "text": [{"name": "response"}],
            "text": [{"name": "prompt"}],
            "vector": [
                {
                    "name": "litellm_embedding",
                    "dims": 1536,  # hard-coded to the size of text-embedding-ada-002 vectors
                    "distance_metric": "cosine",
                    "algorithm": "flat",
                    "datatype": "float32",
                }
            ],
        },
    }

Please make the embedding dims configurable from the model-config.yaml file so that a wider range of embedding models deployed with LiteLLM can be used for the cache.
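
As a sketch of what this could look like in the proxy config: the cache_params keys below follow the documented redis-semantic cache settings, except redis_semantic_cache_embedding_dimensions, which is the hypothetical new option this issue asks for (the model name and value are only examples):

    litellm_settings:
      cache: True
      cache_params:
        type: "redis-semantic"
        similarity_threshold: 0.8
        # existing setting: which deployed model generates cache embeddings
        redis_semantic_cache_embedding_model: bge-small-en
        # proposed (hypothetical) setting: vector size of that model's embeddings
        redis_semantic_cache_embedding_dimensions: 384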

Motivation, pitch

For caching, much smaller embedding models may be preferred for their cost and speed. Making the dims configurable opens up far more interesting modes of caching and cache optimization. It should also reduce costs significantly, since it avoids a call to OpenAI every time a new user query is received.
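
To make that concrete, the index dimension for any such model can be discovered at runtime with a single probe call through the public litellm.embedding API; a rough sketch (the model name is only a placeholder for whatever embedding model is deployed behind LiteLLM):

    import litellm

    def detect_embedding_dims(model: str) -> int:
        """Embed a short probe string and return the vector length.

        Works for any model litellm can call, so the cache index can be
        sized to match cheaper/smaller embedding models instead of
        assuming OpenAI's 1536-dim vectors.
        """
        response = litellm.embedding(model=model, input=["dimension probe"])
        return len(response.data[0]["embedding"])

    # Example with a smaller, cheaper model than text-embedding-ada-002
    # (placeholder model name for illustration only).
    dims = detect_embedding_dims("huggingface/BAAI/bge-small-en-v1.5")
    print(dims)  # e.g. 384 for bge-small-en-v1.5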

Twitter / LinkedIn details

No response

ishaan-jaff commented 4 months ago

Hi @anandsriraman, I followed up over LinkedIn to better understand what you need. This is my LinkedIn: https://www.linkedin.com/in/reffajnaahsi/