BerriAI / litellm

Python SDK, Proxy Server to call 100+ LLM APIs using the OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

Semantic Caching with Qdrant Vector database #4963

Closed haadirakhangi closed 2 weeks ago

haadirakhangi commented 1 month ago

The Feature

LiteLLM currently supports semantic caching only with Redis. It would be advantageous if it also supported semantic caching with other popular vector databases, such as Qdrant.

Motivation, pitch

Currently, LiteLLM supports in-memory cache, S3 bucket cache, Redis cache, and Redis semantic cache. We believe semantic caching plays a crucial role in LLM applications compared to simple exact-match caching. We are working with Qdrant, and given its efficient, rapid responses and its vector quantization feature, enabling semantic caching with Qdrant as the vector database would be highly beneficial.

https://docs.litellm.ai/docs/proxy/caching
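For context, the core idea behind semantic caching (as opposed to exact-match caching) is to embed each prompt and return a cached response when a new prompt's embedding is close enough to a stored one. A minimal dependency-free sketch of that idea, using a toy character-frequency "embedding" in place of a real embedding model like text-embedding-ada-002:

```python
import math

def embed(text):
    # Toy deterministic "embedding": normalized character-frequency vector
    # over a-z. A real semantic cache would use an embedding model.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

class SemanticCache:
    def __init__(self, similarity_threshold=0.8):
        self.threshold = similarity_threshold
        self.entries = []  # list of (embedding, cached_response)

    def get(self, prompt):
        emb = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(emb, e[0]), default=None)
        if best and cosine(emb, best[0]) >= self.threshold:
            return best[1]
        return None  # cache miss: caller queries the LLM, then calls set()

    def set(self, prompt, response):
        self.entries.append((embed(prompt), response))

cache = SemanticCache(similarity_threshold=0.8)
cache.set("What is the capital of France?", "Paris")
print(cache.get("what is the capital of france"))   # paraphrase -> Paris
print(cache.get("How do I bake sourdough bread?"))  # unrelated -> None
```

A vector database like Qdrant replaces the linear scan in `get` with an approximate nearest-neighbor search, which is what makes this practical at scale.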

Twitter / LinkedIn details

https://www.linkedin.com/in/haadi-rakhangi/

haadirakhangi commented 1 month ago

@krrishdholakia could you please assign this issue to me if you believe it would be a valuable contribution to the LiteLLM library?

krrishdholakia commented 1 month ago

hey @haadirakhangi could you point me to qdrant's semantic caching documentation?

sumitdas66 commented 1 month ago

@krrishdholakia https://qdrant.tech/blog/semantic-cache-ai-data-retrieval/ https://github.com/infoslack/qdrant-example/blob/main/semantic-cache.ipynb

haadirakhangi commented 1 month ago

I have attached a cookbook for my contribution of semantic caching with the Qdrant vector database; you can refer to it for the results.

https://colab.research.google.com/drive/1Lew6xi0ACfIigzfnD_ggWsVIXn-PvaBN?usp=sharing


I have created a pull request for this: #5018. Feel free to leave a review!

@krrishdholakia

sumitdas66 commented 1 month ago

Hey @haadirakhangi, can you give an example of how we add the settings to enable qdrant semantic cache in the config.yaml file?

haadirakhangi commented 1 month ago

Hi @sumitdas66,

Thank you for your patience. I have tested and confirmed that the existing code supports the necessary settings without any additional changes. Here is an example config.yaml with the required settings to enable Qdrant semantic caching:

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
  - model_name: text-embedding-ada-002
    litellm_params:
      model: text-embedding-ada-002

litellm_settings:
  set_verbose: True
  cache: True 
  cache_params:       
    type: qdrant-semantic
    qdrant_url: "YOUR_QDRANT_URL"
    qdrant_username: YOUR_QDRANT_USERNAME
    qdrant_password: YOUR_QDRANT_PASSWORD
    similarity_threshold: 0.8
    qdrant_collection_name: "litellm-testing"

Feel free to replace the placeholder values (YOUR_QDRANT_URL, YOUR_QDRANT_USERNAME, YOUR_QDRANT_PASSWORD) with your actual Qdrant credentials.
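For a quick sanity check of the setup, you can start the proxy with the config above and send the same (or a semantically similar) prompt twice; the second call should be served from the Qdrant semantic cache. This assumes the proxy listens on the default http://localhost:4000, and the prompt is illustrative:

```shell
# Start the proxy with the config above
litellm --config config.yaml

# In another terminal, send the same prompt twice and compare responses
curl http://localhost:4000/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'
```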

Do let me know if you have any further questions!