BerriAI / litellm

Python SDK, Proxy Server to call 100+ LLM APIs using the OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Feature]: Support Vertex Multimodal embeddings #4622

Closed · 04cfb1ed closed this 6 days ago

04cfb1ed commented 1 month ago

The Feature

Vertex Multimodal embeddings allow sending image, text, and video inputs.

They can be used through the REST endpoint or the Vertex AI SDK for Python.

https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-multimodal-embeddings#aiplatform_sdk_text_image_embedding-drest
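For the REST path, a rough sketch using google-auth and requests could look like the following. The endpoint shape and response field names follow the linked doc, but project_id, image_path, and the dimension value are placeholders, not tested values:

import base64
import requests
import google.auth
import google.auth.transport.requests

# Placeholder values; replace with your own project, region, and image.
project_id = "my-project"
location = "us-central1"
image_path = "image.jpg"

# Get an access token from application default credentials.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

url = (
    f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}"
    f"/locations/{location}/publishers/google/models/multimodalembedding@001:predict"
)

with open(image_path, "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "instances": [
        {
            "text": "a photo of a cat",
            "image": {"bytesBase64Encoded": image_b64},
        }
    ],
    "parameters": {"dimension": 1408},
}

resp = requests.post(
    url,
    json=payload,
    headers={"Authorization": f"Bearer {credentials.token}"},
)
resp.raise_for_status()
prediction = resp.json()["predictions"][0]
print(prediction["textEmbedding"][:5])
print(prediction["imageEmbedding"][:5])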

Sample SDK code from the reference:

import vertexai
from vertexai.vision_models import Image, MultiModalEmbeddingModel

# TODO(developer): Update values for project_id, image_path, contextual_text & dimension
vertexai.init(project=project_id, location="us-central1")

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding")
image = Image.load_from_file(image_path)

embeddings = model.get_embeddings(
    image=image,
    contextual_text=contextual_text,
    dimension=dimension,
)
print(f"Image Embedding: {embeddings.image_embedding}")
print(f"Text Embedding: {embeddings.text_embedding}")

Motivation, pitch

Multimodal embeddings enable multimodal processing in RAG, combining video, audio, or images with text.

Twitter / LinkedIn details

No response

krrishdholakia commented 1 month ago

That's interesting. Happy to add support for this.

krrishdholakia commented 1 month ago

@04cfb1ed can we set up a 1:1 support channel? I noticed you've had a couple of issues, and I want to prioritize correctly for your use case.

LinkedIn / Discord (just 👋 wave on #general and I'll set up a channel)