An example vLLM client that should also support vision:
```python
import base64

import filetype
import httpx

# VLM_MODEL, VLLM_URL, VLLM_HEALTHCHECK, VLLM_READY_TIMEOUT,
# ALLOWED_IMAGE_TYPES, and wait_for_ready are defined elsewhere.


class VLMClient:
    def __init__(self, vlm_model: str = VLM_MODEL, vllm_url: str = VLLM_URL):
        self._vlm_model = vlm_model
        self._vllm_client = httpx.AsyncClient(base_url=vllm_url)
        if VLLM_HEALTHCHECK:
            # Block until the vLLM server reports healthy or the timeout elapses
            wait_for_ready(
                server_url=vllm_url,
                wait_seconds=VLLM_READY_TIMEOUT,
                health_endpoint="health",
            )

    @property
    def vlm_model(self) -> str:
        return self._vlm_model

    async def __call__(
        self,
        prompt: str,
        image_bytes: bytes | None = None,
        image_filetype: filetype.Type | None = None,
        max_tokens: int = 10,
    ) -> str:
        # Assemble the message content, starting with the text part
        message_content: list[dict[str, str | dict]] = [
            {
                "type": "text",
                "text": prompt,
            }
        ]
        if image_bytes is not None:
            if image_filetype is None:
                image_filetype = filetype.guess(image_bytes)
            if image_filetype is None:
                raise ValueError("Could not determine image filetype")
            if image_filetype not in ALLOWED_IMAGE_TYPES:
                raise ValueError(
                    f"Image type {image_filetype} is not supported. "
                    f"Allowed types: {ALLOWED_IMAGE_TYPES}"
                )
            # Inline the image as a base64-encoded data URL
            image_b64 = base64.b64encode(image_bytes).decode("utf-8")
            message_content.append(
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:{image_filetype.mime};base64,{image_b64}",
                    },
                }
            )
        # Put together the request payload
        payload = {
            "model": self.vlm_model,
            "messages": [{"role": "user", "content": message_content}],
            "max_tokens": max_tokens,
            # "logprobs": True,
            # "top_logprobs": 1,
        }
        response = await self._vllm_client.post("/v1/chat/completions", json=payload)
        response.raise_for_status()
        data = response.json()
        response_text: str = (
            data.get("choices", [{}])[0].get("message", {}).get("content", "").strip()
        )
        return response_text
```
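
For reference, a minimal usage sketch, assuming a vLLM server is already reachable at `VLLM_URL`; the image path and prompt here are illustrative:

```python
import asyncio


async def main() -> None:
    client = VLMClient()
    # Hypothetical local image; any PNG/JPEG in ALLOWED_IMAGE_TYPES would do
    with open("example.png", "rb") as f:
        image_bytes = f.read()
    answer = await client("What is shown in this image?", image_bytes=image_bytes)
    print(answer)


asyncio.run(main())
```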
GPT-4o introduces a new message content type that contains images, encoded either as a URL or as base64. Example:
https://platform.openai.com/docs/guides/vision
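
A sketch of what the two encodings look like in a chat completions request body, following the linked guide; the model name, prompt, and URLs here are placeholders:

```python
payload = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                # Variant 1: image referenced by URL
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/cat.png"},
                },
                # Variant 2: image inlined as a base64 data URL
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/png;base64,iVBORw0KGgo..."},
                },
            ],
        }
    ],
}
```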
Milestone 1

- Tracing
- Instrumentation
- Testing

Milestone N

- Image tracing
- Context Attributes
- Config
- Suppress Tracing
- UI / Javascript
- Testing
- Documentation
- Evals