BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Feature]: Use stream_options on Azure OpenAI #5751

Closed Manouchehri closed 1 month ago

Manouchehri commented 1 month ago

The Feature

It's undocumented, but the 2024-08-01-preview API version supports "stream_options": {"include_usage": true}.

Example:

export AZURE_OPENAI_AD_TOKEN=$(az account get-access-token --scope "https://cognitiveservices.azure.com/.default" --query accessToken --output tsv)

curl -v "https://RESOURCE_REMOVED.openai.azure.com/openai/deployments/DEPLOYMENT_REMOVED/chat/completions?api-version=2024-08-01-preview" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AZURE_OPENAI_AD_TOKEN" \
  -H "x-ms-client-request-id: tests-good-$RANDOM" \
  -d '{"messages":[{"role": "user", "content": "Hello."}], "max_tokens": 1, "temperature": 0.0, "seed": 42, "stream": true, "stream_options": {"include_usage": true}}'

Response:

data: {"choices":[],"created":0,"id":"","model":"","object":"","prompt_filter_results":[{"prompt_index":0,"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"jailbreak":{"filtered":false,"detected":false},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}}}]}

data: {"choices":[{"delta":{"content":"","role":"assistant"},"finish_reason":null,"index":0,"logprobs":null}],"created":1726607660,"id":"chatcmpl-A8ZoadxBE6aOtHs9MeXRg0V4paWLv","model":"gpt-4o-mini","object":"chat.completion.chunk","system_fingerprint":"fp_80a1bad4c7","usage":null}

data: {"choices":[{"delta":{"content":"Hello"},"finish_reason":null,"index":0,"logprobs":null}],"created":1726607660,"id":"chatcmpl-A8ZoadxBE6aOtHs9MeXRg0V4paWLv","model":"gpt-4o-mini","object":"chat.completion.chunk","system_fingerprint":"fp_80a1bad4c7","usage":null}

data: {"choices":[{"delta":{},"finish_reason":"length","index":0,"logprobs":null}],"created":1726607660,"id":"chatcmpl-A8ZoadxBE6aOtHs9MeXRg0V4paWLv","model":"gpt-4o-mini","object":"chat.completion.chunk","system_fingerprint":"fp_80a1bad4c7","usage":null}

data: {"choices":[{"content_filter_offsets":{"check_offset":36,"start_offset":36,"end_offset":41},"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}},"finish_reason":null,"index":0}],"created":0,"id":"","model":"","object":""}

data: {"choices":[{"content_filter_offsets":{"check_offset":36,"start_offset":36,"end_offset":41},"content_filter_results":{"protected_material_code":{"filtered":false,"detected":false},"protected_material_text":{"filtered":false,"detected":false}},"finish_reason":null,"index":0}],"created":0,"id":"","model":"","object":""}

data: {"choices":[],"created":1726607660,"id":"chatcmpl-A8ZoadxBE6aOtHs9MeXRg0V4paWLv","model":"gpt-4o-mini","object":"chat.completion.chunk","system_fingerprint":"fp_80a1bad4c7","usage":{"completion_tokens":1,"prompt_tokens":9,"total_tokens":10}}

data: [DONE]
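
For anyone consuming the raw stream, here is a minimal sketch of how the usage numbers could be pulled out of a response like the one above. The helper name `extract_usage` and the shortened sample lines are made up for illustration; this is not how litellm handles it internally. It assumes each event arrives as one `data: ...` line and that only the final regular chunk carries a non-null `usage` object.

```python
import json
from typing import Iterable, Optional


def extract_usage(sse_lines: Iterable[str]) -> Optional[dict]:
    """Return the last non-null `usage` object found in an SSE chat stream.

    Hypothetical helper for illustration only.
    """
    usage = None
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        # Content-filter chunks have no "usage" key and regular chunks
        # carry "usage": null, so .get() covers both cases.
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage


# Shortened stand-in for the stream shown above.
sample = [
    'data: {"choices":[{"delta":{"content":"Hello"},"finish_reason":null,"index":0}],"usage":null}',
    "",
    'data: {"choices":[],"usage":{"completion_tokens":1,"prompt_tokens":9,"total_tokens":10}}',
    "",
    "data: [DONE]",
]

print(extract_usage(sample))
# {'completion_tokens': 1, 'prompt_tokens': 9, 'total_tokens': 10}
```

Note that the usage chunk has an empty `choices` list, so any code that indexes `choices[0]` on every chunk needs a guard.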

Motivation, pitch

This would let streaming responses report the exact token counts returned by Azure in the final usage chunk, rather than relying on a client-side estimate.

Twitter / LinkedIn details

https://www.linkedin.com/in/davidmanouchehri/

Manouchehri commented 1 month ago

Bump on this?

Manouchehri commented 1 month ago

Added in #6024.