Python SDK, Proxy Server to call 100+ LLM APIs using the OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
The Feature
Instead of returning the response to the user, upload the response quickly to a fast S3 bucket (like GCS or R2), and return a presigned URL to the client. This would only work for non-streaming responses.
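The flow above could be sketched roughly like this. Everything here is illustrative: the secret, bucket URL, and the simplified HMAC query signing are stand-ins, and real GCS/R2/S3 presigning uses AWS Signature V4 (e.g. boto3's `generate_presigned_url`):

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

# Hypothetical signing secret and bucket endpoint (assumptions, not LiteLLM config).
SECRET_KEY = b"example-secret"
BUCKET_URL = "https://example-bucket.r2.example.com"

def make_presigned_get_url(object_key: str, expires_in: int = 300) -> str:
    """Return a simplified presigned GET URL.

    This mimics only the *shape* of S3-style query signing (an expiry plus an
    HMAC over the request description). Note that only the verb, key, and
    expiry are signed -- never the client's `authorization` header, which the
    SDK re-sends when it follows the redirect.
    """
    expires_at = int(time.time()) + expires_in
    string_to_sign = f"GET\n{object_key}\n{expires_at}"
    signature = hmac.new(SECRET_KEY, string_to_sign.encode(), hashlib.sha256).hexdigest()
    query = urlencode({"Expires": expires_at, "Signature": signature})
    return f"{BUCKET_URL}/{object_key}?{query}"

def handle_completion(response_json: str) -> str:
    # Instead of returning response_json to the client, upload it to the
    # bucket (upload call omitted here) and hand back a short-lived URL,
    # e.g. as a 30x redirect that the client follows.
    object_key = "responses/" + hashlib.sha256(response_json.encode()).hexdigest()
    return make_presigned_get_url(object_key)
```

Since the response body lives in the bucket, this only fits non-streaming completions, as noted above.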
Following redirects has been supported in OpenAI's Python client since https://github.com/openai/openai-python/pull/1100, and `fetch` in JavaScript follows redirects by default.

PoC:
Then when using:
Then this results in a GET request with these headers:
Note to self: do not presign the GET URL with the `authorization` header.

Motivation, pitch
For large responses:
Twitter / LinkedIn details
https://www.linkedin.com/in/davidmanouchehri/