- ➕ the Content filtering and Prompt shielding labs.
- ➕ the Model routing lab with OpenAI model-based routing.
- ➕ the Prompt flow lab to try the Azure AI Studio Prompt Flow with Azure API Management.
- ➕ priority and weight parameters to the Backend pool load balancing lab.
- ➕ the Streaming tool to test OpenAI streaming with Azure API Management.
- ➕ the Tracing tool to debug and troubleshoot OpenAI APIs using the Azure API Management tracing capability.
- ➕ image processing to the GPT-4o inferencing lab.
- ➕ the Function calling lab with a sample API on Azure Functions.
The rapid pace of AI advances demands experimentation-driven approaches for organizations to remain at the forefront of the industry. With AI steadily becoming a game-changer for an array of sectors, maintaining a fast-paced innovation trajectory is crucial for businesses aiming to leverage its full potential.
AI services are predominantly accessed via APIs, underscoring the essential need for a robust and efficient API management strategy. This strategy is instrumental for maintaining control and governance over the consumption of AI services.
With the expanding horizons of AI services and their seamless integration with APIs, there is considerable demand for a comprehensive AI Gateway pattern that broadens the core principles of API management, accelerating the experimentation of advanced use cases and paving the road for further innovation in this rapidly evolving field. The well-architected principles of the AI Gateway provide a framework for the confident deployment of Intelligent Apps into production.
This repo explores the AI Gateway pattern through a series of experimental labs. The GenAI Gateway capabilities of Azure API Management play a crucial role within these labs, handling AI services APIs with security, reliability, performance, overall operational efficiency and cost controls. The primary focus is on Azure OpenAI, which sets the standard reference for Large Language Models (LLMs). However, the same principles and design patterns could potentially be applied to any LLM.
Acknowledging the rising dominance of Python, particularly in the realm of AI, along with the powerful experimental capabilities of Jupyter notebooks, the following labs are structured around Jupyter notebooks, with step-by-step instructions, Python scripts, Bicep files and Azure API Management policies:
| 🧪 Backend pool load balancing (built-in) | 🧪 Advanced load balancing (custom) |
| --- | --- |
| Playground to try the built-in load balancing backend pool functionality of Azure API Management to either a list of Azure OpenAI endpoints or mock servers. | Playground to try the advanced load balancing (based on a custom Azure API Management policy) to either a list of Azure OpenAI endpoints or mock servers. |
| 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬 | 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬 |
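As a rough illustration of what a backend pool does, the selection logic with `priority` and `weight` parameters can be sketched in Python. This is only a sketch: the endpoint URLs and field names below are hypothetical, and the real behavior is configured in Bicep and enforced inside Azure API Management.

```python
import random

# Hypothetical backend pool: lower priority number = tried first;
# within the same priority group, traffic is split by weight.
BACKENDS = [
    {"url": "https://aoai-eastus.example/openai", "priority": 1, "weight": 3},
    {"url": "https://aoai-westus.example/openai", "priority": 1, "weight": 1},
    {"url": "https://aoai-backup.example/openai", "priority": 2, "weight": 1},
]

def pick_backend(backends, unhealthy=frozenset(), rng=random):
    """Pick the highest-priority healthy group, then a weighted random member of it."""
    healthy = [b for b in backends if b["url"] not in unhealthy]
    if not healthy:
        raise RuntimeError("no healthy backends")
    top = min(b["priority"] for b in healthy)
    group = [b for b in healthy if b["priority"] == top]
    return rng.choices(group, weights=[b["weight"] for b in group])[0]
```

With both priority-1 endpoints marked unhealthy, the sketch falls back to the priority-2 backup, which mirrors the failover behavior the lab demonstrates.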
| 🧪 Access controlling | 🧪 Token rate limiting |
| --- | --- |
| Playground to try the OAuth 2.0 authorization feature using an identity provider to enable more fine-grained access to OpenAI APIs for particular users or clients. | Playground to try the token rate limiting policy on one or more Azure OpenAI endpoints. When the token usage is exceeded, the caller receives a 429. |
| 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬 | 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬 |
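A client calling a token-rate-limited gateway should expect 429 responses and retry with backoff. The sketch below is one possible client-side pattern, not part of the labs themselves; the `send` callable stands in for an actual HTTP request to the gateway.

```python
import time

def call_with_backoff(send, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Retry a gateway request while it returns 429 (token limit exceeded)."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        # A real client should honor a Retry-After header if the policy
        # returns one; otherwise fall back to exponential backoff.
        sleep(base_delay * (2 ** attempt))
    return status, body
```

Injecting `sleep` makes the backoff testable without real delays.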
| 🧪 Token metrics emitting | 🧪 Semantic caching |
| --- | --- |
| Playground to try the emit token metric policy. The policy sends metrics to Application Insights about the consumption of large language model tokens through Azure OpenAI Service APIs. | Playground to try the semantic caching policy. Uses vector proximity of the prompt to previous requests and a specified similarity score threshold. |
| 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬 | 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬 |
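The core idea behind semantic caching, comparing the embedding of an incoming prompt against embeddings of previous prompts and returning a cached completion when similarity exceeds a threshold, can be sketched in a few lines of Python. The real lab uses the Azure API Management policy with an embeddings backend and an external cache; this toy version with 2-dimensional vectors only illustrates the similarity-threshold decision.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cache_lookup(query_vec, cache, threshold=0.95):
    """Return the cached completion closest to the query, if above the threshold."""
    best, best_score = None, threshold
    for vec, completion in cache:
        score = cosine(query_vec, vec)
        if score >= best_score:
            best, best_score = completion, score
    return best
```

A near-duplicate prompt hits the cache; an unrelated one falls through to the model.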
| 🧪 Response streaming | 🧪 Vector searching |
| --- | --- |
| Playground to try response streaming with Azure API Management and Azure OpenAI endpoints to explore the advantages and shortcomings associated with streaming. | Playground to try the Retrieval Augmented Generation (RAG) pattern with Azure AI Search, Azure OpenAI embeddings and Azure OpenAI completions. |
| 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬 | 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬 |
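When streaming is enabled, the OpenAI chat completions API returns server-sent events whose `data:` payloads carry incremental `delta` fragments, terminated by a `[DONE]` sentinel. A minimal sketch of reassembling the full completion on the client side (the SSE lines below are mocked, not captured from a live endpoint):

```python
import json

def assemble_stream(sse_lines):
    """Reassemble the completion text from OpenAI-style SSE 'data:' lines."""
    text = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip comments/keep-alives
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        # The first chunk typically carries only the role, no content.
        text.append(delta.get("content", ""))
    return "".join(text)
```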
| 🧪 Built-in logging | 🧪 SLM self-hosting (phi-3) |
| --- | --- |
| Playground to try the built-in logging capabilities of Azure API Management. Logs requests into App Insights to track details and token usage. | Playground to try the self-hosted phi-3 Small Language Model (SLM) through the Azure API Management self-hosted gateway with OpenAI API compatibility. |
| 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬 | 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬 |
| 🧪 GPT-4o inferencing | 🧪 Message storing |
| --- | --- |
| Playground to try the new GPT-4o model. GPT-4o ("o" for "omni") is designed to handle a combination of text, audio, and video inputs, and can generate outputs in text, audio, and image formats. | Playground to test storing message details into Cosmos DB through the Log to event hub policy. With the policy we can control which data will be stored in the DB (prompt, completion, model, region, tokens etc.). |
| 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬 | 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬 |
| 🧪 Developer tooling (WIP) | 🧪 Function calling |
| --- | --- |
| Playground to try the developer tooling available with Azure API Management to develop, debug, test and publish AI Service APIs. | Playground to try the OpenAI function calling feature with an Azure Functions API that is also managed by Azure API Management. |
| 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬 | 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬 |
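In function calling, the model does not execute anything itself: it returns a tool call naming a function and a JSON string of arguments, and the application dispatches it to real code (in the lab, an Azure Functions API behind API Management). The dispatch step can be sketched as follows; `get_weather` and its return value are purely hypothetical stand-ins.

```python
import json

# Hypothetical local implementation of a function advertised to the model.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(tool_call):
    """Run the local function named in an OpenAI-style tool call and return its result."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])  # arguments arrive as a JSON string
    return TOOLS[name](**args)
```

The result would then be sent back to the model in a follow-up message so it can produce the final answer.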
| 🧪 Model routing | 🧪 Prompt flow |
| --- | --- |
| Playground to try routing to a backend based on Azure OpenAI model and version. | Playground to try the Azure AI Studio Prompt Flow with Azure API Management. |
| 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬 | 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬 |
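Model routing boils down to extracting the deployment (model) name from the Azure OpenAI request path and mapping it to a backend. In the lab this happens inside an API Management policy; the Python sketch below, with made-up backend URLs, shows the same decision in isolation.

```python
def route_backend(path, routes, default):
    """Pick a backend from the deployment name in an Azure OpenAI request path,
    e.g. /openai/deployments/gpt-4o/chat/completions."""
    parts = path.split("/")
    model = parts[parts.index("deployments") + 1] if "deployments" in parts else None
    return routes.get(model, default)
```

Unknown deployments fall back to a default backend rather than failing.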
| 🧪 Content filtering | 🧪 Prompt shielding |
| --- | --- |
| Playground to try integrating Azure API Management with Azure AI Content Safety to filter potentially offensive, risky, or undesirable content. | Playground to try Prompt Shields from the Azure AI Content Safety service, which analyzes LLM inputs and detects User Prompt attacks and Document attacks, two common types of adversarial inputs. |
| 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬 | 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬 |
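A gateway consuming the Prompt Shields analysis has to turn the response into a block/allow decision. The sketch below assumes a simplified response shape with `userPromptAnalysis` and `documentsAnalysis` fields carrying `attackDetected` flags; treat the exact schema as an assumption and check the Content Safety API reference before relying on it.

```python
def is_blocked(shield_response):
    """Decide whether to reject a request given a Prompt Shields-style result.

    NOTE: the response shape here is a simplified assumption, not the
    authoritative Azure AI Content Safety schema.
    """
    # Block if the user prompt itself was flagged as an attack...
    if shield_response.get("userPromptAnalysis", {}).get("attackDetected"):
        return True
    # ...or if any attached document was flagged (indirect/document attack).
    return any(d.get("attackDetected")
               for d in shield_response.get("documentsAnalysis", []))
```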
> [!TIP]
> Please use the feedback discussion so that we can continuously improve with your experiences, suggestions, ideas or lab requests.
> [!NOTE]
> 🪲 Please feel free to open a new issue if you find something that should be fixed or enhanced.
The Azure Well-Architected Framework is a design framework that can improve the quality of a workload. The following table maps labs with the Well-Architected Framework pillars to set you up for success through architectural experimentation.
| Lab | Security | Reliability | Performance | Operations | Costs |
| --- | --- | --- | --- | --- | --- |
| Request forwarding | ✅ | | | | |
| Backend circuit breaking | ✅ | ✅ | | | |
| Backend pool load balancing | ✅ | ✅ | ✅ | | |
| Advanced load balancing | ✅ | ✅ | ✅ | | |
| Response streaming | ✅ | ✅ | | | |
| Vector searching | ✅ | ✅ | ✅ | | |
| Built-in logging | ✅ | ✅ | ✅ | ✅ | ✅ |
| SLM self-hosting | ✅ | ✅ | | | |
> [!TIP]
> Check the Azure Well-Architected Framework perspective on Azure OpenAI Service for additional guidance.
> [!TIP]
> Install the VS Code Reveal extension, open AI-GATEWAY.md and click on 'slides' at the bottom to present the AI Gateway without leaving VS Code. Or just open the AI-GATEWAY.pptx for a plain old PowerPoint experience.
Numerous reference architectures, best practices and starter kits are available on this topic. Please refer to the resources provided if you need comprehensive solutions or a landing zone to initiate your project. We suggest leveraging the AI-Gateway labs to discover additional capabilities that can be integrated into the reference architectures.
We believe that there may be valuable content that we are currently unaware of. We would greatly appreciate any suggestions or recommendations to enhance this list.
> [!IMPORTANT]
> This software is provided for demonstration purposes only. It is not intended to be relied upon for any purpose. The creators of this software make no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability or availability with respect to the software or the information, products, services, or related graphics contained in the software for any purpose. Any reliance you place on such information is therefore strictly at your own risk.