arkodg opened 1 month ago
Hoping end users as well as vendors using Envoy Gateway today can chime in and share whether they would be interested in using this feature if it existed natively.
Please also leave a comment if you're not yet an Envoy Gateway user but would adopt it if this feature were added 😄
Current workaround

1. Create an `EnvoyExtensionPolicy` to configure the ext proc service
2. Edit the xDS using `EnvoyPatchPolicy` or the Extension Server to add the `original_destination_cluster` xDS Cluster config

So far, we refrained from supporting specific backends (e.g. S3, EC2, ...). This API is not yet a widely adopted resource like `Service`, `ServiceImport`.

The alternative, as I understand it, is to have a backend resource define portions of the downstream filter chain. In general (not just for the LLM use case), that can create issues around unexpected side effects and conflicts between different backends. Maybe this can be mitigated by scoping the filters to specific routes, or even by using upstream filters, and by detecting/resolving conflicts during IR translation.

This can be improved (somewhat) by supporting backend reference extensibility, as proposed here: https://github.com/envoyproxy/gateway/issues/4373#issuecomment-2386691503.
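For reference, the `EnvoyPatchPolicy` half of the workaround could look roughly like this sketch. The cluster name, target Gateway name, and destination header name are assumptions for illustration, not a definitive implementation:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyPatchPolicy
metadata:
  name: llm-original-dst          # assumed name
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway       # assumed Gateway name
  type: JSONPatch
  jsonPatches:
  - type: "type.googleapis.com/envoy.config.cluster.v3.Cluster"
    name: original-destination-cluster
    operation:
      op: add
      path: ""                    # empty path adds a new cluster resource
      value:
        name: original-destination-cluster
        type: ORIGINAL_DST
        connect_timeout: 5s
        lb_policy: CLUSTER_PROVIDED    # required for ORIGINAL_DST clusters
        original_dst_lb_config:
          use_http_header: true
          # assumed header; Envoy's default is x-envoy-original-dst-host
          http_header_name: x-llm-destination-endpoint
```

With this in place, the ext proc service can steer each request to a specific pod IP by setting the destination header.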
EG can't directly support `LLMServerPool` as a Backend type because it lacks the logic to handle LLM-specific configurations, such as how to set up the filter chain and routes properly. This responsibility falls to a standalone component, the "LLM Gateway controller".
The current workaround, using a dummy backend approach, is a bit of a hack. It results in an HTTPRoute that can be confusing to anyone inspecting it, since the destination cluster is just a placeholder. This can be improved by adding support for custom Backend types, as @guydc suggested.
EG will need to invoke an "LLM Gateway extension" to translate the `llm-backend` to an `original_destination_cluster`. This extension will also insert an ExtProc filter into the HTTP filter chain to retrieve the IP of the LLM pod; this can be added via an `EnvoyExtensionPolicy` or through an xDS mutation extension point like the Extension Server.
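If the `EnvoyExtensionPolicy` route is taken, a minimal version could look like the following sketch. The ext proc Service name, port, and target route are assumptions:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyExtensionPolicy
metadata:
  name: llm-ext-proc              # assumed name
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: llm-route
  extProc:
  - backendRefs:
    # assumed Service running the LLM endpoint-picker ext proc server
    - name: llm-endpoint-picker
      port: 9002
    # processingMode omitted here; defaults apply unless the picker
    # needs request/response bodies
```

The ExtProc server then returns the chosen pod address (e.g. via a mutated header) that the `original_destination_cluster` uses for routing.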
EG delegates the translation of the `llm-gateway.k8s.io/LLMServerPool` Backend type to a third-party extension.
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - name: inference-gateway
    sectionName: llm-gw
  rules:
  - backendRefs:
    - group: llm-gateway.k8s.io
      kind: LLMServerPool
      name: llm-backend
```
This Backend resource is only used by the LLM Gateway controller; EG doesn't care about it.
```yaml
apiVersion: llm-gateway.k8s.io
kind: LLMServerPool
metadata:
  name: llm-backend
spec:
  .... omitted, EG doesn't care
```
This mechanism can also be used to support other vendor-specific or private Backend types as out-of-tree extensions, such as AWS S3, EC2, Lambda, etc.
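Purely as an illustration of that idea (this CRD and all of its fields are hypothetical, not an existing API), a vendor extension could own a resource like:

```yaml
# Hypothetical out-of-tree Backend type owned by a vendor extension
apiVersion: storage.example.com/v1alpha1
kind: S3Backend
metadata:
  name: static-assets
spec:
  bucket: my-assets-bucket   # illustrative fields only
  region: us-east-1
```

An `HTTPRoute` would reference it via `backendRefs` exactly as in the `LLMServerPool` example above, and the extension would translate it into the Envoy cluster and upstream filters needed to reach the bucket, with no S3-specific logic in EG itself.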
To clarify, we're checking with Envoy-based Gateway API implementations to understand which ones would be open to adding native support for the new LLMServerPool API that wg-serving is working on.
> This API is not yet a widely adopted resource like `Service`, `ServiceImport`.
Completely agree. This is a bit of a chicken and egg problem though. We want to see Gateway API implementations support this new k8s API as a backend, but that requires one implementation to be first. Ideally that's an OSS implementation that can then be used as a reference implementation for how this integration can work.
> EG delegates the translation of the `llm-gateway.k8s.io/LLMServerPool` Backend type to a third-party extension.
The point here is that this is a new Kubernetes API, not a third-party extension. Deciding whether or not to support it should be treated more like deciding whether this project should support `TLSRoute` or `ServiceImport` - OSS Kubernetes APIs that are still only in alpha.
I've suggested that instead of continuing to work on the rather fragile workaround in https://github.com/envoyproxy/gateway/issues/4423#issuecomment-2406261174, it would be better for the WG to work to support this resource natively in an OSS + CNCF Gateway API implementation. Envoy Gateway seems like a great option for this, but we'll also be open to any other projects that are interested.
@robscott Thanks for the clarification! I initially thought this was being proposed as an EG-specific API. If it's going to be a Kubernetes API like `TCPRoute`, then EG would be happy to support it. EG already supports all the experimental Gateway APIs, so supporting this one would be in line with that.
This issue has been automatically marked as stale because it has not had activity in the last 30 days.
Description:
The kubernetes-sigs/llm-instance-gateway project has introduced a new backendRef called `LLMServerPool`, representing a collection of model servers inside Kubernetes that an `HTTPRoute` can route to. The project is looking for Envoy-proxy-based implementations that support routing to this backendRef natively. More in https://github.com/kubernetes-sigs/llm-instance-gateway/issues/19

Creating this issue to decide whether Envoy Gateway should add support for this.