arkodg opened 1 month ago
Hoping end users as well as vendors using Envoy Gateway today can chime in and share whether they would be interested in using this feature if it existed natively.
Please also leave a comment if you're not yet an Envoy Gateway user but would adopt it if this feature were added 😄
Current workaround

1. Create an `EnvoyExtensionPolicy` to configure the ext proc service
2. Edit the xDS using `EnvoyPatchPolicy` or the Extension Server to add the `original_destination_cluster` xDS Cluster config

So far, we refrained from supporting specific backends (e.g. S3, EC2, ...). This API is not yet a widely adopted resource like `Service`, `ServiceImport`.

The alternative, as I understand it, is to have a backend resource define portions of the downstream filter chain. In general (not just for the LLM use case), that can create issues around unexpected side effects and conflicts between different backends. Maybe this can be mitigated by scoping the filters to specific routes, or even by using upstream filters, and by detecting/resolving conflicts during IR translation.

This can be improved (somewhat) by supporting backend reference extensibility, as proposed here: https://github.com/envoyproxy/gateway/issues/4373#issuecomment-2386691503.
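For reference, the `EnvoyPatchPolicy` half of the workaround could look roughly like this sketch. The cluster name, target Gateway name, and destination header name are assumptions for illustration, not a definitive implementation:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyPatchPolicy
metadata:
  name: llm-original-dst          # assumed name
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway       # assumed Gateway name
  type: JSONPatch
  jsonPatches:
  - type: "type.googleapis.com/envoy.config.cluster.v3.Cluster"
    name: original-destination-cluster
    operation:
      op: add
      path: ""                    # empty path adds a new cluster resource
      value:
        name: original-destination-cluster
        type: ORIGINAL_DST
        connect_timeout: 5s
        lb_policy: CLUSTER_PROVIDED    # required for ORIGINAL_DST clusters
        original_dst_lb_config:
          use_http_header: true
          # assumed header; Envoy's default is x-envoy-original-dst-host
          http_header_name: x-llm-destination-endpoint
```

With this in place, the ext proc service can steer each request to a specific pod IP by setting the destination header.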
EG can't directly support `LLMServerPool` as a Backend type because it lacks the logic to handle LLM-specific configurations, such as how to set up the filter chain and routes properly. This responsibility falls to a standalone component, the "LLM Gateway controller".
The current workaround, using a dummy backend approach, is a bit of a hack. It results in an HTTPRoute that can be confusing to anyone inspecting it, since the destination cluster is just a placeholder. This can be improved by adding support for custom Backend types, as @guydc suggested.
EG will need to invoke an "LLM Gateway extension" to translate the `llm-backend` to an `original_destination_cluster`. This extension will also insert an ExtProc filter into the HTTP filter chain to retrieve the IP of the LLM pod; this can be added via an `EnvoyExtensionPolicy` or through an xDS mutation extension point like the Extension Server.
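If the `EnvoyExtensionPolicy` route is taken, a minimal version could look like the following sketch. The ext proc Service name, port, and target route are assumptions:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyExtensionPolicy
metadata:
  name: llm-ext-proc              # assumed name
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: llm-route
  extProc:
  - backendRefs:
    # assumed Service running the LLM endpoint-picker ext proc server
    - name: llm-endpoint-picker
      port: 9002
    # processingMode omitted here; defaults apply unless the picker
    # needs request/response bodies
```

The ExtProc server then returns the chosen pod address (e.g. via a mutated header) that the `original_destination_cluster` uses for routing.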
EG delegates the translation of the `llm-gateway.k8s.io/LLMServerPool` Backend type to a third-party extension.
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - name: inference-gateway
    sectionName: llm-gw
  rules:
  - backendRefs:
    - group: llm-gateway.k8s.io
      kind: LLMServerPool
      name: llm-backend
```
This Backend resource is only used by the LLM Gateway controller; EG doesn't care about it.
```yaml
apiVersion: llm-gateway.k8s.io
kind: LLMServerPool
metadata:
  name: llm-backend
spec:
  .... omitted, EG doesn't care
```
This mechanism can also be used to support other vendor-specific or private Backend types as out-of-tree extensions, such as AWS S3, EC2, Lambda, etc.
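Purely as an illustration of that idea (this CRD and all of its fields are hypothetical, not an existing API), a vendor extension could own a resource like:

```yaml
# Hypothetical out-of-tree Backend type owned by a vendor extension
apiVersion: storage.example.com/v1alpha1
kind: S3Backend
metadata:
  name: static-assets
spec:
  bucket: my-assets-bucket   # illustrative fields only
  region: us-east-1
```

An `HTTPRoute` would reference it via `backendRefs` exactly as in the `LLMServerPool` example above, and the extension would translate it into the Envoy cluster and upstream filters needed to reach the bucket, with no S3-specific logic in EG itself.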
To clarify, we're checking with Envoy-based Gateway API implementations to understand which ones would be open to adding native support for the new LLMServerPool API that wg-serving is working on.
> This API is not yet a widely adopted resource like `Service`, `ServiceImport`.
Completely agree. This is a bit of a chicken and egg problem though. We want to see Gateway API implementations support this new k8s API as a backend, but that requires one implementation to be first. Ideally that's an OSS implementation that can then be used as a reference implementation for how this integration can work.
> EG delegates the translation of the `llm-gateway.k8s.io/LLMServerPool` Backend type to a third-party extension.
The point here is that this is a new Kubernetes API, not a third-party extension. Deciding whether or not to support it should be treated more like deciding whether this project should support `TLSRoute` or `ServiceImport` - OSS Kubernetes APIs that are still only in alpha.
I've suggested that instead of continuing to work on the rather fragile workaround in https://github.com/envoyproxy/gateway/issues/4423#issuecomment-2406261174, it would be better for the WG to work to support this resource natively in an OSS + CNCF Gateway API implementation. Envoy Gateway seems like a great option for this, but we'll also be open to any other projects that are interested.
@robscott Thanks for the clarification! I initially thought this was being proposed as an EG-specific API. If it's going to be a Kubernetes API like `TCPRoute`, then EG would be happy to support it. EG already supports all the experimental Gateway APIs, so supporting this one would be in line with that.
This issue has been automatically marked as stale because it has not had activity in the last 30 days.
Description:
The kubernetes-sigs/llm-instance-gateway project has introduced a new backendRef called `LLMServerPool`, representing a collection of model servers inside Kubernetes that an `HTTPRoute` can route to. The project is looking for Envoy-proxy-based implementations that support routing to this backendRef natively. More in https://github.com/kubernetes-sigs/llm-instance-gateway/issues/19

Creating this issue to decide whether Envoy Gateway should add support for this.