chancez opened 1 year ago
I agree timeouts are super important, and I'm in the process of working on a GEP for them (GEP-1742; kubernetes-sigs/gateway-api#1742 is the issue, and kubernetes-sigs/gateway-api#1744 is the PR to add the actual design). Once I've got that a bit further along (I'm hoping to define a request timeout, analogous to the nginx proxy-send-timeout value, and an idle timeout, probably for the whole connection), I think we can look at adding timeouts to Ingress that use the same concepts.
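For a concrete picture of the direction, here's a sketch of route-level timeouts along these lines on an HTTPRoute. Treat it as illustrative: the design was still in flight in GEP-1742 at the time, though the shape that eventually landed in Gateway API uses a `timeouts` block with `request` and `backendRequest` fields:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example-route   # hypothetical route and backend names
spec:
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      # Illustrative sketch of the GEP-1742 direction:
      # `request` bounds the whole request end-to-end,
      # `backendRequest` bounds each individual attempt to the backend.
      timeouts:
        request: 30s
        backendRequest: 10s
      backendRefs:
        - name: example-svc
          port: 8080
```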
For reference (and because I'm proud of my diagram), here are all the configurable Envoy timeouts we have available on a standard HTTP request flow, where CM is the Connection Manager-level settings, R is the Route-level settings, and Cluster is the cluster-level settings:
```mermaid
sequenceDiagram
    participant C as Client
    participant P as Envoy
    participant U as Upstream
    C->>P: Connection Started
    activate P
    Note left of P: transport_socket_connect_timeout for TLS
    deactivate P
    C->>P: Starts sending Request
    activate C
    activate P
    activate P
    C->>P: Finishes Headers
    note left of P: CM request_headers_timeout
    C->>P: Finishes request
    deactivate P
    activate U
    note left of U: Cluster connect_timeout
    deactivate U
    P->>U: Connection Started
    activate U
    note right of U: CM idle_timeout<br />CM max_connection_duration
    P->>U: Starts sending Request
    P->>U: Finishes Headers
    note left of P: CM request_timeout
    P->>U: Finishes request
    deactivate P
    activate U
    U->>P: Starts Response
    U->>P: Finishes Headers
    note right of U: R timeout<br/>R per_try_timeout<br/>R per_try_idle_timeout
    U->>P: Finishes Response
    deactivate U
    P->>C: Starts Response
    P->>C: Finishes Headers
    P->>C: Finishes Response
    Note left of C: CM stream_idle_timeout<br />R idle_timeout<br />CM,R max_stream_duration<br/>TCP proxy idle_timeout<br />TCP protocol idle_timeout
    deactivate C
    Note right of P: Repeat if connection sharing
    U->>C: Connection ended
    deactivate U
```
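To map the diagram's labels onto actual Envoy configuration, here's a rough, non-exhaustive sketch of where each class of setting lives (three fragments in one block, separated by comments; the field names are real Envoy v3 API fields, the values are placeholders):

```yaml
# CM: HttpConnectionManager (listener filter) settings
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
request_headers_timeout: 10s        # CM request_headers_timeout
request_timeout: 60s                # CM request_timeout
stream_idle_timeout: 300s           # CM stream_idle_timeout
common_http_protocol_options:
  idle_timeout: 3600s               # CM idle_timeout
  max_connection_duration: 900s     # CM max_connection_duration

# R: RouteAction settings inside a route table
route:
  timeout: 30s                      # R timeout
  idle_timeout: 300s                # R idle_timeout
  retry_policy:
    per_try_timeout: 10s            # R per_try_timeout
    per_try_idle_timeout: 5s        # R per_try_idle_timeout

# Cluster settings
connect_timeout: 5s                 # Cluster connect_timeout
```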
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
The upstream effort is still ongoing here, but this is definitely still relevant.
Hey - is there an east-west max_connection_duration config per service/route? I saw it can be configured globally, but couldn't find any example of using the Envoy CRDs to set it up.
Not currently. Once we support GAMMA (see #22512 for that work), that will include the request timeout, which functions as a max connection duration.
Actually, it's achievable with envoyConfig as well.

Example (based on the circuit breaker example, with a timeout added on the route). It can probably also be configured in a different way to achieve exactly the same functionality as max_connection_duration; see the sketch after the example.
```yaml
apiVersion: cilium.io/v2
kind: CiliumClusterwideEnvoyConfig
metadata:
  name: envoy-circuit-breaker
spec:
  services:
    - name: cilium-target
      namespace: default
  resources:
    - "@type": type.googleapis.com/envoy.config.listener.v3.Listener
      name: envoy-lb-listener
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: envoy-lb-listener
                rds:
                  route_config_name: lb_route
                use_remote_address: true
                skip_xff_append: true
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
    - "@type": type.googleapis.com/envoy.config.route.v3.RouteConfiguration
      name: lb_route
      virtual_hosts:
        - name: "lb_route"
          domains: [ "*" ]
          routes:
            - match:
                prefix: "/"
              route:
                timeout: 30s
                weighted_clusters:
                  clusters:
                    - name: "default/cilium-target"
                      weight: 100
    - "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
      name: "default/cilium-target"
      connect_timeout: 5s
      lb_policy: ROUND_ROBIN
      type: EDS
      circuit_breakers:
        thresholds:
          - priority: "DEFAULT"
            max_requests: 2
            max_pending_requests: 1
      outlier_detection:
        split_external_local_origin_errors: true
        consecutive_local_origin_failure: 2
```
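If the goal is specifically a max_connection_duration rather than a per-request timeout, Envoy also exposes that knob directly on the cluster. Here's a minimal sketch of the cluster resource above with it set (I haven't verified this end-to-end through Cilium's CRDs, so treat it as an assumption to test):

```yaml
# Variant of the cluster above: cap the lifetime of upstream connections
# directly, instead of bounding individual requests with a route timeout.
- "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
  name: "default/cilium-target"
  connect_timeout: 5s
  lb_policy: ROUND_ROBIN
  type: EDS
  typed_extension_protocol_options:
    envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
      "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
      common_http_protocol_options:
        max_connection_duration: 600s   # close upstream connections after 10 minutes
      explicit_http_config:
        http_protocol_options: {}
```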
Cilium Feature Proposal
Is your feature request related to a problem?

Currently I'm using nginx-ingress, and when I want to expose hubble-relay outside the cluster, I need to configure the Nginx `grpc_read_timeout` option in order to keep `hubble observe --follow` from disconnecting when there are no flows being transmitted. The same is true if I were writing data to a gRPC connection (I don't have an example for this to point at).

Another similar example is websockets. Some internal web applications I'm using maintain long-lived websocket connections where no data is transmitted for a long period of time. In these cases, if no data is transmitted for long enough, the connection gets closed. With nginx-ingress I can adjust the read/write timeouts to handle this better by using the annotations `nginx.ingress.kubernetes.io/proxy-read-timeout` and `nginx.ingress.kubernetes.io/proxy-send-timeout`, which map to the `proxy_read_timeout` and `proxy_send_timeout` options in upstream Nginx.

Describe the feature you'd like
Currently Cilium does not support exposing gRPC services via Ingress, and needs #21928 for gRPC support in Gateway API, so this is potentially preemptive.

I'd like the ability to tune the "read/write" timeouts of an HTTP or gRPC request. I know that Envoy uses different settings and breaks things down a bit more, but being able to tune these options is important for connections that aren't always transmitting data.
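As a concrete illustration of the request, here's a minimal sketch (in the style of the CiliumClusterwideEnvoyConfig route above; the cluster name is hypothetical) of relaxing Envoy's per-stream idle timeout, which is the setting that closes quiet gRPC/websocket streams and is roughly analogous to nginx's `proxy_read_timeout`/`grpc_read_timeout`:

```yaml
# Illustrative route fragment: keep long-lived gRPC/websocket streams open.
# In Envoy, the per-stream idle timeout (CM stream_idle_timeout, overridable
# per route as idle_timeout) is what would disconnect a quiet
# `hubble observe --follow` stream.
routes:
  - match:
      prefix: "/"
    route:
      cluster: "kube-system/hubble-relay"   # hypothetical cluster name
      idle_timeout: 0s   # 0 disables the per-stream idle timeout for this route
```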