envoyproxy / gateway

Manages Envoy Proxy as a Standalone or Kubernetes-based Application Gateway
https://gateway.envoyproxy.io
Apache License 2.0

Support per route idle timeout in BackendTrafficPolicy #2611

Open arkodg opened 5 months ago

arkodg commented 5 months ago

My use case isn't specific to HTTP/2; rather, it's requests that stream data over long-lived HTTP/1 requests. After a lengthy internal timeout the service responds with 200, and the frontend repeats the request, reopening the stream. Sometimes these streams send data back intermittently; other times they send nothing until the internal timeout is reached and the request is repeated.

We had already set timeouts.request on the HTTPRoute to something crazy high like 900s, but the default value for stream_idle_timeout in the connection manager is 300s, which results in a 504 with a reason of stream_idle_timeout in the logs. Currently we patch the connection manager settings to raise this across the board so that high timeouts on the HTTPRoute work as expected, but I'd like to adjust it only as needed, on routes with timeouts >300s and/or those streaming data.
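To make the interaction between the two timers concrete, here is a rough sketch of which one fires first. This is a simplified model, not Envoy's actual implementation: the function name `first_event` and its parameters are made up for illustration, both timers are assumed to start at time 0, and the idle timer is assumed to reset whenever data flows on the stream.

```python
from typing import List, Tuple


def first_event(
    request_timeout_s: float,
    idle_timeout_s: float,
    activity_at_s: List[float],
    completes_at_s: float,
) -> Tuple[str, float]:
    """Return (outcome, time_s) for a single request in this toy model.

    activity_at_s: seconds since request start at which data flowed.
    completes_at_s: when the upstream would finish the response on its own.
    """
    request_deadline = request_timeout_s
    last_activity = 0.0
    # Completion counts as the final piece of activity on the stream.
    events = [t for t in sorted(activity_at_s) if t < completes_at_s]
    events.append(completes_at_s)
    for t in events:
        # The idle timer restarts from the most recent activity.
        idle_deadline = last_activity + idle_timeout_s
        deadline = min(idle_deadline, request_deadline)
        if t > deadline:
            if idle_deadline <= request_deadline:
                return "stream_idle_timeout", deadline
            return "request_timeout", deadline
        last_activity = t
    return "completed", completes_at_s


# Quiet stream, 900s request timeout, default 300s idle timeout:
# the idle timer fires long before the request timeout matters.
print(first_event(900, 300, [], 600))
```

With the idle timeout raised above the request timeout (for example 930s), the same quiet request in this model completes at 600s instead of being cut off at 300s.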

Initially I didn't think of this, but we could set idle_timeout on the route config by default to match the value given in timeouts.request on the HTTPRoute. Or perhaps to timeouts.request + 30 seconds so as not to interfere with the regular request timeout, leaving most users with a single knob to configure on the HTTPRoute.
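The "request timeout + 30 seconds" idea could look something like the sketch below. The helper `derive_route_timeouts` is hypothetical and not part of Envoy Gateway; the returned dict only mirrors the `timeout` and `idle_timeout` field names on an Envoy route action for illustration, and the 30s margin is the value floated above, not a settled default. The disabled (zero) timeout case is deliberately left out.

```python
def derive_route_timeouts(request_timeout_s: int, margin_s: int = 30) -> dict:
    """Derive a per-route idle timeout from the HTTPRoute request timeout.

    The idle timeout is set slightly above the request timeout so that the
    request timeout is always the one that fires first.
    """
    if request_timeout_s <= 0:
        # Zero/disabled timeouts need their own handling; not modelled here.
        raise ValueError("request_timeout_s must be positive")
    if margin_s < 0:
        raise ValueError("margin_s must not be negative")
    return {
        "timeout": f"{request_timeout_s}s",
        "idle_timeout": f"{request_timeout_s + margin_s}s",
    }


print(derive_route_timeouts(900))
```

The point of the margin is only to keep the derived idle timeout from racing the request timeout; an explicit idle timeout on the BackendTrafficPolicy would presumably override the derived value.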

Having the idle timeout knob available on the BackendTrafficPolicy would still be useful for HTTP/2 use cases and for controlling allowed idle periods on streaming HTTP/1 requests. Ex: a request timeout of 900s with an idle timeout of 600s, to time out the request if it stops streaming data for an extended period.

Originally posted by @davidalger in https://github.com/envoyproxy/gateway/issues/2609#issuecomment-1944460921

github-actions[bot] commented 4 months ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days.