Open Augustyniak opened 4 years ago
Here is another example of when being able to run filters for every individual upstream request that Envoy makes would be helpful.
At Lyft, we send an x-timestamp-ms HTTP header representing the current client NTP timestamp as part of every upstream request our application makes. With our legacy HTTP stack, we are able to update the value of this header for every upstream request we make. With the EnvoyMobile stack, we can only set it once for a given downstream request, and EnvoyMobile reuses this value for every upstream request associated with that downstream request.
With our default configuration, we wait up to 15 seconds for every upstream request to finish before it times out and we attempt it again. For requests that support multiple retries, we need to keep updating the value of x-timestamp-ms for every upstream request, or it becomes stale.
For some requests, the number of upstream requests we perform for a single downstream request can reach numbers as high as 20, for example when a user has a weak internet connection or there is an outage of one of our services. With these hypothetical 20 retries, the value of the x-timestamp-ms HTTP header could end up being off by 20 * 15 seconds (the upstream request timeout) = 300 seconds, because EnvoyMobile doesn't allow us to update the value of the x-timestamp-ms HTTP header on the upstream requests it performs.
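The staleness arithmetic above can be checked with a small simulation. This is only a sketch in plain Python and does not use any EnvoyMobile API: the function name `header_staleness_ms` and the simulated clock are assumptions made for illustration, and the model assumes that every attempt before the last one consumes the full 15-second timeout.

```python
TIMEOUT_MS = 15_000  # upstream request timeout from the text (15 seconds)


def header_staleness_ms(retries: int, refresh_per_attempt: bool) -> int:
    """Return how stale x-timestamp-ms is when the last retry is sent.

    Hypothetical model: every attempt before the last one times out
    after TIMEOUT_MS, and the clock is simulated rather than read.
    """
    now_ms = 0          # simulated client clock
    header_ms = now_ms  # value set once for the downstream request
    staleness = 0
    for attempt in range(retries + 1):  # attempt 0 plus `retries` retries
        if refresh_per_attempt:
            header_ms = now_ms  # what a filter run on every attempt could do
        staleness = now_ms - header_ms
        if attempt < retries:
            now_ms += TIMEOUT_MS  # this attempt times out; the next starts later
    return staleness


print(header_staleness_ms(20, refresh_per_attempt=False))  # 300000
print(header_staleness_ms(20, refresh_per_attempt=True))   # 0
```

With the header set once, the last of 20 retries carries a timestamp that is 300,000 ms (300 seconds) old. Refreshing it on every attempt keeps the staleness at zero in this model.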
Proposal
Allow EnvoyMobile request filters to be run for each retry request it performs.
Introduction
Currently, EnvoyMobile's request filter chain is run only once for any network request it performs. This is true even for requests with multiple retries.
Let's say that we have a retry policy that allows for up to 3 retries of a request, and we number the attempts 0, 1, 2, 3. Before a request is performed, EnvoyMobile allows us to modify it using registered filters. We can modify the request once, before attempt 0 is made, and EnvoyMobile then performs attempts 1, 2 and 3 as needed without giving us a chance to modify the requests that are made as part of these retries.
Issue
At Lyft, we are working on extending our mobile fault injection capabilities. To that end, we are actively working on injecting 'fault injection HTTP headers' into random network requests our mobile applications perform, in order to understand the behavior of our apps in degraded server and/or connectivity conditions. Fault injection HTTP headers are special HTTP headers supported by Envoy, which is used by Lyft's server infrastructure. They are documented here and they include the following headers: x-envoy-fault-abort-request, x-envoy-fault-delay-request and x-envoy-fault-response-limit.
What is explained below is true for all of these headers, but let's look at the x-envoy-fault-abort-request header specifically because it illustrates the issue we are dealing with best. Let's say that we have a request v1/foo and we want to check how our application behaves when 50% of requests of this type fail with a 400 HTTP status code.
We can use the EnvoyMobile filter chain to add the x-envoy-fault-abort-request HTTP header and set its value to 400. The problem is that we cannot specify that this HTTP header should be added to only 50% of outgoing requests: we can either not add it to a request at all, or add it and accept the fact that it is going to be added to the original request and all of its retries.
Going back to our example, we want to simulate a 50% failure rate with a 400 status code for v1/foo network requests, and our default retry policy allows for up to 3 retries of any request. With the current capabilities of EnvoyMobile, we can add the x-envoy-fault-abort-request: 400 HTTP header to an outgoing network request (with a 50% chance of it being added), but in the end we end up with 4 attempts of this request failing with a 400 status code, since each of the retries of the request contains the x-envoy-fault-abort-request: 400 HTTP header.
This makes it impossible for us to test scenarios in which only a portion of the attempts to perform a given request fail with a given status code.
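The difference between deciding once per request and deciding once per attempt can be illustrated with a small simulation. This is a sketch under stated assumptions, not EnvoyMobile code: `perform` and `add_fault` are hypothetical names, and the model assumes that any attempt carrying the fault header fails with 400 and is retried, while an attempt without it succeeds with 200.

```python
import random
from typing import Callable, List

FAULT_HEADER = "x-envoy-fault-abort-request"
MAX_RETRIES = 3  # retry policy from the text: up to 3 retries


def perform(add_fault: Callable[[], bool], per_attempt: bool) -> List[int]:
    """Return the status code of each attempt of one simulated request.

    add_fault is a hypothetical stand-in for a request filter deciding
    whether to attach the fault header. With per_attempt=False the
    decision is made once, before attempt 0 (current behavior as
    described above); with per_attempt=True it is made again for every
    retry (the proposed behavior).
    """
    inject = add_fault()  # decision made before attempt 0
    statuses = []
    for attempt in range(MAX_RETRIES + 1):
        if per_attempt and attempt > 0:
            inject = add_fault()  # filter chain runs again for this retry
        headers = {FAULT_HEADER: "400"} if inject else {}
        status = 400 if FAULT_HEADER in headers else 200  # assumed server model
        statuses.append(status)
        if status == 200:
            break  # success, no further retries needed
    return statuses


# Header attached once: every attempt fails.
print(perform(lambda: True, per_attempt=False))  # [400, 400, 400, 400]

# Decision repeated per attempt: a retry can succeed.
decisions = iter([True, False])
print(perform(lambda: next(decisions), per_attempt=True))  # [400, 200]

# A 50% coin flip per attempt; the result varies from run to run.
print(perform(lambda: random.random() < 0.5, per_attempt=True))
```

In this model, a per-request decision can only produce either an immediate success or four consecutive failures. A per-attempt decision also allows mixed outcomes such as one failure followed by a success, which is the scenario described above as impossible to test today.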