envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Successive per_downstream_connection http local_ratelimit filters seem to reduce effective tokens, rate limit too early #22163

Open bmcalary-atlassian opened 2 years ago

bmcalary-atlassian commented 2 years ago

Title: Successive per_downstream_connection http local_ratelimit filters seem to reduce effective tokens, rate limit too early

Description:

On Envoy 1.22.x and 1.21.x, when adding multiple local_ratelimit filters to either the listener or vhost config, or both, with per_downstream_connection=true set, the effective rate is lower than any of the defined token buckets alone.

The expected behavior is that each token bucket be distinct and honored. For example, if a client sends 1000 rps and each request encounters two per_downstream_connection=true local_ratelimit filters, one with 500 tokens per second (max 500) and one with 600 tokens per second (max 600), the expected number of ok requests should be around 500, with 500 receiving 429. However, in practice, the number of successful requests is closer to 200-300, which is unexpected.
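
For reference, here is a toy model of the expected semantics (illustrative Python only, not Envoy code; it assumes each filter keeps its own independent bucket that starts full, a 0.1s fill interval as in the config below, and that a request must take one token from every bucket it reaches):

# Toy model of the expected behaviour: two independent per-connection token buckets.
class Bucket:
    def __init__(self, max_tokens, tokens_per_fill, fill_interval):
        self.max_tokens = max_tokens
        self.tokens = max_tokens              # bucket starts full
        self.tokens_per_fill = tokens_per_fill
        self.fill_interval = fill_interval
        self.last_fill = 0.0

    def take(self, now):
        # refill in whole fill_interval steps
        fills = int((now - self.last_fill) / self.fill_interval)
        if fills:
            self.tokens = min(self.max_tokens, self.tokens + fills * self.tokens_per_fill)
            self.last_fill += fills * self.fill_interval
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# 500/s and 600/s buckets from the example above; 1000 rps for 10 s on one connection.
buckets = [Bucket(500, 50, 0.1), Bucket(600, 60, 0.1)]
ok = 0
for i in range(10_000):
    if all(b.take(i / 1000.0) for b in buckets):   # short-circuits like the filter chain
        ok += 1
print(ok)   # ~5450: the 500-token burst plus ~500/s refill; never below the strictest bucket

With independent buckets the pass rate can never fall below what the strictest bucket alone would allow, which is why the observed 200-300 ok/s is surprising.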

Repro steps: Envoy 1.22.x and 1.21.x

  1. Configure a listener, route/vhost, or a combination thereof, such that a request passes through multiple local_ratelimit filters with per_downstream_connection=true set.
  2. Send increasing rates of requests.
  3. Notice that rate limiting occurs at a much earlier and lower level than expected.

Example load test tool:

echo 'GET https://envoy/200' | vegeta attack -insecure -header 'Host: envoy200ok.com' -connections 1 -http2 -keepalive -max-body -1 -rate 1000/1s -duration 10s -max-workers 1 -workers 1

Config:

The following config should permit around 660-662 requests per second over a 10-second test (a quick check of that figure follows the config below).

filters:

- 
    name: envoy.filters.http.local_ratelimit.per_conn_a
    typed_config:
        '@type': type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
        stat_prefix: per_conn_a
        token_bucket:
            max_tokens: 600
            tokens_per_fill: 60
            fill_interval: 0.100s
        filter_enabled:
            default_value:
                numerator: 100
            runtime_key: local_rate_limit_enabled_per_conn_a
        filter_enforced:
            default_value:
                numerator: 100
            runtime_key: local_rate_limit_enforced_per_conn_a
        local_rate_limit_per_downstream_connection: true
- 
    name: envoy.filters.http.local_ratelimit.per_conn_b
    typed_config:
        '@type': type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
        stat_prefix: per_conn_b
        token_bucket:
            max_tokens: 601
            tokens_per_fill: 61
            fill_interval: 0.100s
        filter_enabled:
            default_value:
                numerator: 100
            runtime_key: local_rate_limit_enabled_per_conn_b
        filter_enforced:
            default_value:
                numerator: 100
            runtime_key: local_rate_limit_enforced_per_conn_b
        local_rate_limit_per_downstream_connection: true
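
A quick back-of-envelope check of the "~660/s" figure above (a sketch only; it assumes the stricter per_conn_a bucket starts full and is the binding limit over a 10-second test):

# hypothetical sanity check of the expected throughput for the config above
max_tokens = 600        # burst allowance of per_conn_a
tokens_per_fill = 60
fill_interval = 0.1     # seconds -> 600 tokens/s steady-state refill
duration = 10           # seconds of load test

allowed = max_tokens + tokens_per_fill * (duration / fill_interval)
print(allowed, allowed / duration)   # 6600.0 tokens total, i.e. ~660 ok requests per second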

Demonstration: With the two local rate limits given above, and sending 1000 requests per second for 10 seconds on a single connection, only 3272 requests are able to complete successfully, not the expected ~6600. If either of the two rate limits is commented out, rate limiting behaves as expected. If local_rate_limit_per_downstream_connection is removed, rate limiting behaves as expected. Vegeta output:

"status_codes":{"200":3270,"429":6716},"errors":["429 Too Many Requests"]

Envoy stats

per_conn_a.http_local_rate_limit.enabled: 9988
per_conn_a.http_local_rate_limit.enforced: 6716
per_conn_a.http_local_rate_limit.ok: 3272
per_conn_a.http_local_rate_limit.rate_limited: 6716
per_conn_b.http_local_rate_limit.enabled: 3272
per_conn_b.http_local_rate_limit.enforced: 0
per_conn_b.http_local_rate_limit.ok: 3272
per_conn_b.http_local_rate_limit.rate_limited: 0

If envoy.filters.http.local_ratelimit.per_conn_b is commented out and the load test re-run, the rate limiting behaves as expected. Vegeta output:

"status_codes":{"200":6538,"429":3462},"errors":["429 Too Many Requests"]

envoy stats

main_thread.watchdog_miss: 0
per_conn_a.http_local_rate_limit.enabled: 10002
per_conn_a.http_local_rate_limit.enforced: 3462
per_conn_a.http_local_rate_limit.ok: 6540
per_conn_a.http_local_rate_limit.rate_limited: 3462
runtime.admin_overrides_active: 0

If you add another copy of the per_downstream_connection local rate limit (C with rate 603 per second) the effective rate drops further:

"status_codes":{"200":2180,"429":7820},"errors":["429 Too Many Requests"]


What I think is going on: it looks like each request ends up taking one token from the first bucket it encounters for every http_local_rate_limit filter configured; when the request reaches the next filter, that filter again deducts from the first bucket rather than its own. So if we have 3 rate limits with local_rate_limit_per_downstream_connection set, 3 tokens per request are being taken from the first bucket encountered, and so on.
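
A toy model of that hypothesis (illustrative only, not Envoy code; the helper below just divides the tokens available over a 10-second test by the hypothesised per-request cost):

# Hypothesis: with N per_downstream_connection filters, each request effectively costs
# N tokens from the FIRST bucket it reaches, instead of 1 token from each bucket.
def allowed_requests(num_filters, max_tokens=600, refill_per_sec=600, duration=10):
    total_tokens = max_tokens + refill_per_sec * duration   # tokens available over the test
    return total_tokens // num_filters                       # each request costs num_filters tokens

print(allowed_requests(1))   # 6600 -> close to the single-filter test (~6538 ok)
print(allowed_requests(2))   # 3300 -> close to the two-filter test (~3272 ok)
print(allowed_requests(3))   # 2200 -> close to the three-filter test (~2180 ok)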

bmcalary-atlassian commented 2 years ago

I have also observed that the stats will attribute a rate limit decision to multiple distinct rate limit filters/stat prefixes.

In this image, 1173 429s would have been returned to clients, but the statistic is incremented for every single token bucket/filter.

[screenshot: image-20220701-033737]

kyessenov commented 2 years ago

There is a blurb about matching on overfill here: https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/local_rate_limit_filter#using-rate-limit-descriptors-for-local-rate-limiting. Is that consistent with what you're seeing?

bmcalary-atlassian commented 2 years ago

Sadly no, I'm aware of that behaviour. I don't think it explains this because, if things were working correctly, the effective rate would still be at least the minimum of the rate limits I mentioned in my bug description. And I'm not worried about corner cases where a request passes through multiple limits, perhaps in an unintuitive order, and consumes tokens from each of them.

What we're seeing is very specific: the effective rate becomes much lower than any single token bucket allows.

Say you have a token bucket A of 600 rps and a bucket B of 800 rps. What we're seeing is that a client sending 1000 rps gets rate limited to 200-300 rps.

bmcalary-atlassian commented 2 years ago

If the following config is used (no local_rate_limit_per_downstream_connection: true):

- 
    name: envoy.filters.http.local_ratelimit.a
    typed_config:
        '@type': type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
        stat_prefix: a
        token_bucket:
            max_tokens: 600
            tokens_per_fill: 60
            fill_interval: 0.100s
        filter_enabled:
            default_value:
                numerator: 100
            runtime_key: local_rate_limit_enabled_a
        filter_enforced:
            default_value:
                numerator: 100
            runtime_key: local_rate_limit_enforced_a
- 
    name: envoy.filters.http.local_ratelimit.b
    typed_config:
        '@type': type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
        stat_prefix: b
        token_bucket:
            max_tokens: 601
            tokens_per_fill: 61
            fill_interval: 0.100s
        filter_enabled:
            default_value:
                numerator: 100
            runtime_key: local_rate_limit_enabled_b
        filter_enforced:
            default_value:
                numerator: 100
            runtime_key: local_rate_limit_enforced_b
- 
    name: envoy.filters.http.local_ratelimit.c
    typed_config:
        '@type': type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
        stat_prefix: c
        token_bucket:
            max_tokens: 602
            tokens_per_fill: 62
            fill_interval: 0.100s
        filter_enabled:
            default_value:
                numerator: 100
            runtime_key: local_rate_limit_enabled_c
        filter_enforced:
            default_value:
                numerator: 100
            runtime_key: local_rate_limit_enforced_c

The effective rate applied to a single client connection sending ~1000/s for 10s is:

"status_codes":{"200":6465,"429":3037},"errors":["429 Too Many Requests"]
a.http_local_rate_limit.enabled: 9504
a.http_local_rate_limit.enforced: 3037
a.http_local_rate_limit.ok: 6467
a.http_local_rate_limit.rate_limited: 3037
b.http_local_rate_limit.enabled: 6467
b.http_local_rate_limit.enforced: 0
b.http_local_rate_limit.ok: 6467
b.http_local_rate_limit.rate_limited: 0
c.http_local_rate_limit.enabled: 6467
c.http_local_rate_limit.enforced: 0
c.http_local_rate_limit.ok: 6467
c.http_local_rate_limit.rate_limited: 0

Which is expected.

If local_rate_limit_per_downstream_connection: true is added, the expectation is that (in this 1 client, 1 connection situation) the allowed requests should be roughly similar to the above test, around ~6400-6600. However, roughly 3x as many tokens are taken from bucket a per request as expected.

"status_codes":{"200":2160,"429":7791},"errors":["429 Too Many Requests"]
a.http_local_rate_limit.enabled: 9953
a.http_local_rate_limit.enforced: 7791
a.http_local_rate_limit.ok: 2162
a.http_local_rate_limit.rate_limited: 7791
b.http_local_rate_limit.enabled: 2162
b.http_local_rate_limit.enforced: 0
b.http_local_rate_limit.ok: 2162
b.http_local_rate_limit.rate_limited: 0
c.http_local_rate_limit.enabled: 2162
c.http_local_rate_limit.enforced: 0
c.http_local_rate_limit.ok: 2162
c.http_local_rate_limit.rate_limited: 0

Config used for this second test, where each filter has local_rate_limit_per_downstream_connection: true:

- 
    name: envoy.filters.http.local_ratelimit.per_conn_a
    typed_config:
        '@type': type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
        stat_prefix: per_conn_a
        token_bucket:
            max_tokens: 600
            tokens_per_fill: 60
            fill_interval: 0.100s
        filter_enabled:
            default_value:
                numerator: 100
            runtime_key: local_rate_limit_enabled_per_conn_a
        filter_enforced:
            default_value:
                numerator: 100
            runtime_key: local_rate_limit_enforced_per_conn_a
        local_rate_limit_per_downstream_connection: true
- 
    name: envoy.filters.http.local_ratelimit.per_conn_b
    typed_config:
        '@type': type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
        stat_prefix: per_conn_b
        token_bucket:
            max_tokens: 601
            tokens_per_fill: 61
            fill_interval: 0.100s
        filter_enabled:
            default_value:
                numerator: 100
            runtime_key: local_rate_limit_enabled_per_conn_b
        filter_enforced:
            default_value:
                numerator: 100
            runtime_key: local_rate_limit_enforced_per_conn_b
        local_rate_limit_per_downstream_connection: true
- 
    name: envoy.filters.http.local_ratelimit.per_conn_c
    typed_config:
        '@type': type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
        stat_prefix: per_conn_c
        token_bucket:
            max_tokens: 602
            tokens_per_fill: 62
            fill_interval: 0.100s
        filter_enabled:
            default_value:
                numerator: 100
            runtime_key: local_rate_limit_enabled_per_conn_c
        filter_enforced:
            default_value:
                numerator: 100
            runtime_key: local_rate_limit_enforced_per_conn_c
        local_rate_limit_per_downstream_connection: true

Every additional local_rate_limit_per_downstream_connection: true bucket the request encounters during its processing appears to take its token from the very first bucket the request encountered, rather than from its own bucket.

Seems like a bug?

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

bmcalary-atlassian commented 2 years ago

Not stale, the bug still exists.

kyessenov commented 2 years ago

Could this be a dupe of the issue in https://github.com/envoyproxy/envoy/pull/21327?

bmcalary-atlassian commented 2 years ago

@kyessenov I don't think it is a dupe of https://github.com/envoyproxy/envoy/pull/21327 as that relates to an issue in refill logic, whereas this bug seems to be a simple case of multiple successive local-rate-limits all taking their tokens from the first bucket encountered, instead of their own buckets.

nkindberg commented 2 years ago

I agree that it appears Envoy creates a single local rate limit token bucket that is shared across all requests on a connection when local_rate_limit_per_downstream_connection is enabled and applies that to all local rate limit decisions - instead of having a token bucket per config. This doesn't seem related to #21327.

I am running into the same issue with per-route local rate limit configurations, e.g. I set up 2 routes like this via RDS (config below; a toy model of the suspected behaviour follows it). I find that if the first request on the connection matches route A, the route A token bucket will be used for requests to both A and B for the lifetime of the connection (limited to 5 RPS total across both), and conversely, if the first request matches B, the route B token bucket is used for requests to both A and B on that connection (limited to 1000 RPS total across both).

My expectation would be that each local rate limit config has a unique token bucket for the connection.

{
  "match": {
   "prefix": "/A/",
   "case_sensitive": false
  },
  "route": {
   "cluster": "A",
   "max_stream_duration": {
    "grpc_timeout_header_max": "0s"
   }
  },
  "typed_per_filter_config": {
   "envoy.filters.http.local_ratelimit": {
    "@type": "type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit",
    "stat_prefix": "A",
    "token_bucket": {
     "max_tokens": 5,
     "tokens_per_fill": 5,
     "fill_interval": "1s"
    },
    "filter_enabled": {
     "default_value": {
      "numerator": 100
     },
     "runtime_key": "A.local_rate_limit_enabled"
    },
    "filter_enforced": {
     "default_value": {
      "numerator": 100
     },
     "runtime_key": "A.local_rate_limit_enforced"
    },
    "local_rate_limit_per_downstream_connection": true
   }
  }
 },
 {
  "match": {
   "prefix": "/B/",
   "case_sensitive": false
  },
  "route": {
   "cluster": "B",
   "max_stream_duration": {
    "grpc_timeout_header_max": "0s"
   }
  },
  "typed_per_filter_config": {
   "envoy.filters.http.local_ratelimit": {
    "@type": "type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit",
    "stat_prefix": "B",
    "token_bucket": {
     "max_tokens": 1000,
     "tokens_per_fill": 1000,
     "fill_interval": "1s"
    },
    "filter_enabled": {
     "default_value": {
      "numerator": 100
     },
     "runtime_key": "B.local_rate_limit_enabled"
    },
    "filter_enforced": {
     "default_value": {
      "numerator": 100
     },
     "runtime_key": "B.local_rate_limit_enforced"
    },
    "local_rate_limit_per_downstream_connection": true
   }
  }
 }
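
A toy model of that suspicion (illustrative pseudocode only, not the actual filter implementation; connection_buckets, handle_request, and the route keys are invented for the sketch): if the per-connection bucket is keyed only by the connection rather than by (connection, filter config), the first config to run supplies the bucket for every later request on that connection.

# Hypothesis: the per-connection bucket is cached per connection only, so the
# first route config to match "wins" and its bucket is reused for all routes.
connection_buckets = {}                  # conn_id -> remaining tokens (no refill, simplified)
route_limits = {"/A/": 5, "/B/": 1000}   # max_tokens from the two route configs above

def handle_request(conn_id, path):
    # the modelled bug: the cache key ignores which route/config matched
    tokens = connection_buckets.setdefault(conn_id, route_limits[path])
    if tokens > 0:
        connection_buckets[conn_id] = tokens - 1
        return 200
    return 429

print(handle_request("conn-1", "/A/"))                       # 200, creates the 5-token bucket
print([handle_request("conn-1", "/B/") for _ in range(10)])  # only 4 more 200s, then 429s
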
bmcalary-atlassian commented 1 year ago

I've also found that if a local rate limit is defined at the http_connection_manager level as one of the first (or the first) http_filters, and the request also matches a separate http_local_rate_limit at the vhost/route level, then the http_connection_manager-level local_rate_limit is never hit, regardless of whether the request rate exceeds its token bucket.

These stats should show enabled or ok hits, or even rate_limited, but they show all zeros:

mylistener.http_local_rate_limit.enabled: 0
mylistener.http_local_rate_limit.enforced: 0
mylistener.http_local_rate_limit.ok: 0
mylistener.http_local_rate_limit.rate_limited: 0

Each request shows as using two tokens from the vhost/route level http_local_rate_limit bucket.

localratelimit_vhost.http_local_rate_limit.enabled: 52
localratelimit_vhost.http_local_rate_limit.enforced: 22
localratelimit_vhost.http_local_rate_limit.ok: 30
localratelimit_vhost.http_local_rate_limit.rate_limited: 22

As soon as you remove the vhost/route level http_local_rate_limit, the http_connection_manager.http_filters one starts seeing activity:

mylistener.http_local_rate_limit.enabled: 20
mylistener.http_local_rate_limit.enforced: 0
mylistener.http_local_rate_limit.ok: 20
mylistener.http_local_rate_limit.rate_limited: 0

This is particularly perplexing because I would expect the http_connection_manager.http_filters local_rate_limit to be executed before envoy.filters.http.router (further down in the filter chain) and, if sufficiently restrictive, to end processing before the vhost/route evaluation is reached. Currently, even if the http_connection_manager.http_filters local_rate_limit is set to a very low value, one which would be exhausted immediately, as long as another local_rate_limit exists at the vhost/route level, that one seems to absorb the token decrements anyway. Does Envoy look ahead? Or maybe my mental model of Envoy's http_filters as a series of sequential steps is misguided.