envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Envoy to pass hits_addend to RateLimitService #12969

Open wwillsey opened 4 years ago

wwillsey commented 4 years ago

Description

The RLS v3 API describes the RateLimitService as able to ingest a hits_addend field, which determines the number of tokens a rate limiting request consumes. Envoy should provide a way to extract a value from a request header (or some other source) to populate this field on a per-request basis. If hits_addend can only be static, it is effectively the same as modifying the rate limit itself.

Use case

In the HTTP Rate Limit Filter, allow configuring a request header that contains an integer hits_addend value to send with the rate limit request, allowing for greater configurability of rate limiting capabilities.

wwillsey commented 4 years ago

Hey @mattklein123, I've created this issue to follow up on https://github.com/envoyproxy/ratelimit/issues/167. Please let me know if you think any more details would be helpful.

Thanks!

mattklein123 commented 4 years ago

Yeah this makes sense to me. Marking help wanted.

medalliaerlich commented 3 years ago

any news regarding this?

sc0ttbeardsley commented 2 years ago

Pinterest is interested in this also. cc @fishcakez @JuniorHsu

lizzzcai commented 1 year ago

Hi, any news regarding this? We would like to use it to limit tokens per minute for the LLM use case, as such APIs are usually limited by tokens per minute rather than requests per second.

PeterL328 commented 1 year ago

Related work: I updated the ratelimit client to support the hits_addend field (https://github.com/envoyproxy/envoy/pull/28939). Some extra work would be required so users can configure the ratelimit sidecar to send hits_addend.

PeterL328 commented 1 year ago

@lizzzcai In case you are using the OpenAI API, I think they limit on request token + response token. So further work would be required either in the ratelimit filter or another new filter so the response token can be sent to the ratelimit sidecar on the response flow.

lizzzcai commented 1 year ago

Hi @PeterL328 , thanks for your update, I will follow your other PR for the progress.

In case you are using the OpenAI API, I think they limit on request token + response token.

For our case, we are using Azure OpenAI. However, I think the limit is not based on response tokens, at least for Azure OpenAI. In our case we use the prompt text tokens plus max_tokens (the maximum number of tokens in the response) from the request.

Reference: Azure OpenAI

As each request is received, Azure OpenAI computes an estimated max processed-token count that includes the following:

- Prompt text and count
- The max_tokens parameter setting
- The best_of parameter setting

As requests come into the deployment endpoint, the estimated max-processed-token count is added to a running token count of all requests that is reset each minute. If at any time during that minute, the TPM rate limit value is reached, then further requests will receive a 429 response code until the counter resets.

PeterL328 commented 1 year ago

Hi @lizzzcai, We use OpenAI API and also Azure OpenAI. I believe both will report back the total token consumed (request + response token) in the response body.

Yeah, you can use the max tokens setting for the response, but it will not be accurate if you plan to track actual usage.

EItanya commented 6 months ago

I have opened #34184 as a potential solution to setting hits_addend in an unobtrusive way.

zirain commented 3 months ago

Now that #34184 is merged, can this be closed?

OS-ramamurtisubramanian commented 3 months ago

Hi @EItanya, I'm trying to use the hits addend with istio. Can you please provide me an example of how to configure this as an EnvoyFilter?

I was trying to use the set filter state filter to set the envoy.ratelimit.hits_addend filter state from a request header, but it was not working.

I get the following error.

Error adding/updating listener(s) virtualInbound: 'envoy.ratelimit.hits_addend' does not have an object factory.

zirain commented 3 months ago

please use master branch

OS-ramamurtisubramanian commented 3 months ago

Hi @zirain, I managed to build and use the pilot and proxyv2 images of Istio from the master branch.

I am trying to create the EnvoyFilter objects. envoyfilter_hits_addend.txt

Is this the correct way to set the envoy.ratelimit.hits_addend filter state from a request header called hits, before the rate limit filter?

zirain commented 3 months ago

be careful of inserting a filter based on something that is created by another envoyfilter.
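
For illustration, here is a minimal sketch of an EnvoyFilter that sets the filter state from a request header named hits, as asked above. It is not the attached file and has not been confirmed in this thread. The resource name, the namespace, and the choice to anchor on the Istio-generated router filter (rather than on a rate limit filter added by another EnvoyFilter) are assumptions. It also assumes a proxy build that includes #34184.

```yaml
# Hypothetical EnvoyFilter: metadata and the "hits" header name are assumptions.
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: hits-addend-from-header   # hypothetical name
  namespace: istio-system         # assumption: applied mesh-wide
spec:
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter:
              # Anchor on the router filter, which Istio itself generates,
              # instead of a filter that another EnvoyFilter creates.
              name: envoy.filters.http.router
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.http.set_filter_state
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.set_filter_state.v3.Config
          on_request_headers:
          - object_key: envoy.ratelimit.hits_addend
            format_string:
              text_format_source:
                # %REQ(hits)% expands to the value of the "hits" request header.
                inline_string: "%REQ(hits)%"
```

The rate limit filter still has to end up after this filter in the generated chain, so it is worth checking the resulting listener config. What happens when the header is missing or not an integer is not covered in this thread and should be tested.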

gcalmettes commented 1 month ago

Seeing the same problem as the one described by @OS-ramamurtisubramanian on the latest v1.31.2, when trying to use the envoy.filters.http.set_filter_state filter to set the envoy.ratelimit.hits_addend state key.

          - name: envoy.filters.http.set_filter_state
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.set_filter_state.v3.Config
              on_request_headers:
              - object_key: envoy.ratelimit.hits_addend
                format_string:
                  text_format_source:
                    inline_string: "0"

Error log is:

[main] [source/server/server.cc:412] error initializing config '  /etc/envoy/envoy.yaml': 'envoy.ratelimit.hits_addend' does not have an object factory 

Is there another configuration to add ?

zirain commented 1 month ago

I cannot recall, but can you try with the main branch?

gcalmettes commented 1 month ago

@zirain I just tried using a freshly built envoy binary from the main branch.

> ./envoy --version          

./envoy  version: 51e253405a2be7f94df8c0ba78bd884dc79bb8a5/1.32.0-dev/Modified/DEBUG/BoringSSL

Configuration tested:

admin:
  address:
    socket_address: { address: 127.0.0.1, port_value: 9901 }

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 127.0.0.1, port_value: 10000 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          codec_type: AUTO
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: some_service }
          http_filters:
          - name: envoy.filters.http.set_filter_state
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.set_filter_state.v3.Config
              on_request_headers:
              - object_key: envoy.ratelimit.hits_addend
                format_string:
                  text_format_source:
                    inline_string: "0"
          - name: envoy.filters.http.ratelimit
            typed_config:
              '@type': type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
              domain: rpm
              enable_x_ratelimit_headers: DRAFT_VERSION_03
              failure_mode_deny: false
              rate_limit_service:
                grpc_service:
                  envoy_grpc:
                    cluster_name: ratelimit
                transport_api_version: V3
              rate_limited_as_resource_exhausted: true
              request_type: external
              stage: 0
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

  clusters:
  - name: some_service
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: some_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 1234
  - name: ratelimit
    connect_timeout: 1s
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: ratelimit
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 5001

same error:

[2024-10-09 18:14:52.204][619209][info][main] [source/server/server.cc:871] runtime: {}
[2024-10-09 18:14:52.206][619209][info][admin] [source/server/admin/admin.cc:65] admin address: 127.0.0.1:9901
[2024-10-09 18:14:52.209][619209][info][config] [source/server/configuration_impl.cc:168] loading tracing configuration
[2024-10-09 18:14:52.209][619209][info][config] [source/server/configuration_impl.cc:124] loading 0 static secret(s)
[2024-10-09 18:14:52.209][619209][info][config] [source/server/configuration_impl.cc:130] loading 2 cluster(s)
[2024-10-09 18:14:52.231][619209][info][config] [source/server/configuration_impl.cc:138] loading 1 listener(s)
[2024-10-09 18:14:52.241][619209][info][config] [source/server/configuration_impl.cc:168] loading tracing configuration
[2024-10-09 18:14:52.241][619209][info][config] [source/server/configuration_impl.cc:124] loading 0 static secret(s)
[2024-10-09 18:14:52.241][619209][info][config] [source/server/configuration_impl.cc:130] loading 2 cluster(s)
[2024-10-09 18:14:52.258][619209][info][config] [source/server/configuration_impl.cc:138] loading 1 listener(s)
[2024-10-09 18:14:52.266][619209][critical][main] [source/server/server.cc:412] error initializing config '  envoy-basic.yaml': 'envoy.ratelimit.hits_addend' does not have an object factory
[2024-10-09 18:14:52.268][619209][info][main] [source/server/server.cc:1042] exiting
'envoy.ratelimit.hits_addend' does not have an object factory

zirain commented 1 month ago

I'm not sure how you built it; I cannot reproduce it on my machine.

bazel build envoy
cp bazel-bin/source/exe/envoy-static /usr/local/bin/envoy-dev
envoy-dev -c envoy.yaml

gcalmettes commented 1 month ago

@zirain, sorry, I must have missed something in my first build (I was using the provided Docker script). Building with your commands does work. Thank you! It's very useful to set a different hitsAddend value per filter for different domains when multiple ratelimit filters are chained.
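
As a rough sketch of that chained setup (an illustration under assumptions, not a tested config): each set_filter_state filter rewrites envoy.ratelimit.hits_addend just before the rate limit filter that should see that value. The tpm domain and the x-token-count header are hypothetical, and the fragment assumes the filter state entry can be overwritten by the second set_filter_state filter. It belongs under the HttpConnectionManager's http_filters, reusing the ratelimit cluster from the config above.

```yaml
# Sketch only: the "tpm" domain and the "x-token-count" header are hypothetical.
http_filters:
# Count every request as 1 hit against the "rpm" domain.
- name: envoy.filters.http.set_filter_state
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.set_filter_state.v3.Config
    on_request_headers:
    - object_key: envoy.ratelimit.hits_addend
      format_string:
        text_format_source:
          inline_string: "1"
- name: envoy.filters.http.ratelimit
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
    domain: rpm
    rate_limit_service:
      grpc_service:
        envoy_grpc:
          cluster_name: ratelimit
      transport_api_version: V3
# Overwrite the addend with a per-request value for the "tpm" domain.
- name: envoy.filters.http.set_filter_state
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.set_filter_state.v3.Config
    on_request_headers:
    - object_key: envoy.ratelimit.hits_addend
      format_string:
        text_format_source:
          inline_string: "%REQ(x-token-count)%"
- name: envoy.filters.http.ratelimit
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
    domain: tpm
    rate_limit_service:
      grpc_service:
        envoy_grpc:
          cluster_name: ratelimit
      transport_api_version: V3
- name: envoy.filters.http.router
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```

The route-level rate limit actions that produce descriptors are omitted here, as they are in the config above.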