Open wwillsey opened 4 years ago
Hey @mattklein123, I've created this issue to follow up on https://github.com/envoyproxy/ratelimit/issues/167. Please let me know if you think any more details would be helpful.
Thanks!
Yeah this makes sense to me. Marking help wanted.
any news regarding this?
Pinterest is interested in this also. cc @fishcakez @JuniorHsu
Hi, any news regarding this? We would like to use it to limit tokens per minute for the LLM use case, as LLM APIs are usually limited by tokens per minute rather than requests per second.
Related work: I updated the ratelimit client to support the hits_addend field (https://github.com/envoyproxy/envoy/pull/28939). Some extra work would be required so users can configure the ratelimit sidecar to send hits_addend.
@lizzzcai In case you are using the OpenAI API, I think they limit on request token + response token. So further work would be required either in the ratelimit filter or another new filter so the response token can be sent to the ratelimit sidecar on the response flow.
Hi @PeterL328 , thanks for your update, I will follow your other PR for the progress.
> In case you are using the OpenAI API, I think they limit on request token + response token.
For our case, we are using Azure OpenAI. However, I think the limit is not on the response tokens, at least for Azure OpenAI. In our case we are using the prompt text tokens + max_tokens (the maximum number of tokens that will be returned) from the request.
Reference: Azure OpenAI
As each request is received, Azure OpenAI computes an estimated max processed-token count that includes the following:
- Prompt text and count
- The max_tokens parameter setting
- The best_of parameter setting
As requests come into the deployment endpoint, the estimated max-processed-token count is added to a running token count of all requests that is reset each minute. If at any time during that minute, the TPM rate limit value is reached, then further requests will receive a 429 response code until the counter resets.
Hi @lizzzcai, We use OpenAI API and also Azure OpenAI. I believe both will report back the total token consumed (request + response token) in the response body.
Yeah, you can use the max token setting for the response, but it will not be accurate if you plan to track actual token usage.
I have opened #34184 as a potential solution to setting hits_addend in an unobtrusive way.
Now that #34184 has merged, can this be closed?
Hi @EItanya, I'm trying to use the hits addend with Istio. Can you please provide an example of how to configure this as an EnvoyFilter?
I was trying to use the set filter state filter to set the envoy.ratelimit.hits_addend filter state from a request header, but it was not working.
I get the following error.
Error adding/updating listener(s) virtualInbound: 'envoy.ratelimit.hits_addend' does not have an object factory.
please use master branch
Hi @zirain, I managed to build and use the pilot and proxyv2 images of Istio from the master branch.
I am trying to create the EnvoyFilter objects. envoyfilter_hits_addend.txt
Is this the correct way to set the envoy.ratelimit.hits_addend filter state from a request header called hits, before the rate limit filter?
be careful of inserting a filter based on something that is created by another envoyfilter.
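For readers following the question above, here is a minimal sketch of what copying a request header into the envoy.ratelimit.hits_addend filter state could look like in plain Envoy config. It reuses the filter and key names already shown in this thread. The header name hits is taken from the question, and the %REQ(hits)% substitution is Envoy's standard request-header format command. It assumes an Envoy build that includes #34184, and it does not cover what happens when the header is missing or not a number.

```yaml
# Sketch only, not a verified Istio EnvoyFilter.
# Assumption: the request carries a header named "hits" holding an integer.
http_filters:
- name: envoy.filters.http.set_filter_state
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.set_filter_state.v3.Config
    on_request_headers:
    - object_key: envoy.ratelimit.hits_addend
      format_string:
        text_format_source:
          # %REQ(hits)% expands to the value of the "hits" request header.
          inline_string: "%REQ(hits)%"
# The envoy.filters.http.ratelimit filter must come after this filter in the
# chain so that the filter state is already set when it builds its request.
```

In an Istio EnvoyFilter the same typed_config would go into a patch that inserts the filter ahead of the rate limit filter. The exact patch depends on how the rate limit filter itself is installed, which is the ordering concern raised in the comment above.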
Seeing the same problem as the one described by @OS-ramamurtisubramanian on the latest v1.31.2, when trying to use the envoy.filters.http.set_filter_state filter to set the envoy.ratelimit.hits_addend state key.
- name: envoy.filters.http.set_filter_state
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.set_filter_state.v3.Config
    on_request_headers:
    - object_key: envoy.ratelimit.hits_addend
      format_string:
        text_format_source:
          inline_string: "0"
Error log is:
[main] [source/server/server.cc:412] error initializing config ' /etc/envoy/envoy.yaml': 'envoy.ratelimit.hits_addend' does not have an object factory
Is there another configuration to add ?
I cannot recall, but can you give a try with main branch?
@zirain I just tried using a freshly built envoy binary from the main branch.
> ./envoy --version
./envoy version: 51e253405a2be7f94df8c0ba78bd884dc79bb8a5/1.32.0-dev/Modified/DEBUG/BoringSSL
Configuration tested:
admin:
  address:
    socket_address: { address: 127.0.0.1, port_value: 9901 }
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 127.0.0.1, port_value: 10000 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          codec_type: AUTO
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: some_service }
          http_filters:
          - name: envoy.filters.http.set_filter_state
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.set_filter_state.v3.Config
              on_request_headers:
              - object_key: envoy.ratelimit.hits_addend
                format_string:
                  text_format_source:
                    inline_string: "0"
          - name: envoy.filters.http.ratelimit
            typed_config:
              '@type': type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
              domain: rpm
              enable_x_ratelimit_headers: DRAFT_VERSION_03
              failure_mode_deny: false
              rate_limit_service:
                grpc_service:
                  envoy_grpc:
                    cluster_name: ratelimit
                transport_api_version: V3
              rate_limited_as_resource_exhausted: true
              request_type: external
              stage: 0
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: some_service
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: some_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 1234
  - name: ratelimit
    connect_timeout: 1s
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: ratelimit
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 5001
same error:
[2024-10-09 18:14:52.204][619209][info][main] [source/server/server.cc:871] runtime: {}
[2024-10-09 18:14:52.206][619209][info][admin] [source/server/admin/admin.cc:65] admin address: 127.0.0.1:9901
[2024-10-09 18:14:52.209][619209][info][config] [source/server/configuration_impl.cc:168] loading tracing configuration
[2024-10-09 18:14:52.209][619209][info][config] [source/server/configuration_impl.cc:124] loading 0 static secret(s)
[2024-10-09 18:14:52.209][619209][info][config] [source/server/configuration_impl.cc:130] loading 2 cluster(s)
[2024-10-09 18:14:52.231][619209][info][config] [source/server/configuration_impl.cc:138] loading 1 listener(s)
[2024-10-09 18:14:52.241][619209][info][config] [source/server/configuration_impl.cc:168] loading tracing configuration
[2024-10-09 18:14:52.241][619209][info][config] [source/server/configuration_impl.cc:124] loading 0 static secret(s)
[2024-10-09 18:14:52.241][619209][info][config] [source/server/configuration_impl.cc:130] loading 2 cluster(s)
[2024-10-09 18:14:52.258][619209][info][config] [source/server/configuration_impl.cc:138] loading 1 listener(s)
[2024-10-09 18:14:52.266][619209][critical][main] [source/server/server.cc:412] error initializing config ' envoy-basic.yaml': 'envoy.ratelimit.hits_addend' does not have an object factory
[2024-10-09 18:14:52.268][619209][info][main] [source/server/server.cc:1042] exiting
'envoy.ratelimit.hits_addend' does not have an object factory
I'm not sure how you built it; I cannot reproduce it on my machine.
bazel build envoy
cp bazel-bin/source/exe/envoy-static /usr/local/bin/envoy-dev
envoy-dev -c envoy.yaml
@zirain , sorry, I must have missed something in my first build (I was using the docker script provided). Trying with your command indeed works. Thank you !
It's very useful to set a different hitsAddend value per filter for different domains when multiple ratelimit filters are chained.
Description
The RLS v3 API describes the RateLimitService as able to ingest a hits_addend field that determines the number of tokens to consume for a rate limit request. Envoy should provide a way to extract a value from a request header (or some other source) to populate this field on a per-request basis. If hits_addend is only static, then it is effectively the same as modifying the rate limit itself.
Use case
In the HTTP Rate Limit Filter allow for a configuration of a request header containing an integer hits_addend value to send with the rate limit request, allowing for greater configurability of rate limiting capabilities.
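To make the request concrete, the following is an illustrative YAML rendering of the RateLimitRequest message that Envoy would send to the rate limit service when a per-request addend is present. The field names domain, descriptors and hits_addend come from the envoy.service.ratelimit.v3 API. The domain rpm matches the config tested in this thread. The descriptor entry and the value 1200 are hypothetical, chosen only to resemble a tokens-per-minute use case.

```yaml
# Illustrative only: a RateLimitRequest shown as YAML, not a config file.
domain: rpm
descriptors:
- entries:
  # Hypothetical descriptor; real entries depend on the route's rate limit actions.
  - key: generic_key
    value: llm_tokens
# Number of hits this single request adds to the matched limit.
# When the field is unset, the request counts as 1 hit.
hits_addend: 1200
```

With a static addend every request would consume the same amount, which is why the description argues for deriving the value per request, for example from a header.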