apache / apisix


request help: limit-req rate limiting is inaccurate #7804

Closed wyfaq closed 1 year ago

wyfaq commented 2 years ago

Current Behavior

1. Enable the rate-limiting plugin with a limit of 20,000 requests per second.

[root@ansible-wy ~]# curl -X PATCH -H 'X-API-KEY: blapp9f034335f136f87ad84b625c8f1' http://192.168.200.187:8080/apisix/admin/routes/test-admin-svc-ab -d '{"plugins":{"limit-req":{"rate":20000,"burst":0,"rejected_code":505,"key_type":"var","key":"remote_addr","rejected_msg":"limit by blqqd!","allow_degradation":false,"nodelay":false}}}'
{"node":{"value":{"upstream_id":"SIT-test-service","create_time":1661355948,"priority":0,"update_time":1661663104,"status":1,"methods":["PUT","GET","POST"],"hosts":["testadmservice.st.testidc.com"],"uri":"\/*","id":"test-admin-svc-ab","plugins":{"traffic-split":{"rules":[{"weighted_upstreams":[{"upstream_id":"00000000000000000068","weight":1}],"match":[{"vars":[["remote_addr","~~","10.201.24[0-9]+"]]}]}]},"kafka-logger":{"inactive_timeout":5,"cluster_name":1,"producer_batch_num":200,"include_resp_body":false,"producer_max_buffering":50000,"producer_type":"async","max_retry_count":0,"retry_delay":1,"buffer_duration":60,"include_req_body":false,"producer_time_linger":1,"batch_max_size":1,"kafka_topic":"test-admin-service-allroute","name":"kafka logger","broker_list":{"192.168.100.102":9092,"192.168.100.103":9092,"192.168.100.104":9092,"192.168.100.105":9092,"192.168.100.101":9092},"producer_batch_size":1048576,"required_acks":0,"timeout":3,"meta_format":"default"},"limit-req":{"rejected_msg":"limit by blqqd!","rate":20000,"burst":0,"key_type":"var","nodelay":false,"allow_degradation":false,"key":"remote_addr","rejected_code":505},"gzip":{"buffers":{"number":8,"size":4096},"comp_level":1,"types":["text\/html"],"http_version":1.1,"min_length":20}},"name":"test-admin-service-allroute"},"key":"\/apisix\/routes\/test-admin-svc-ab"},"action":"compareAndSwap"}

2. Test with the ab tool: 20,000 requests take 2.5 s, and the failure rate is close to 90%.

[root@ansible-wy ~]# ab -n 20000 -c 100 -H HOST:testadmservice.st.testidc.com http://192.168.200.187/testAdmin/Env/ping.htm
This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.200.187 (be patient)
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
Completed 8000 requests
Completed 10000 requests
Completed 12000 requests
Completed 14000 requests
Completed 16000 requests
Completed 18000 requests
Completed 20000 requests
Finished 20000 requests

Server Software:        APISIX/2.15.0
Server Hostname:        192.168.200.187
Server Port:            80

Document Path:          /testAdmin/Env/ping.htm
Document Length:        46 bytes

Concurrency Level:      100
Time taken for tests:   2.537 seconds
Complete requests:      20000
Failed requests:        17022
   (Connect: 0, Receive: 0, Length: 17022, Exceptions: 0)
Write errors:           0
Non-2xx responses:      17022
Total transferred:      4302876 bytes
HTML transferred:       681692 bytes
Requests per second:    7882.26 [#/sec] (mean)
Time per request:       12.687 [ms] (mean)
Time per request:       0.127 [ms] (mean, across all concurrent requests)
Transfer rate:          1656.07 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   1.7      0      15
Processing:     0   12  19.7      2     311
Waiting:        0   12  19.6      2     311
Total:          0   13  19.6      3     311

Percentage of the requests served within a certain time (ms)
  50%      3
  66%      8
  75%     16
  80%     27
  90%     45
  95%     49
  98%     53
  99%     58
 100%    311 (longest request)

3. Remove the rate limit; the test passes.

[root@ansible-wy ~]# curl -X PUT -H 'X-API-KEY: blapp9f034335f136f87ad84b625c8f1' http://192.168.200.187:8080/apisix/admin/routes/test-admin-svc-ab -d '{"uri":"/","name":"test-admin-service-allroute","methods":["PUT","GET","POST"],"hosts":["testadmservice.st.testidc.com"],"plugins":{"gzip":{"buffers":{"number":8,"size":4096},"comp_level":1,"http_version":1.1,"min_length":20,"types":["text/html"]},"kafka-logger":{"batch_max_size":1,"broker_list":{"192.168.100.101":9092,"192.168.100.102":9092,"192.168.100.103":9092,"192.168.100.104":9092,"192.168.100.105":9092},"buffer_duration":60,"cluster_name":1,"inactive_timeout":5,"include_req_body":false,"include_resp_body":false,"kafka_topic":"test-admin-service-allroute","max_retry_count":0,"meta_format":"default","name":"kafka logger","producer_batch_num":200,"producer_batch_size":1048576,"producer_max_buffering":50000,"producer_time_linger":1,"producer_type":"async","required_acks":0,"retry_delay":1,"timeout":3},"traffic-split":{"rules":[{"match":[{"vars":[["remote_addr","~~","10.201.24[0-9]+"]]}],"weighted_upstreams":[{"upstream_id":"00000000000000000068","weight":1}]}]}},"upstream_id":"SIT-test-service","status":1}'
{"node":{"value":{"upstream_id":"SIT-test-service","create_time":1661355948,"status":1,"update_time":1661663139,"priority":0,"methods":["PUT","GET","POST"],"hosts":["testadmservice.st.testidc.com"],"uri":"\/","id":"test-admin-svc-ab","plugins":{"traffic-split":{"rules":[{"weighted_upstreams":[{"upstream_id":"00000000000000000068","weight":1}],"match":[{"vars":[["remote_addr","~~","10.201.24[0-9]+"]]}]}]},"kafka-logger":{"inactive_timeout":5,"cluster_name":1,"producer_batch_num":200,"include_resp_body":false,"producer_max_buffering":50000,"producer_type":"async","max_retry_count":0,"retry_delay":1,"buffer_duration":60,"include_req_body":false,"producer_time_linger":1,"batch_max_size":1,"meta_format":"default","name":"kafka logger","broker_list":{"192.168.100.102":9092,"192.168.100.103":9092,"192.168.100.104":9092,"192.168.100.105":9092,"192.168.100.101":9092},"kafka_topic":"test-admin-service-allroute","required_acks":0,"timeout":3,"producer_batch_size":1048576},"gzip":{"buffers":{"number":8,"size":4096},"comp_level":1,"types":["text\/html"],"http_version":1.1,"min_length":20}},"name":"test-admin-service-allroute"},"key":"\/apisix\/routes\/test-admin-svc-ab"},"action":"set"}
[root@ansible-wy ~]# ab -n 20000 -c 100 -H HOST:testadmservice.st.testidc.com http://192.168.200.187/testAdmin/Env/ping.htm
This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.200.187 (be patient)
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
Completed 8000 requests
Completed 10000 requests
Completed 12000 requests
Completed 14000 requests
Completed 16000 requests
Completed 18000 requests
Completed 20000 requests
Finished 20000 requests

Server Software:        APISIX/2.15.0
Server Hostname:        192.168.200.187
Server Port:            80

Document Path:          /testAdmin/Env/ping.htm
Document Length:        46 bytes

Concurrency Level:      100
Time taken for tests:   4.864 seconds
Complete requests:      20000
Failed requests:        0
Write errors:           0
Total transferred:      6720000 bytes
HTML transferred:       920000 bytes
Requests per second:    4111.87 [#/sec] (mean)
Time per request:       24.320 [ms] (mean)
Time per request:       0.243 [ms] (mean, across all concurrent requests)
Transfer rate:          1349.21 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.8      0      11
Processing:     3   24  49.0      7     800
Waiting:        3   23  47.5      7     800
Total:          3   24  49.1      8     800

Percentage of the requests served within a certain time (ms)
  50%      8
  66%     14
  75%     21
  80%     28
  90%     57
  95%     97
  98%    171
  99%    241
 100%    800 (longest request)

Expected Behavior

With a rate limit of 20,000 requests per second, no requests should fail.

Error Logs

No response

Steps to Reproduce

1. Configure rate limiting:

curl -X PATCH -H 'X-API-KEY: blapp9f034335f136f87ad84b625c8f1' http://192.168.200.187:8080/apisix/admin/routes/test-admin-svc-ab -d '{"plugins":{"limit-req":{"rate":20000,"burst":0,"rejected_code":505,"key_type":"var","key":"remote_addr","rejected_msg":"limit by blqqd!","allow_degradation":false,"nodelay":false}}}'

2. Load-test with ab: the failure rate is high.
3. Remove the rate limit.
4. Load-test with ab again: no failures.

Environment

soulbird commented 2 years ago

Is there a relevant error log?

monkeyDluffy6017 commented 2 years ago

The reason is that the requests arrive too fast and nginx's time precision is too low. The formula is here: https://github.com/openresty/lua-resty-limit-traffic/blob/fcce9ca9ee125c02e79acac186a4647e7ee5bafd/lib/resty/limit/req.lua#L95. With burst = 0, when two requests are too close in time, limit-req won't work. You should set a larger burst or switch to a different algorithm.
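
The effect described above can be reproduced with a minimal Python sketch of the excess computation at the linked line. It assumes that the library stores `rate` and `burst` multiplied by 1000, that a rejected request does not update the stored state, and that the clock only advances in whole milliseconds. `make_limiter` and `simulate` are hypothetical names used for illustration; they are not part of APISIX or lua-resty-limit-traffic.

```python
def make_limiter(rate, burst):
    """Return incoming(key, now_ms) -> bool, mimicking the linked leaky-bucket check."""
    rate_scaled = rate * 1000    # assumption: the library keeps rate * 1000
    burst_scaled = burst * 1000  # assumption: the library keeps burst * 1000
    state = {}                   # key -> (excess, last_ms)

    def incoming(key, now_ms):
        rec = state.get(key)
        if rec is None:
            excess = 0  # first request seen for this key
        else:
            prev_excess, last_ms = rec
            elapsed = now_ms - last_ms
            # Drain the bucket for the elapsed time, then add one request (1000 units).
            excess = max(prev_excess - rate_scaled * abs(elapsed) / 1000 + 1000, 0)
            if excess > burst_scaled:
                return False  # rejected; stored state is left unchanged
        state[key] = (excess, now_ms)
        return True

    return incoming


def simulate(rate, burst, n_requests, duration_ms):
    """Send n_requests evenly spaced over duration_ms; return how many are accepted."""
    incoming = make_limiter(rate, burst)
    accepted = 0
    for i in range(n_requests):
        now_ms = i * duration_ms // n_requests  # timestamp truncated to whole ms
        if incoming("remote_addr", now_ms):
            accepted += 1
    return accepted


if __name__ == "__main__":
    # Roughly the observed load: about 7882 requests in one second, burst = 0.
    print(simulate(rate=20000, burst=0, n_requests=7882, duration_ms=1000))
```

Under these assumptions only the first request in each millisecond tick is admitted, because every later request in the same tick sees `elapsed = 0` and an excess of 1000, which is greater than a `burst` of 0. That caps throughput near 1000 requests per second for one key, which is the same order of magnitude as the 2978 successful requests in 2.537 s (about 1170 per second) in the ab report.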

tzssangglass commented 2 years ago

limit-req uses the Leaky Bucket algorithm to limit traffic; this phenomenon is a defect of the Leaky Bucket algorithm.

You can learn more about Leaky Bucket algorithm here: https://www.upyun.com/opentalk/417.html

kingluo commented 2 years ago

@tzssangglass If the rate is configured as 20000 reqs/sec and the real rate is no faster than that, the test should not fail. I don't think it's a defect of the leaky bucket algorithm. It's a bug in the implementation:

The excess check should consider rate first and then burst.

--- /usr/local/openresty-debug/lualib/resty/limit/req.lua.bak   2022-07-20 00:04:11.873302849 +0800
+++ /usr/local/openresty-debug/lualib/resty/limit/req.lua       2022-09-01 15:36:52.228528078 +0800
@@ -97,7 +97,7 @@

         -- print("excess: ", excess)

-        if excess > self.burst then
+        if excess > self.rate and excess > self.burst then
             return nil, "rejected"
         end

With this bugfix, it works even if burst is 0. In fact, burst is supposed to be an optional configuration value.
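
A minimal Python sketch that compares the original and the patched rejection condition on the same traffic follows. It is an illustration under stated assumptions, not the library's code: `rate` and `burst` are assumed to be stored multiplied by 1000, timestamps are truncated to whole milliseconds, a single key is used, and `simulate` with its `patched` flag is a hypothetical helper.

```python
def simulate(rate, burst, n_requests, duration_ms, patched):
    """Count accepted requests for evenly spaced traffic from a single key."""
    rate_scaled = rate * 1000    # assumption: the library keeps rate * 1000
    burst_scaled = burst * 1000  # assumption: the library keeps burst * 1000
    excess, last_ms = None, None
    accepted = 0
    for i in range(n_requests):
        now_ms = i * duration_ms // n_requests  # millisecond clock resolution
        if last_ms is None:
            new_excess = 0  # first request
        else:
            elapsed = now_ms - last_ms
            new_excess = max(excess - rate_scaled * abs(elapsed) / 1000 + 1000, 0)
            over_burst = new_excess > burst_scaled
            over_rate = new_excess > rate_scaled
            # Original check: excess > burst.
            # Patched check:  excess > rate and excess > burst.
            rejected = (over_rate and over_burst) if patched else over_burst
            if rejected:
                continue  # a rejected request does not update the stored state
        excess, last_ms = new_excess, now_ms
        accepted += 1
    return accepted


if __name__ == "__main__":
    # Load from the report: about 7882 req/s against rate=20000, burst=0.
    print("original:", simulate(20000, 0, 7882, 1000, patched=False))
    print("patched: ", simulate(20000, 0, 7882, 1000, patched=True))
    # Sustained 40000 req/s for 3 s, twice the configured rate.
    print("patched, overload:", simulate(20000, 0, 120000, 3000, patched=True))
```

In this sketch the patched condition admits all 7882 requests, while sustained traffic at twice the configured rate is still rejected once the accumulated excess passes the scaled rate. Under these assumptions the patch behaves roughly like an effective burst of `max(burst, rate)`.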

I agree with @monkeyDluffy6017. It happens when the interval between two consecutive requests is small or even 0 (remember that the time resolution is in milliseconds), so the calculated rate is higher than the real one.

@wyfaq IMO, you should configure burst to be at least equal to rate.
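
As a sketch of that suggestion, the Python snippet below builds the same `limit-req` PATCH body used in the report with only `burst` changed, and prints a matching curl command. The helper names and the `<admin-key>` placeholder are hypothetical; the field names, route, and Admin API address are the ones that appear earlier in this issue.

```python
import json
import shlex


def limit_req_patch(rate, burst):
    """Build the plugin PATCH body from the report, varying only rate and burst."""
    return {
        "plugins": {
            "limit-req": {
                "rate": rate,
                "burst": burst,
                "rejected_code": 505,
                "key_type": "var",
                "key": "remote_addr",
                "rejected_msg": "limit by blqqd!",
                "allow_degradation": False,
                "nodelay": False,
            }
        }
    }


def curl_command(route_url, api_key, body):
    """Render a curl invocation; nothing is sent by this function."""
    return " ".join([
        "curl", "-X", "PATCH",
        "-H", shlex.quote("X-API-KEY: " + api_key),
        route_url,
        "-d", shlex.quote(json.dumps(body, separators=(",", ":"))),
    ])


if __name__ == "__main__":
    body = limit_req_patch(rate=20000, burst=20000)  # burst equal to rate
    url = "http://192.168.200.187:8080/apisix/admin/routes/test-admin-svc-ab"
    print(curl_command(url, "<admin-key>", body))
```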

BTW, in the Chinese doc, the description of burst is wrong: burst is not an incremental value but an absolute value: https://apisix.apache.org/zh/docs/apisix/plugins/limit-req/.

tzssangglass commented 2 years ago

Does this change make limit-req lose its traffic shaping function?

kingluo commented 2 years ago

> Does this change make limit-req lose its traffic shaping function?

No, it's in fact a bugfix. If the real rate is within the rate range, it would delay the requests to achieve the rate limit. If it exceeds the burst, the requests would be rejected.

tzssangglass commented 2 years ago

> No, it's in fact a bugfix. If the real rate is within the rate range, it would delay the requests to achieve the rate limit. If it exceeds the burst, the requests would be rejected.

I get