apache / apisix

The Cloud-Native API Gateway
https://apisix.apache.org/blog/
Apache License 2.0
14.45k stars 2.52k forks source link

apisix delay #5120

Closed zhixiang0608 closed 2 years ago

zhixiang0608 commented 3 years ago

Issue description

hello, Gateway delay p99 is much higher than other indicators image This is the result of my pressure test using wrk image My routing configuration is 2{7W( ~M4C27% JL`28@VLP

Environment

apisix vesion:apisix-2.6

tzssangglass commented 3 years ago

can you paste the route configuration of zyml-getuserinfo-api as text?

zhixiang0608 commented 3 years ago

{"id":"373703982817739720","create_time":1632274531,"update_time":1632274531,"uris":["/zymk-getuserinfo-api/"],"name":"zymk-getuserinfo-api","methods":["GET","POST","PUT","DELETE","PATCH","HEAD","OPTIONS","CONNECT","TRACE"],"plugins":{"prometheus":{"disable":false,"prefer_name":true},"proxy-rewrite":{"headers":{"X-Api-Engine":"apisix","X-Api-Version":"v1"},"regex_uri":["^/zymk-getuserinfo-api/(.)","/$1$2$3"]}},"upstream":{"nodes":[{"host":"172.21.24.51","port":7202,"weight":1}],"timeout":{"connect":6,"read":6,"send":6},"type":"roundrobin","scheme":"http","pass_host":"pass"},"status":1}

tokers commented 3 years ago

@zhixiang0608 Have you ever tried to benchmark the upstream directly? It's unknown whether the long tail requests are due to upstream or APISIX itself.

tzssangglass commented 3 years ago

My reproduction steps are as follows

  1. branch: release/2.6

    case 1.1: route configuration:

      {
          "uris": [
              "/zymk-getuserinfo-api/"
          ],
          "plugins": {
              "proxy-rewrite": {
                  "headers": {
                      "X-Api-Engine": "apisix",
                      "X-Api-Version": "v1"
                  },
                  "regex_uri": [
                      "^/zymk-getuserinfo-api/(.)",
                      "/$1$2$3"
                  ]
              }
          },
          "upstream": {
              "nodes": {
                  "127.0.0.1:1980": 1
              },
              "type": "roundrobin"
          }
      }

    benchmark:

      wrk2 -t4 -c60 -d10s -R4000 --latency http://127.0.0.1:9080/index.html/zymk-getuserinfo-api/
      Running 10s test @ http://127.0.0.1:9080/index.html/zymk-getuserinfo-api/
        4 threads and 60 connections
        Thread Stats   Avg      Stdev     Max   +/- Stdev
          Latency     1.20ms  755.26us  13.91ms   80.72%
          Req/Sec        nan       nan   0.00      0.00%
        Latency Distribution (HdrHistogram - Recorded Latency)
       50.000%    1.14ms
       75.000%    1.56ms
       90.000%    1.93ms
       99.000%    2.75ms
       99.900%   10.99ms
       99.990%   13.77ms
       99.999%   13.92ms
      100.000%   13.92ms

    case 1.2: route configuration:

      {
          "uris": [
              "/zymk-getuserinfo-api/"
          ],
          "upstream": {
              "nodes": {
                  "127.0.0.1:1980": 1
              },
              "type": "roundrobin"
          }
      }

    benchmark:

      wrk2 -t4 -c60 -d10s -R4000 --latency http://127.0.0.1:9080/index.html/zymk-getuserinfo-api/
      Running 10s test @ http://127.0.0.1:9080/index.html/zymk-getuserinfo-api/
        4 threads and 60 connections
        Thread Stats   Avg      Stdev     Max   +/- Stdev
          Latency     1.22ms  563.10us   9.66ms   67.77%
          Req/Sec        nan       nan   0.00      0.00%
        Latency Distribution (HdrHistogram - Recorded Latency)
       50.000%    1.19ms
       75.000%    1.59ms
       90.000%    1.93ms
       99.000%    2.62ms
       99.900%    3.32ms
       99.990%    8.47ms
       99.999%    9.67ms
      100.000%    9.67ms
  2. branch: master case 2.1: route configuration:

      {
          "uris": [
              "/zymk-getuserinfo-api/"
          ],
          "plugins": {
              "proxy-rewrite": {
                  "headers": {
                      "X-Api-Engine": "apisix",
                      "X-Api-Version": "v1"
                  },
                  "regex_uri": [
                      "^/zymk-getuserinfo-api/(.)",
                      "/$1$2$3"
                  ]
              }
          },
          "upstream": {
              "nodes": {
                  "127.0.0.1:1980": 1
              },
              "type": "roundrobin"
          }
      }

    benchmark:

      wrk2 -t4 -c60 -d10s -R4000 --latency http://127.0.0.1:9080/index.html/zymk-getuserinfo-api/
      Running 10s test @ http://127.0.0.1:9080/index.html/zymk-getuserinfo-api/
        4 threads and 60 connections
        Thread Stats   Avg      Stdev     Max   +/- Stdev
          Latency     1.21ms  663.60us  15.90ms   76.45%
          Req/Sec        nan       nan   0.00      0.00%
        Latency Distribution (HdrHistogram - Recorded Latency)
       50.000%    1.17ms
       75.000%    1.56ms
       90.000%    1.92ms
       99.000%    2.65ms
       99.900%    7.88ms
       99.990%   15.33ms
       99.999%   15.90ms
      100.000%   15.90ms

    case 2.2: route configuration:

      {
          "uris": [
              "/zymk-getuserinfo-api/"
          ],
          "upstream": {
              "nodes": {
                  "127.0.0.1:1980": 1
              },
              "type": "roundrobin"
          }
      }

    benchmark:

      wrk2 -t4 -c60 -d10s -R4000 --latency http://127.0.0.1:9080/index.html/zymk-getuserinfo-api/
      Running 10s test @ http://127.0.0.1:9080/index.html/zymk-getuserinfo-api/
        4 threads and 60 connections
        Thread Stats   Avg      Stdev     Max   +/- Stdev
          Latency     1.15ms  544.15us   3.64ms   66.16%
          Req/Sec        nan       nan   0.00      0.00%
        Latency Distribution (HdrHistogram - Recorded Latency)
       50.000%    1.14ms
       75.000%    1.52ms
       90.000%    1.86ms
       99.000%    2.57ms
       99.900%    3.08ms
       99.990%    3.44ms
       99.999%    3.64ms
      100.000%    3.64ms

from case 1.1 and case 1.2, it is related to the regular configuration of the proxy-rewrite plugin from case 1.2 and case 2.2, it is related to apisix version

but this does not explain the 59 ms delay.

Things you need to be sure of.

  1. whether /zymk-getuserinfo-api/ is the only route that has this problem, and if so, then it may be related to the upstream of this route.

  2. remove the prometheus and proxy-rewrite plugins from the /zymk-getuserinfo-api/ route separately to determine if it is an APISIX 2.6 issue or a plugin issue.

possibly relate PR: https://github.com/apache/apisix/pull/4405/files

tzssangglass commented 3 years ago

BTW, do not use wrk to do benchmark, see why:https://www.apiseven.com/zh/blog/analysis-of-inaccurate-distribution-of-wrk-latency, and disable the prometheus plugin, just use wrk2 to do:https://github.com/giltene/wrk2

macaty commented 3 years ago

use https://github.com/tsenart/vegeta

zhixiang0608 commented 3 years ago

@zhixiang0608 Have you ever tried to benchmark the upstream directly? It's unknown whether the long tail requests are due to upstream or APISIX itself. ther routing delays are normal, only this one,using this formula, I think it has nothing to do with service apisix latency = latency - upstream_reponse_time

tzssangglass commented 3 years ago

can you use openresty-xray to test this route? It's really weird, and it's really weird to remove prometheus and use wrk2.

zhixiang0608 commented 3 years ago

image Receive response body delay from upstream 😂

zhixiang0608 commented 3 years ago

image Receive response body delay from upstream 😂

zhixiang0608 commented 3 years ago

According to this formula, the time consumption of upstream responders shall be removed 9f60e2cb2e97dac0cbf32e0a5e1953d

tzssangglass commented 3 years ago

You are saying that some requests are skipped apisix_latency = apisix_latency - upstream_latency? Or maybe some requests don't have upstream_response_time

Let me think, what is the timeout set upstream?

I'm wondering if it's possible that the upstream response timeout is causing these requests to close.

github-actions[bot] commented 2 years ago

This issue has been marked as stale due to 350 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@apisix.apache.org list. Thank you for your contributions.

github-actions[bot] commented 2 years ago

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.