apache / apisix

The Cloud-Native API Gateway
https://apisix.apache.org/blog/
Apache License 2.0

request help: my upstream is not evenly loaded #5330

Closed sandy420 closed 3 years ago

sandy420 commented 3 years ago

Issue description

My upstream's type is roundrobin and every node has priority 10. My config.yaml is below:

apisix:
  stream_proxy:
      only: false
  node_listen: 80
  allow_admin:
    - 127.0.0.1
  router:
    http: 'radixtree_host_uri'
  ssl:
    listen_port: 443
    enable_http2: false
    ssl_protocols: "TLSv1 TLSv1.1 TLSv1.2 TLSv1.3"
    ssl_ciphers: "ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA"
plugins:
  - proxy-rewrite
  - prometheus
  - log-rotate
  - redirect
  - node-status
  - server-info
  - proxy-mirror
  - echo
  - ip-restriction
  - basic-auth
  - kafka-logger
plugin_attr:
  prometheus:
    export_uri: /apisix/prometheus/metrics
    enable_export_server: true
    export_addr:
      ip: 0.0.0.0
      port: 11091
  log-rotate:
    interval: 21600
    max_kept: 120
nginx_config:
  http_end_configuration_snippet: |
    reset_timedout_connection off;
        client_header_buffer_size 128k;
        client_body_buffer_size 2048k;
        proxy_intercept_errors on;
        large_client_header_buffers 8 32k;
        gzip off;
        gzip_min_length 1k;
        gzip_buffers 4 16k;
        gzip_http_version 1.0;
        gzip_comp_level 9;
        gzip_vary on;
        gzip_types
        text/plain
        application/x-javascript
        text/css
        application/javascript
        application/json
        application/xml
        text/javascript
        application/x-httpd-php
        image/jpeg
        image/gif
        image/png;
        proxy_next_upstream off;
        proxy_connect_timeout 60;
        proxy_send_timeout 300s;
        proxy_read_timeout 300s;
        proxy_buffers 64 32k; # getconf PAGESIZE
        proxy_buffer_size 1024k;
        proxy_busy_buffers_size 1024k;
        proxy_temp_file_write_size 1024k;
        proxy_max_temp_file_size 0;
        proxy_ignore_client_abort on;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header traffic-source "INTERNAL";
  http:
    lua_shared_dict:
      prometheus-metrics: 100m
    real_ip_header: "X-Forwarded-For"
    real_ip_from:
      - 0.0.0.0/0
      - 'unix:'
    keepalive_timeout: 600s
    client_header_timeout: 600s
    client_body_timeout: 600s
    send_timeout: 100s
    access_log_format: "$time_iso8601|$msec|$status|$request_completion|$bytes_sent|$body_bytes_sent|$realip_remote_addr|$remote_addr|$http_x_forwarded_for|$remote_user|$host|$server_name|$server_port|$server_protocol|$scheme|$request_method|$request_length|$request_time|$request_uri|$uri|$content_length|$content_type|$http_referer|$http_user_agent|$http_app_jb|$http_client_info|$upstream_addr|$upstream_connect_time|$upstream_header_time|$upstream_response_time|$upstream_status|$upstream_bytes_received|$upstream_cache_status|$upstream_http_content_type|$upstream_http_content_length|$upstream_http_content_disposition|$http_x_cat_parent_id|$upstream_scheme://$upstream_host$upstream_uri"
-----------------------------------------
Upstream config:

{
    "hash_on": "vars",
    "pass_host": "pass",
    "nodes": [
        {
            "host": "10.1.1.21",
            "port": 11180,
            "weight": 100,
            "priority": 10
        },
        {
            "host": "10.1.1.213",
            "port": 11180,
            "weight": 100,
            "priority": 10
        },
        {
            "host": "10.1.1.214",
            "port": 11180,
            "weight": 100,
            "priority": 10
        },
        {
            "host": "10.1.1.215",
            "port": 11180,
            "weight": 100,
            "priority": 10
        },
        {
            "host": "10.1.1.216",
            "port": 11180,
            "weight": 100,
            "priority": 10
        },
        {
            "host": "10.1.1.217",
            "port": 11180,
            "weight": 100,
            "priority": 10
        },
        {
            "host": "10.1.1.218",
            "port": 11180,
            "weight": 100,
            "priority": 10
        },
        {
            "host": "10.1.1.219",
            "port": 11180,
            "weight": 100,
            "priority": 10
        }
    ],
    "type": "roundrobin",
    "labels": {
        "type": "normal",
        "amh_env": "finack"
    },
    "checks": {
        "active": {
            "https_verify_certificate": true,
            "timeout": 2.5,
            "healthy": {
                "interval": 2,
                "successes": 3,
                "http_statuses": [
                    200
                ]
            },
            "type": "http",
            "unhealthy": {
                "interval": 2,
                "http_failures": 3,
                "tcp_failures": 3,
                "timeouts": 3,
                "http_statuses": [
                    429,
                    404,
                    500,
                    501,
                    502,
                    503,
                    504,
                    505
                ]
            },
            "concurrency": 10,
            "http_path": "/gateway/healthCheck"
        }
    },
    "name": "api-gateway",
    "timeout": {
        "read": 600,
        "send": 600,
        "connect": 600
    },
    "scheme": "http"
}

The back-end nodes are Eureka gateways. During peak hours, some back-end nodes reach 400 million QPS, some 300 million QPS and some 200 million QPS, which is very unbalanced. All my requests use short-lived connections. Has anyone run into this? Please let me know if there is a solution. Thank you!
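For reference, with eight nodes of equal weight and equal priority, a round-robin picker is expected to give each node roughly one eighth of the requests. The sketch below uses nginx-style smooth weighted round-robin as a stand-in. It is a generic illustration, not necessarily the algorithm APISIX's roundrobin balancer uses internally, and the function name is hypothetical.

```python
from collections import Counter


def smooth_weighted_round_robin(weights, picks):
    """Yield node names using smooth weighted round-robin (illustrative only)."""
    current = {node: 0 for node in weights}
    total = sum(weights.values())
    for _ in range(picks):
        # Every node gains its weight, the current leader is chosen,
        # and the leader is then penalised by the total weight.
        for node, weight in weights.items():
            current[node] += weight
        chosen = max(current, key=current.get)  # first maximum wins on ties
        current[chosen] -= total
        yield chosen


# Hosts, port and weights copied from the upstream config above.
NODES = {
    f"10.1.1.{last}:11180": 100
    for last in (21, 213, 214, 215, 216, 217, 218, 219)
}

counts = Counter(smooth_weighted_round_robin(NODES, 8000))
for node, n in counts.items():
    print(node, n)
```

Under these assumptions every node receives the same number of picks, so a persistent 2:1 skew would have to come from something other than the weights themselves.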

Environment

shuaijinchao commented 3 years ago

Is there any basic monitoring data, such as the number of TCP connections and I/O throughput on each node? In addition, is the kernel configuration on each node consistent?

What data is your analysis in the question based on? Is it the upstream Eureka logs? Have you analyzed the APISIX access.log data?
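One way to check this from the APISIX side is to count access.log lines per `$upstream_addr`. The sketch below assumes the pipe-delimited `access_log_format` from the config above and that no field before `$upstream_addr` (for example the user agent) contains a literal `|`, which would shift the columns. The function names are hypothetical.

```python
from collections import Counter

# Field order taken from the access_log_format above, up to $upstream_addr.
FIELDS = [
    "time_iso8601", "msec", "status", "request_completion", "bytes_sent",
    "body_bytes_sent", "realip_remote_addr", "remote_addr",
    "http_x_forwarded_for", "remote_user", "host", "server_name",
    "server_port", "server_protocol", "scheme", "request_method",
    "request_length", "request_time", "request_uri", "uri",
    "content_length", "content_type", "http_referer", "http_user_agent",
    "http_app_jb", "http_client_info", "upstream_addr",
]
UPSTREAM_IDX = FIELDS.index("upstream_addr")


def count_by_upstream(lines):
    """Count log lines per upstream address."""
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("|")
        if len(parts) <= UPSTREAM_IDX:
            continue  # malformed or truncated line
        # nginx joins several attempts with ", "; keep the last address tried
        addr = parts[UPSTREAM_IDX].split(",")[-1].strip()
        if addr and addr != "-":
            counts[addr] += 1
    return counts


def print_shares(counts):
    """Print each upstream's request count and its share of the total."""
    total = sum(counts.values())
    for addr, n in counts.most_common():
        print(f"{addr:<22} {n:>10} {n / total:6.1%}")
```

It could be called as `print_shares(count_by_upstream(open("access.log")))`, where the log path is a placeholder for the actual file.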

tzssangglass commented 3 years ago

I see you have health_check configured, just a reminder that health_check affects roundrobin. Can you be sure that the back-end node is always healthy?
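To illustrate why this matters: a node that is marked unhealthy drops out of the rotation until it passes enough probes again, so it receives fewer requests over that window. The sketch below is a simplified model of the active check thresholds in the upstream config above (3 failures to mark a node unhealthy, 3 successes to recover, 2-second interval). The streak-counter logic is an assumption for illustration, not the actual health-check library code.

```python
def simulate_health(probe_results, successes_needed=3, failures_needed=3):
    """Return the node's healthy/unhealthy state after each probe.

    probe_results is a sequence of booleans (True = probe succeeded).
    Streak counters reset whenever the opposite result is seen.
    """
    healthy = True
    ok_streak = 0
    fail_streak = 0
    states = []
    for ok in probe_results:
        if ok:
            ok_streak += 1
            fail_streak = 0
            if not healthy and ok_streak >= successes_needed:
                healthy = True
        else:
            fail_streak += 1
            ok_streak = 0
            if healthy and fail_streak >= failures_needed:
                healthy = False
        states.append(healthy)
    return states


INTERVAL_SECONDS = 2  # "interval" from the checks.active section above

# Hypothetical probe history: a short outage followed by recovery.
probes = [True, True, False, False, False, True, True, True]
states = simulate_health(probes)
seconds_out = states.count(False) * INTERVAL_SECONDS
print(states)
print("approx. seconds out of rotation:", seconds_out)
```

Even a brief run of failed probes on one node keeps it out of rotation for several seconds in this model, which at peak traffic can add up to a visible difference in request counts.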

sandy420 commented 3 years ago

> I see you have health_check configured, just a reminder that health_check affects roundrobin. Can you be sure that the back-end node is always healthy?

All back-end nodes are healthy.

sandy420 commented 3 years ago

> Is there any basic monitoring data, such as the number of TCP connections and IO throughput on each node? In addition, is the Kernel configuration on each node consistent?

A: The kernel configuration is consistent across all back-end nodes.

> What data is your analysis based on in the question? Is the log of upstream eureka? Have you analyzed the access.log data of apisix?

A: The APISIX cluster nodes also share the same configuration and kernel parameters. The following chart is based on an analysis of the APISIX logs. It shows that the number of requests each APISIX instance assigns to each back-end node differs widely.

[Image: lQLPDhrPtdtBfQ_NAoTNBPuwC9JmL5QR_XABgY40VkAIAA_1275_644]

shuaijinchao commented 3 years ago

Is there a custom routing strategy for different requests? Could you filter the logs for one specific API and check whether the load for that API alone is balanced?
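A per-API breakdown along these lines could be sketched as follows. It assumes the pipe-delimited `access_log_format` from the config above, where `$uri` is the 20th field and `$upstream_addr` the 27th, and that no earlier field contains a literal `|`. The function names are hypothetical.

```python
from collections import Counter, defaultdict

URI_IDX = 19       # 0-based position of $uri in the access_log_format
UPSTREAM_IDX = 26  # 0-based position of $upstream_addr


def upstream_counts_per_uri(lines):
    """Map each URI to a Counter of requests per upstream address."""
    per_uri = defaultdict(Counter)
    for line in lines:
        parts = line.rstrip("\n").split("|")
        if len(parts) <= UPSTREAM_IDX:
            continue  # malformed or truncated line
        # keep the last address if nginx logged several attempts
        addr = parts[UPSTREAM_IDX].split(",")[-1].strip()
        if addr and addr != "-":
            per_uri[parts[URI_IDX]][addr] += 1
    return per_uri


def imbalance(counts):
    """Ratio of busiest to least busy upstream; 1.0 means perfectly even.

    Upstreams that received no requests at all do not appear in counts,
    so compare len(counts) with the expected number of nodes as well.
    """
    if not counts:
        raise ValueError("no requests recorded for this URI")
    return max(counts.values()) / min(counts.values())
```

If one URI shows a ratio close to 1.0 while the overall numbers are skewed, the imbalance more likely comes from how different APIs are routed than from the balancer itself.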