apache / apisix

The Cloud-Native API Gateway
https://apisix.apache.org/blog/
Apache License 2.0

help request: apisix connection to etcd timeout #11665

Open GhangZh opened 1 month ago

GhangZh commented 1 month ago

Description

I've been having some problems with APISIX lately:

  1. APISIX's connection to etcd times out. As a result, apisix-ingress-controller times out when syncing cluster changes to APISIX and keeps retrying, and APISIX's CPU usage then spikes.
    2024/10/16 11:26:11 [warn] 72#72: *509452782 stream [lua] health_check.lua:90: report_failure(): update endpoint: http://xxxxxx:2379/ to unhealthy, context: ngx.timer
    2024/10/16 11:26:11 [warn] 68#68: *509030123 [lua] health_check.lua:90: report_failure(): update endpoint: http://xxxxxx:2379/ to unhealthy, context: ngx.timer
    2024/10/16 11:26:11 [warn] 97#97: *509293272 stream [lua] v3.lua:647: request_chunk(): http://xxxxx:2379/: connection timed out. Retrying, context: ngx.timer

APISIX ConfigMap:

    nginx_config:                     # config for rendering the template to generate nginx.conf
      error_log: "/dev/stderr"
      error_log_level: "warn"         # warn,error
      worker_rlimit_nofile: 60000     # the number of files a worker process can open, should be larger than worker_connections
      event:
        worker_connections: 60000

      http:
        lua_shared_dict:
          prometheus-metrics: 512m
        enable_access_log: true
        keepalive_timeout: 90s         # timeout during which a keep-alive client connection will stay open on the server side.
        client_header_timeout: 600s     # timeout for reading client request header, then 408 (Request Time-out) error is returned to the client
        client_body_timeout: 600s       # timeout for reading client request body, then 408 (Request Time-out) error is returned to the client
        send_timeout: 600s              # timeout for transmitting a response to the client, after which the connection is closed
        client_max_body_size: 30720m
        underscores_in_headers: "on"   # enables the use of underscores in client request header fields (nginx default: off)
        real_ip_header: "X-Real-IP"    # http://nginx.org/en/docs/http/ngx_http_realip_module.html#real_ip_header
        real_ip_from:                  # http://nginx.org/en/docs/http/ngx_http_realip_module.html#set_real_ip_from
          - 127.0.0.1
          - 'unix:'
      http_configuration_snippet:      |
        sendfile on;
        tcp_nopush on;
        tcp_nodelay on;
        client_header_buffer_size 16m;
        large_client_header_buffers 4 16m;
        client_body_buffer_size 64m;
        proxy_buffering off;
        proxy_buffers 4 10m;
        proxy_buffer_size 10m;
        proxy_busy_buffers_size 10M;
        proxy_max_temp_file_size 0;
        proxy_connect_timeout 30s;
        proxy_send_timeout   600s;
        proxy_read_timeout   600s;
        proxy_cache off;
        proxy_request_buffering off;
      http_server_configuration_snippet:      |
        set $router_name -;
        set $upstream_name -;
        proxy_ignore_client_abort on;

    etcd:
      host:                                 # it's possible to define multiple etcd host addresses of the same etcd cluster.
        - "http://xxxxx:2379"             # multiple etcd address
        - "http://xxxxx:2379"             # multiple etcd address
        - "http://xxxxx:2379"             # multiple etcd address
      prefix: "/apisix"     # apisix configurations prefix
      timeout: 30   # 30 seconds
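One way to check whether these timeouts originate on the network path to etcd rather than inside APISIX is to probe each address listed under `etcd.host` directly from the APISIX pod. The following is a minimal sketch in Python under stated assumptions, not an APISIX tool: it assumes the etcd members expose etcd's standard `/health` HTTP endpoint on the client port (2379) without TLS, and the names `probe_hosts`, `fetch_health` and `ProbeResult` are hypothetical helpers written for illustration.

```python
import json
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ProbeResult:
    host: str
    healthy: bool
    latency_s: float
    error: Optional[str] = None


def fetch_health(url: str, timeout: float) -> dict:
    """GET url and decode the JSON body (etcd's /health replies like {"health": "true"})."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def probe_hosts(hosts: List[str], timeout: float = 5.0,
                fetch: Callable[[str, float], dict] = fetch_health) -> List[ProbeResult]:
    """Probe <host>/health for every etcd address, recording health, latency and errors.

    `fetch` is injectable so the probe logic can be exercised without a live cluster.
    """
    results = []
    for host in hosts:
        url = host.rstrip("/") + "/health"
        start = time.monotonic()
        try:
            body = fetch(url, timeout)
            healthy = isinstance(body, dict) and str(body.get("health")).lower() == "true"
            results.append(ProbeResult(host, healthy, time.monotonic() - start))
        except (urllib.error.URLError, OSError, ValueError) as exc:
            # URLError, socket timeouts and refused connections are all OSError subclasses;
            # ValueError covers a non-JSON response body.
            results.append(ProbeResult(host, False, time.monotonic() - start, repr(exc)))
    return results


# Usage (hypothetical address; substitute the real etcd.host entries from the ConfigMap):
#   for r in probe_hosts(["http://127.0.0.1:2379"]):
#       print(r.host, "healthy" if r.healthy else "UNHEALTHY", f"{r.latency_s:.2f}s", r.error or "")
```

If the probe itself regularly takes seconds or times out from the same pod, the problem is more likely in the network or the etcd cluster than in APISIX's configuration.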

Environment

moonming commented 1 month ago

Could you try the latest versions of Apache APISIX and the ingress controller to see whether this problem still exists? 2.13 is a very old version, and some of these issues may have been fixed since.

GhangZh commented 3 weeks ago


Our production environment runs this version, so we can't test against a newer one. If this is a known problem, could you point me to the relevant issue or PR?