help request: abnormal etcd causes a sudden increase in nginx waiting connections

swtseaman commented 1 year ago

Description

APISIX has been running stably for a long time. etcd, APISIX, and Java applications are deployed on all 3 nodes.

Recently, the number of nginx connections in the waiting state has spiked on two consecutive days. When this happens, the etcd reachable metric is abnormal and memory is full. I don't know how to investigate this; please give me some guidance.

The Grafana screenshot is as follows:

At the same time, there are a lot of timeouts in the log.
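One way to capture the same signals without Grafana is to read them from the Prometheus text output directly. The following is a minimal sketch, not the reporter's tooling: the metric names `apisix_etcd_reachable` and `apisix_nginx_http_current_connections{state="waiting"}` are assumptions based on the APISIX prometheus plugin and should be checked against the real `/apisix/prometheus/metrics` output, and the sample values and the threshold are made up for illustration.

```python
# Assumed metric names; verify them against your own metrics endpoint.
WAITING = 'apisix_nginx_http_current_connections{state="waiting"}'
ETCD_REACHABLE = "apisix_etcd_reachable"

# Illustrative payload only; the numbers are not from the issue.
SAMPLE = """\
# TYPE apisix_etcd_reachable gauge
apisix_etcd_reachable 0
# TYPE apisix_nginx_http_current_connections gauge
apisix_nginx_http_current_connections{state="waiting"} 1843
"""


def read_gauges(metrics_text):
    """Map each sample line 'name{labels} value' to a float value.

    Assumes samples carry no trailing timestamp.
    """
    gauges = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue
        try:
            gauges[parts[0]] = float(parts[1])
        except ValueError:
            continue  # not a numeric sample
    return gauges


def etcd_down_with_backlog(metrics_text, waiting_threshold):
    """True when etcd is reported unreachable and waiting connections pile up."""
    gauges = read_gauges(metrics_text)
    unreachable = gauges.get(ETCD_REACHABLE, 1.0) == 0.0
    backlog = gauges.get(WAITING, 0.0) >= waiting_threshold
    return unreachable and backlog
```

Running this against a scrape saved at the moment of the spike would show whether the waiting connections and the etcd reachability drop really coincide.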

Environment

APISIX version (run apisix version): 3.2.0
Operating system (run uname -a): Linux localhost 3.10.0-1160.80.1.el7.x86_64 #1 SMP Tue Nov 8 15:48:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
OpenResty / Nginx version (run openresty -V or nginx -V): nginx version: openresty/1.21.4.1
etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info): {"version":"3.2.0","boot_time":1682039497,"etcd_version":"3.5.0","hostname":"localhost","id":"a74d7cad-b87e-49ae-82c7-1d35235f5b83"}
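The `server_info` response above can be decoded to see how long the instance had been running before the incident. This is a small sketch that assumes `boot_time` is a Unix timestamp in seconds; the helper name `describe` is hypothetical.

```python
import json
from datetime import datetime, timezone

# Response body copied from the Environment section above.
SERVER_INFO = (
    '{"version":"3.2.0","boot_time":1682039497,"etcd_version":"3.5.0",'
    '"hostname":"localhost","id":"a74d7cad-b87e-49ae-82c7-1d35235f5b83"}'
)


def describe(server_info_json):
    """Summarize versions and boot time (assumed to be epoch seconds, UTC)."""
    info = json.loads(server_info_json)
    booted = datetime.fromtimestamp(info["boot_time"], tz=timezone.utc)
    return {
        "apisix": info["version"],
        "etcd": info["etcd_version"],
        "booted_utc": booted.isoformat(),
    }
```

Under that assumption the instance booted on 2023-04-21 (UTC), several months before the timeouts shown later in the thread.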

shreemaan-abhishek commented 1 year ago

cc: @kingluo

kingluo commented 1 year ago

@swtseaman Please share your config.yaml and more error logs. By "the memory is full", do you mean that the memory used by APISIX increases a lot?

Many connection timeouts are not surprising in 3.2.0. If possible, upgrading to the most recent version would be better.

swtseaman commented 1 year ago

I don't know which logs I can provide; the only one I have is /usr/local/apisix/logs/error.log.

Here is my config.yaml:

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# If you want to set the specified configuration value, you can set the new
# in this file. For example if you want to specify the etcd address:
#
# deployment:
#   role: traditional
#   role_traditional:
#     config_provider: etcd
#   etcd:
#     host:
#       - http://127.0.0.1:2379
#
# To configure via environment variables, you can use `${{VAR}}` syntax. For instance:
#
# deployment:
#   role: traditional
#   role_traditional:
#     config_provider: etcd
#   etcd:
#     host:
#       - http://${{ETCD_HOST}}:2379
#
# And then run `export ETCD_HOST=$your_host` before `make init`.
#
# If the configured environment variable can't be found, an error will be thrown.
#
# Also, If you want to use default value when the environment variable not set,
# Use `${{VAR:=default_value}}` instead. For instance:
#
# deployment:
#   role: traditional
#   role_traditional:
#     config_provider: etcd
#   etcd:
#     host:
#       - http://${{ETCD_HOST:=localhost}}:2379
#
# This will find environment variable `ETCD_HOST` first, and if it's not exist it will use `localhost` as default value.
#

# apisix
apisix:
  node_listen: 8000

plugins:                          # plugin list (sorted by priority)
  - real-ip                        # priority: 23000
  - ai                             # priority: 22900
  - client-control                 # priority: 22000
  - proxy-control                  # priority: 21990
  - request-id                     # priority: 12015
  - zipkin                         # priority: 12011
  #- skywalking                    # priority: 12010
  #- opentelemetry                 # priority: 12009
  - ext-plugin-pre-req             # priority: 12000
  - fault-injection                # priority: 11000
  - mocking                        # priority: 10900
  - serverless-pre-function        # priority: 10000
  #- batch-requests                # priority: 4010
  - cors                           # priority: 4000
  - ip-restriction                 # priority: 3000
  - ua-restriction                 # priority: 2999
  - referer-restriction            # priority: 2990
  - csrf                           # priority: 2980
  - uri-blocker                    # priority: 2900
  - request-validation             # priority: 2800
  - openid-connect                 # priority: 2599
  - cas-auth                       # priority: 2597
  - authz-casbin                   # priority: 2560
  - authz-casdoor                  # priority: 2559
  - wolf-rbac                      # priority: 2555
  - ldap-auth                      # priority: 2540
  - hmac-auth                      # priority: 2530
  - basic-auth                     # priority: 2520
  - jwt-auth                       # priority: 2510
  - key-auth                       # priority: 2500
  - consumer-restriction           # priority: 2400
  - forward-auth                   # priority: 2002
  - opa                            # priority: 2001
  - authz-keycloak                 # priority: 2000
  #- error-log-logger              # priority: 1091
  - body-transformer               # priority: 1080
  - proxy-mirror                   # priority: 1010
  - proxy-cache                    # priority: 1009
  - proxy-rewrite                  # priority: 1008
  - workflow                       # priority: 1006
  - api-breaker                    # priority: 1005
  - limit-conn                     # priority: 1003
  - limit-count                    # priority: 1002
  - limit-req                      # priority: 1001
  #- node-status                   # priority: 1000
  - gzip                           # priority: 995
  - server-info                    # priority: 990
  - traffic-split                  # priority: 966
  - redirect                       # priority: 900
  - response-rewrite               # priority: 899
  - degraphql                      # priority: 509
  - kafka-proxy                    # priority: 508
  #- dubbo-proxy                   # priority: 507
  - grpc-transcode                 # priority: 506
  - grpc-web                       # priority: 505
  - public-api                     # priority: 501
  - prometheus                     # priority: 500
  - datadog                        # priority: 495
  - elasticsearch-logger           # priority: 413
  - echo                           # priority: 412
  - loggly                         # priority: 411
  - http-logger                    # priority: 410
  - splunk-hec-logging             # priority: 409
  - skywalking-logger              # priority: 408
  - google-cloud-logging           # priority: 407
  - sls-logger                     # priority: 406
  - tcp-logger                     # priority: 405
  - kafka-logger                   # priority: 403
  - rocketmq-logger                # priority: 402
  - syslog                         # priority: 401
  - udp-logger                     # priority: 400
  - file-logger                    # priority: 399
  - clickhouse-logger              # priority: 398
  - tencent-cloud-cls              # priority: 397
  - inspect                        # priority: 200
  #- log-rotate                    # priority: 100
  # <- recommend to use priority (0, 100) for your custom plugins
  - example-plugin                 # priority: 0
  - xm-http-check                  # priority: 1
  #- gm                            # priority: -43
  - aws-lambda                     # priority: -1899
  - azure-functions                # priority: -1900
  - openwhisk                      # priority: -1901
  - openfunction                   # priority: -1902
  - serverless-post-function       # priority: -2000
  - ext-plugin-post-req            # priority: -3000
  - ext-plugin-post-resp           # priority: -4000

deployment:
  role: traditional
  role_traditional:
    config_provider: etcd
  admin:
    admin_key:
      - name: admin
        key:xxxxxxxxxxxxxxxxxxxxxxxxxxx
        role: admin
  etcd:
    host:
      - "http://192.168.198.174:2379"
      - "http://192.168.198.175:2379"
      - "http://192.168.198.176:2379"
    user: root
    password: xxxxxxxxxx

# Custom nginx configuration, which is used to proxy local static files
nginx_config:
  http_server_configuration_snippet: |
    location /tv/ {
      alias /data/order_portal/front/;
    }
    location /api/images {
      alias /data/order_portal/images/;
      autoindex on;
    }

plugin_attr:
  prometheus:
    export_addr:
      ip: 192.168.198.174
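Since the config lists three etcd endpoints, one quick check during an incident is to probe each of them and time the response. The sketch below assumes etcd's `/health` endpoint on the client port returns JSON such as `{"health":"true"}`; the helper names `http_get` and `probe` are hypothetical, and the 3-second timeout is an arbitrary choice.

```python
import json
import time
import urllib.request

# Endpoints copied from deployment.etcd.host in the config above.
ETCD_HOSTS = [
    "http://192.168.198.174:2379",
    "http://192.168.198.175:2379",
    "http://192.168.198.176:2379",
]


def http_get(url, timeout):
    """Fetch a URL and return the body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def probe(host, fetch=http_get, timeout=3.0):
    """Return (host, healthy, seconds_taken, detail) for one etcd endpoint."""
    started = time.monotonic()
    try:
        body = fetch(host + "/health", timeout)
        healthy = json.loads(body).get("health") in ("true", True)
        detail = body.strip()
    except Exception as exc:  # timeouts, refused connections, bad JSON
        healthy, detail = False, repr(exc)
    return host, healthy, time.monotonic() - started, detail
```

Calling `probe(host)` for each entry in `ETCD_HOSTS` from an APISIX node shows whether a single member is slow or the whole cluster is. The `fetch` parameter exists only so the function can be exercised without a network.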

swtseaman commented 1 year ago

Replying to kingluo, who wrote: "Many connection timeouts are not surprising in 3.2.0."

Are these connection timeouts a bug in 3.2.0?

kingluo commented 1 year ago

Replying to "I don't know what logs I can provide, only /usr/local/apisix/logs/error.log": yes, please share that file. If it is too large, share only the [error] and [warn] lines instead.

Replying to "Is it a bug of 3.2.0?": the latest version uses a new etcd watch, so it may be worth trying.
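Extracting only the [error] and [warn] lines from a large error.log can be sketched as follows. It assumes the standard nginx error log layout, where each entry starts with a `YYYY/MM/DD HH:MM:SS` timestamp followed by the level in brackets; the function name `important_lines` is hypothetical.

```python
import re

# Timestamp followed by the level, as in "2023/08/07 00:07:26 [warn] ...".
LEVEL_RE = re.compile(r"^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} \[(error|warn)\]")


def important_lines(lines):
    """Yield only the log entries whose level is error or warn."""
    for line in lines:
        if LEVEL_RE.match(line):
            yield line.rstrip("\n")
```

To use it, open /usr/local/apisix/logs/error.log in text mode (passing `errors="replace"` guards against stray bytes), pass the file object to `important_lines`, and write the results to a smaller file that can be attached to the issue.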

swtseaman commented 1 year ago

When the problem occurs, the log contains entries like these:

2023/08/07 00:07:26 [warn] 38442#38442: *1150308426 [lua] v3.lua:245: _request_uri(): http://127.0.0.1:2379: timeout. Retrying, client: 172.16.29.1, server: , request: "GET /apisix/prometheus/metrics HTTP/1.1", host: "192.168.198.174:9091"

2023/08/07 00:07:26 [warn] 17942#17942: *1171382243 [lua] v3.lua:245: server_version(): http://127.0.0.1:2379: timeout. Retrying, client: 172.16.29.1, server: , request: "GET /apisix/prometheus/metrics HTTP/1.1", host: "192.168.198.175:9091"
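To see when the etcd timeouts cluster and which endpoint they hit, warnings of this shape can be tallied per minute and per endpoint. This is a sketch that assumes every relevant entry contains `<endpoint>: timeout` as in the two lines above; the regular expression and the function name `count_timeouts` are illustrative, not part of APISIX.

```python
import re
from collections import Counter

# Captures the minute of the entry and the etcd endpoint that timed out.
TIMEOUT_RE = re.compile(
    r"^(?P<minute>\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2} \[warn\] .*?"
    r"(?P<endpoint>https?://[^\s:]+:\d+): timeout"
)

# The two log lines quoted above.
SAMPLE = '''\
2023/08/07 00:07:26 [warn] 38442#38442: *1150308426 [lua] v3.lua:245: _request_uri(): http://127.0.0.1:2379: timeout. Retrying, client: 172.16.29.1, server: , request: "GET /apisix/prometheus/metrics HTTP/1.1", host: "192.168.198.174:9091"
2023/08/07 00:07:26 [warn] 17942#17942: *1171382243 [lua] v3.lua:245: server_version(): http://127.0.0.1:2379: timeout. Retrying, client: 172.16.29.1, server: , request: "GET /apisix/prometheus/metrics HTTP/1.1", host: "192.168.198.175:9091"
'''


def count_timeouts(log_text):
    """Count etcd timeout warnings keyed by (minute, endpoint)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = TIMEOUT_RE.match(line)
        if m:
            counts[(m.group("minute"), m.group("endpoint"))] += 1
    return counts
```

For the two sample lines, both warnings fall in the same minute and name the same endpoint, so the counter holds a single key with a count of 2. Applied to the full error.log, the per-minute counts can be lined up against the Grafana spike.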

swtseaman commented 1 year ago

It just happened again, and the errors are all the same: upstream connection timed out.

swtseaman commented 1 year ago

If I want to upgrade, should I upgrade to 3.4.1 or 3.2.2 LTS?

github-actions[bot] commented 3 months ago

This issue has been marked as stale due to 350 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@apisix.apache.org list. Thank you for your contributions.

github-actions[bot] commented 3 months ago

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

apache / apisix

help request: abnormal etcd causes a sudden increase in nginx waiting connections #9982
