apache / apisix-ingress-controller

APISIX Ingress Controller for Kubernetes
https://apisix.apache.org/
Apache License 2.0

bug: Enabling the APISIX ingress controller can lead to a surge in CPU usage on the APISIX gateway. #2028

Open maslow opened 8 months ago

maslow commented 8 months ago

Current Behavior

Enabling the APISIX ingress controller can lead to a surge in CPU usage on the APISIX gateway.

Turning off the APISIX ingress controller will resolve the issue.

[screenshot: APISIX gateway CPU usage graph]

The above-mentioned issue can be consistently and reliably reproduced in my cluster. Each time, CPU usage stays high for three minutes, then returns to a lower level for three minutes, then rises again for another three minutes, and the cycle repeats. The gateway is almost unusable and operations are severely affected, with service latency increasing from 20ms to 2000ms.

Expected Behavior

No response

Error Logs

2023-10-27T11:11:29+08:00 error ingress/ingress.go:148 failed to translate ingress {"error": "endpoints: endpoints \"j3t46i\" not found", "ingress": {}}
2023-10-27T11:11:29+08:00 warn ingress/ingress.go:268 sync ingress failed, will retry {"object": {"Type":1,"Object":{"Key":"j3t46i/j3t46i","GroupVersion":"networking/v1","OldObject":null},"OldObject":null,"Tombstone":null}, "error": "endpoints: endpoints \"j3t46i\" not found"}
2023-10-27T11:11:29+08:00 error ingress/ingress.go:472 failed to get APISIX gateway external IPs {"error": "resource name may not be empty"}
2023-10-27T11:11:29+08:00 error translation/translator.go:158 failed to translate ingress backend to upstream {"error": "endpoints: endpoints \"dz24qv\" not found", "ingress": "&Ingress{ObjectMeta:{dz24qv dz24qv 37023cf1-b564-46cc-8779-77ed62cf901b 295270932 1 2023-10-19 20:31:45 +0800 HKT map[] map[k8s.apisix.apache.org/cors-allow-credential:false k8s.apisix.apache.org/cors-allow-headers: k8s.apisix.apache.org/cors-allow-methods: k8s.apisix.apache.org/cors-allow-origin: k8s.apisix.apache.org/cors-expose-headers: k8s.apisix.apache.org/enable-cors:true k8s.apisix.apache.org/enable-websocket:true k8s.apisix.apache.org/svc-namespace:dz24qv laf.dev/appid:dz24qv laf.dev/ingress.type:runtime nginx.ingress.kubernetes.io/cors-allow-credentials:false nginx.ingress.kubernetes.io/cors-allow-headers:DNT,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range,Authorization,x-laf-develop-token,x-laf-func-data nginx.ingress.kubernetes.io/cors-allow-methods: nginx.ingress.kubernetes.io/cors-allow-origin: nginx.ingress.kubernetes.io/cors-expose-headers: nginx.ingress.kubernetes.io/enable-cors:true nginx.ingress.kubernetes.io/server-snippet:client_header_buffer_size 4096k;\nlarge_client_header_buffers 8 512k;\n] [] [] [{unknown Update networking.k8s.io/v1 2023-10-19 20:31:45 +0800 HKT FieldsV1 {\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:k8s.apisix.apache.org/cors-allow-credential\":{},\"f:k8s.apisix.apache.org/cors-allow-headers\":{},\"f:k8s.apisix.apache.org/cors-allow-methods\":{},\"f:k8s.apisix.apache.org/cors-allow-origin\":{},\"f:k8s.apisix.apache.org/cors-expose-headers\":{},\"f:k8s.apisix.apache.org/enable-cors\":{},\"f:k8s.apisix.apache.org/enable-websocket\":{},\"f:k8s.apisix.apache.org/svc-namespace\":{},\"f:laf.dev/appid\":{},\"f:laf.dev/ingress.type\":{},\"f:nginx.ingress.kubernetes.io/cors-allow-credentials\":{},\"f:nginx.ingress.kubernetes.io/cors-allow-headers\":{},\"f:nginx.ingress.kubernetes.io/cors-allow-methods\":{},\"f:nginx.ingress.kubernetes.io/cors-allow-origin\":{},\"f:nginx.ingress.kubernetes.io/cors-expose-headers\":{},\"f:nginx.ingress.kubernetes.io/enable-cors\":{},\"f:nginx.ingress.kubernetes.io/server-snippet\":{}}},\"f:spec\":{\"f:ingressClassName\":{},\"f:rules\":{}}} }]},Spec:IngressSpec{DefaultBackend:nil,TLS:[]IngressTLS{},Rules:[]IngressRule{IngressRule{Host:dz24qv.laf.dev,IngressRuleValue:IngressRuleValue{HTTP:&HTTPIngressRuleValue{Paths:[]HTTPIngressPath{HTTPIngressPath{Path:/,Backend:IngressBackend{Resource:nil,Service:&IngressServiceBackend{Name:dz24qv,Port:ServiceBackendPort{Name:,Number:8000,},},},PathType:Prefix,},},},},},},IngressClassName:apisix,},Status:IngressStatus{LoadBalancer:{[]},},}"}
2023-10-27T11:11:29+08:00 error ingress/ingress.go:148 failed to translate ingress {"error": "endpoints: endpoints \"dz24qv\" not found", "ingress": {}}
2023-10-27T11:11:29+08:00 warn ingress/ingress.go:268 sync ingress failed, will retry {"object": {"Type":1,"Object":{"Key":"dz24qv/dz24qv","GroupVersion":"networking/v1","OldObject":null},"OldObject":null,"Tombstone":null}, "error": "endpoints: endpoints \"dz24qv\" not found"}
2023-10-27T11:11:29+08:00 error ingress/ingress.go:472 failed to get APISIX gateway external IPs {"error": "resource name may not be empty"}
2023-10-27T11:11:30+08:00 error ingress/ingress.go:472 failed to get APISIX gateway external IPs {"error": "resource name may not be empty"}
2023-10-27T11:11:31+08:00 error ingress/ingress.go:472 failed to get APISIX gateway external IPs {"error": "resource name may not be empty"}
2023-10-27T11:11:31+08:00 error ingress/ingress.go:472 failed to get APISIX gateway external IPs {"error": "resource name may not be empty"}
2023-10-27T11:11:31+08:00 error translation/translator.go:158 failed to translate ingress backend to upstream {"error": "service.spec.ports: port not defined", "ingress": "&Ingress{ObjectMeta:{6529093daa9e4f74966f5192 h7j5vb 5b043fd8-19fc-43b8-aa5c-be0d2b54933d 288674296 1 2023-10-13 17:09:18 +0800 HKT map[] map[k8s.apisix.apache.org/cors-allow-credential:false k8s.apisix.apache.org/cors-allow-headers: k8s.apisix.apache.org/cors-allow-methods: k8s.apisix.apache.org/cors-allow-origin: k8s.apisix.apache.org/cors-expose-headers: k8s.apisix.apache.org/enable-cors:true k8s.apisix.apache.org/svc-namespace:h7j5vb laf.dev/appid:h7j5vb laf.dev/bucket.name:h7j5vb-test laf.dev/ingress.type:bucket nginx.ingress.kubernetes.io/cors-allow-credentials:false nginx.ingress.kubernetes.io/cors-allow-headers:DNT,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range,Authorization,x-laf-develop-token,x-laf-func-data,x-amz-content-sha256,x-amz-security-token,x-amz-user-agent,x-amz-date nginx.ingress.kubernetes.io/cors-allow-methods: nginx.ingress.kubernetes.io/cors-allow-origin: nginx.ingress.kubernetes.io/cors-expose-headers: nginx.ingress.kubernetes.io/enable-cors:true] [] [] [{unknown Update networking.k8s.io/v1 2023-10-13 17:09:18 +0800 HKT FieldsV1 {\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:k8s.apisix.apache.org/cors-allow-credential\":{},\"f:k8s.apisix.apache.org/cors-allow-headers\":{},\"f:k8s.apisix.apache.org/cors-allow-methods\":{},\"f:k8s.apisix.apache.org/cors-allow-origin\":{},\"f:k8s.apisix.apache.org/cors-expose-headers\":{},\"f:k8s.apisix.apache.org/enable-cors\":{},\"f:k8s.apisix.apache.org/svc-namespace\":{},\"f:laf.dev/appid\":{},\"f:laf.dev/bucket.name\":{},\"f:laf.dev/ingress.type\":{},\"f:nginx.ingress.kubernetes.io/cors-allow-credentials\":{},\"f:nginx.ingress.kubernetes.io/cors-allow-headers\":{},\"f:nginx.ingress.kubernetes.io/cors-allow-methods\":{},\"f:nginx.ingress.kubernetes.io/cors-allow-origin\":{},\"f:nginx.ingress.kubernetes.io/cors-expose-headers\":{},\"f:nginx.ingress.kubernetes.io/enable-cors\":{}}},\"f:spec\":{\"f:ingressClassName\":{},\"f:rules\":{}}} }]},Spec:IngressSpec{DefaultBackend:nil,TLS:[]IngressTLS{},Rules:[]IngressRule{IngressRule{Host:oss.laf.dev,IngressRuleValue:IngressRuleValue{HTTP:&HTTPIngressRuleValue{Paths:[]HTTPIngressPath{HTTPIngressPath{Path:/h7j5vb-test,Backend:IngressBackend{Resource:nil,Service:&IngressServiceBackend{Name:h7j5vb,Port:ServiceBackendPort{Name:,Number:9000,},},},PathType:Prefix,},},},},},IngressRule{Host:h7j5vb-test.oss.laf.dev,IngressRuleValue:IngressRuleValue{HTTP:&HTTPIngressRuleValue{Paths:[]HTTPIngressPath{HTTPIngressPath{Path:/,Backend:IngressBackend{Resource:nil,Service:&IngressServiceBackend{Name:h7j5vb,Port:ServiceBackendPort{Name:,Number:9000,},},},PathType:Prefix,},},},},},},IngressClassName:*apisix,},Status:IngressStatus{LoadBalancer:{[]},},}"}
2023-10-27T11:11:31+08:00 error ingress/ingress.go:148 failed to translate ingress {"error": "service.spec.ports: port not defined", "ingress": {}}
2023-10-27T11:11:31+08:00 warn ingress/ingress.go:268 sync ingress failed, will retry {"object": {"Type":1,"Object":{"Key":"h7j5vb/6529093daa9e4f74966f5192","GroupVersion":"networking/v1","OldObject":null},"OldObject":null,"Tombstone":null}, "error": "service.spec.ports: port not defined"}
2023-10-27T11:11:31+08:00 error ingress/ingress.go:472 failed to get APISIX gateway external IPs {"error": "resource name may not be empty"}
2023-10-27T11:11:32+08:00 error ingress/ingress.go:472 failed to get APISIX gateway external IPs {"error": "resource name may not be empty"}
2023-10-27T11:11:32+08:00 error ingress/ingress.go:472 failed to get APISIX gateway external IPs {"error": "resource name may not be empty"}
2023-10-27T11:11:33+08:00 error ingress/ingress.go:472 failed to get APISIX gateway external IPs {"error": "resource name may not be empty"}
2023-10-27T11:11:34+08:00 error ingress/ingress.go:472 failed to get APISIX gateway external IPs {"error": "resource name may not be empty"}
2023-10-27T11:11:34+08:00 error ingress/ingress.go:472 failed to get APISIX gateway external IPs {"error": "resource name may not be empty"}

Steps to Reproduce

So far I can only reproduce it in my own cluster.

Environment

The apisix-ingress-controller command is not found in this Docker image, so I cannot print the version from inside the container.

[screenshot: environment/version details]
Sn0rt commented 8 months ago

Thank you for your report.

lingsamuel commented 8 months ago

what's your apisix-ingress-controller config?

Sn0rt commented 8 months ago
  1. ingress pod
apiVersion: apps/v1
kind: Deployment
metadata:
  name: apisix-ingress-controller
  namespace: ingress-apisix
  uid: c317229e-1f3a-4d85-b3ae-4be21d756ace
  resourceVersion: '216541530'
  generation: 1
  creationTimestamp: '2023-03-31T02:37:52Z'
  labels:
    app.kubernetes.io/instance: apisix
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-controller
    app.kubernetes.io/version: 1.6.0
    helm.sh/chart: ingress-controller-0.11.4
  annotations:
    deployment.kubernetes.io/revision: '1'
    meta.helm.sh/release-name: apisix
    meta.helm.sh/release-namespace: ingress-apisix
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/instance: apisix
      app.kubernetes.io/name: ingress-controller
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: apisix
        app.kubernetes.io/name: ingress-controller
      annotations:
        checksum/config: dc595ec92c5fdc9f40170836cf8831cff9c2aeb820a6d590c02912d518747607
    spec:
      volumes:
        - name: configuration
          configMap:
            name: apisix-configmap
            items:
              - key: config.yaml
                path: config.yaml
            defaultMode: 420
      initContainers:
        - name: wait-apisix-admin
          image: busybox:1.28
          command:
            - sh
            - '-c'
            - >-
              until nc -z apisix-admin.ingress-apisix.svc.cluster.local 9180 ;
              do echo waiting for apisix-admin; sleep 2; done;
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
          securityContext: {}
      containers:
        - name: ingress-controller
          image: apache/apisix-ingress-controller:1.6.0
          command:
            - /ingress-apisix/apisix-ingress-controller
            - ingress
            - '--config-path'
            - /ingress-apisix/conf/config.yaml
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
          resources: {}
          volumeMounts:
            - name: configuration
              mountPath: /ingress-apisix/conf
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
              scheme: HTTP
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
              scheme: HTTP
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      serviceAccountName: apisix-ingress-controller
      serviceAccount: apisix-ingress-controller
      securityContext: {}
      schedulerName: default-scheduler
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
  2. configMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: apisix-configmap
  namespace: ingress-apisix
  uid: eae5df62-47b2-4a86-b1df-188834f1e397
  resourceVersion: '383912'
  creationTimestamp: '2023-03-31T02:37:52Z'
  labels:
    app.kubernetes.io/instance: apisix
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-controller
    app.kubernetes.io/version: 1.6.0
    helm.sh/chart: ingress-controller-0.11.4
  annotations:
    meta.helm.sh/release-name: apisix
    meta.helm.sh/release-namespace: ingress-apisix
data:
  config.yaml: |-
    # log options
    log_level: "info"
    log_output: "stderr"
    cert_file: "/etc/webhook/certs/cert.pem"
    key_file: "/etc/webhook/certs/key.pem"
    http_listen: ":8080"
    https_listen: ":8443"
    ingress_publish_service: ""
    enable_profiling: true
    apisix-resource-sync-interval: 1h
    kubernetes:
      kubeconfig: ""
      resync_interval: "6h"
      namespace_selector:
      - ""
      election_id: "ingress-apisix-leader"
      ingress_class: "apisix"
      ingress_version: "networking/v1"
      watch_endpointslices: false
      apisix_route_version: "apisix.apache.org/v2"
      enable_gateway_api: false
      apisix_version: "apisix.apache.org/v2"
      plugin_metadata_cm: ""
    apisix:
      admin_api_version: "v3"
      default_cluster_base_url: http://apisix-admin.ingress-apisix.svc.cluster.local:9180/apisix/admin
      default_cluster_admin_key: "xxxx---xxx-xx-x-x-x"
      default_cluster_name: "default"
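
Possibly unrelated to the CPU spikes, but the empty ingress_publish_service in this config lines up with the repeated "failed to get APISIX gateway external IPs ... resource name may not be empty" errors in the logs above. A minimal sketch of setting it, assuming the gateway Service created by the Helm release is named apisix-gateway (adjust to the real Service in your cluster):

```yaml
# assumption: the APISIX gateway Service is "apisix-gateway" in the
# "ingress-apisix" namespace; format is "<namespace>/<service-name>"
ingress_publish_service: "ingress-apisix/apisix-gateway"
```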
lingsamuel commented 8 months ago

apisix-resource-sync-interval: 1h is too short for hundreds, even thousands of resources. I see hundreds of creation events in the 3-min log file. Since the events generated by full synchronization in 1.6 are also creation events, I can't tell if they are sync events or true resource creation events. If they are sync events, the resource sync interval should be increased. Otherwise, CP/DP isolated deployment mode is required.
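
As a concrete reference, raising the full-sync interval is a one-line change in the ConfigMap shown above (the key is written exactly as it appears there; 12h is only an illustrative value, tune it to how often a full reconciliation pass is really needed):

```yaml
# controller config.yaml (see the ConfigMap above); a longer interval means
# fewer periodic full re-syncs against the APISIX Admin API
apisix-resource-sync-interval: 12h   # previously 1h
```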

Also, some auto-renewing resources are not being excluded correctly. For example, the log shows openebs.io-local raising an update event every 2 seconds.

Sn0rt commented 8 months ago

@lingsamuel Do we have a technical solution to reduce the requests to the CP side when endpoint resources are updated?

For example, cache and other solutions?

Sn0rt commented 8 months ago

@maslow

I tried to reproduce this problem on an Alibaba Cloud ACK cluster by creating 3500 ApisixRoute and ApisixTls resources, and found no periodic high load.
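
For anyone else trying to reproduce at a similar scale, a single route from such a load test could look like the sketch below; the apiVersion matches the apisix_route_version in the config above, while the name, namespace, host, and backend Service are hypothetical placeholders:

```yaml
apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  name: load-test-route-0001        # hypothetical; stamp out ~3500 of these with different indexes
  namespace: default
spec:
  http:
    - name: rule
      match:
        hosts:
          - app-0001.example.com    # hypothetical host
        paths:
          - /*
      backends:
        - serviceName: demo-backend # hypothetical Service in the same namespace
          servicePort: 80
```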

shreemaan-abhishek commented 7 months ago

ping @maslow

github-actions[bot] commented 4 months ago

This issue has been marked as stale due to 90 days of inactivity. It will be closed in 30 days if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@apisix.apache.org list. Thank you for your contributions.

GhangZh commented 2 months ago

We also found the same problem: when apisix-ingress-controller compares the APISIX CRD declarative configuration with the data in etcd, the requests it sends to APISIX cause APISIX CPU usage to go up and then come back down after a while. [screenshot: APISIX CPU usage graph]

2024-04-26T19:34:15+08:00   warn    ingress/compare.go:186  pluginConfig: 799f92e3 in APISIX but do not in declare yaml
2024-04-26T19:34:15+08:00   warn    ingress/compare.go:186  pluginConfig: 1668e488 in APISIX but do not in declare yaml
2024-04-26T19:34:15+08:00   warn    ingress/compare.go:186  pluginConfig: 2059c8b3 in APISIX but do not in declare yaml
2024-04-26T19:34:15+08:00   warn    ingress/compare.go:186  pluginConfig: 97d60823 in APISIX but do not in declare yaml
2024-04-26T19:34:15+08:00   warn    ingress/compare.go:186  pluginConfig: 6d49039a in APISIX but do not in declare yaml
2024-04-26T19:34:15+08:00   warn    ingress/compare.go:186  pluginConfig: 47eee264 in APISIX but do not in declare yaml
2024-04-26T19:34:15+08:00   warn    ingress/compare.go:186  pluginConfig: 16cc31 in APISIX but do not in declare yaml
GhangZh commented 2 months ago

I'm experiencing the same scenario, please help look into it. Now, as soon as I restart apisix-ingress-controller, APISIX CPU usage goes up and then comes back down again; there are about 5k+ ApisixRoutes and 5k+ upstreams. @shreemaan-abhishek @Sn0rt

apisix version: 2.13.0
apisix-ingress-controller version: 1.4.1