Kong / kong

🦍 The Cloud-Native API Gateway and AI Gateway.
https://konghq.com/install/
Apache License 2.0

KIC fails to start. All pods down: nginx [emerg] 1#0: bind() to unix:/kong_prefix/sockets/we failed (98: Address already in use) #13730

Open joran-fonjallaz opened 1 month ago

joran-fonjallaz commented 1 month ago

Is there an existing issue for this?

Kong version ($ kong version)

3.8.0

Current Behavior

Hello, we run Kong KIC on GKE clusters. Every night the preemptible nodes in our staging environments are reclaimed, and most of the time this takes down all Kong gateway pods (2 replicas) for hours.

versions

Additional info

It seems that the liveness probe responds OK while the readiness probe remains unhealthy, leaving the gateway pods running but unable to process traffic.

Error logs

ERROR 2024/10/05 00:06:53 [emerg] 1#0: bind() to unix:/kong_prefix/sockets/we failed (98: Address already in use)
ERROR nginx: [emerg] bind() to unix:/kong_prefix/sockets/we failed (98: Address already in use)
[repeats over and over, yet the pod is not killed]

The controller fails to talk to the gateways with

ERROR 2024-10-07T03:37:26.870415241Z [resource.labels.containerName: ingress-controller] Error: could not retrieve Kong admin root(s): making HTTP request: Get "https://10.163.37.7:8444/": dial tcp 10.163.37.7:8444: connect: connection refused

Kong finds itself in some sort of "deadlock" until the pods are deleted manually. Any insights?

Below is the values.yaml file configuring Kong:

ingress:
  deployment:
    test:
      enabled: false

  controller:
    enabled: true

    proxy:
      nameOverride: "{{ .Release.Name }}-gateway-proxy"

    postgresql:
      enabled: false

    env:
      database: "off"

    deployment:
      kong:
        enabled: false

    ingressController:
      enabled: true

      image:
        repository: kong/kubernetes-ingress-controller
        tag: "3.3.1"
        pullPolicy: IfNotPresent

      resources:
        requests:
          cpu: 100m
          memory: 256Mi
        limits:
          memory: 1G

      ingressClass: kong-green

      env:
        log_format: json
        log_level: error
        ingress_class: kong-green
        gateway_api_controller_name: konghq.com/kong-green

      gatewayDiscovery:
        enabled: true
        generateAdminApiService: true

    podAnnotations:
      sidecar.istio.io/inject: "false"

  gateway:
    enabled: true

    deployment:
      kong:
        enabled: true

    image:
      repository: kong
      tag: "3.8.0"
      pullPolicy: IfNotPresent

    resources:
      requests:
        cpu: 250m
        memory: 500Mi
      limits:
        memory: 2G

    replicaCount: 6

    podAnnotations:
      sidecar.istio.io/inject: "false"

    proxy:
      enabled: true
      type: ClusterIP
      annotations:
        konghq.com/protocol: "https"
        cloud.google.com/neg: '{"exposed_ports": {"80":{"name": "neg-kong-green"}}}'
      http:
        enabled: true
        servicePort: 80
        containerPort: 8000
        parameters: []
      tls:
        enabled: true
        servicePort: 443
        containerPort: 8443
        parameters:
          - http2
        appProtocol: ""

    ingressController:
      enabled: false

    postgresql:
      enabled: false

    env:
      role: traditional
      database: "off"
      proxy_access_log: "off"
#      proxy_error_log: "off"
      proxy_stream_access_log: "off"
#      proxy_stream_error_log: "off"
      admin_access_log: "off"
#      admin_error_log: "off"
      status_access_log: "off"
#      status_error_log: "off"
      log_level: warn
      headers: "off"
      request_debug: "off"

Expected Behavior

kong gateway pods, either

Steps To Reproduce

I could reproduce the error by killing the nodes (kubectl delete nodes) on which the Kong pods were running. After killing the nodes, KIC fails to restart as it enters the deadlock described above. See screenshot:

(screenshot)

Anything else?

dump of a failing gateway pod: kubectl describe:

k -n kong-dbless describe po kong-green-gateway-68f467ff98-qztm5
Name:             kong-green-gateway-68f467ff98-qztm5
Namespace:        kong-dbless
Priority:         0
Service Account:  kong-green-gateway
Node:             ---
Start Time:       Mon, 07 Oct 2024 13:49:02 +0200
Labels:           app=kong-green-gateway
                  app.kubernetes.io/component=app
                  app.kubernetes.io/instance=kong-green
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=gateway
                  app.kubernetes.io/version=3.6
                  helm.sh/chart=gateway-2.41.1
                  pod-template-hash=68f467ff98
                  version=3.6
Annotations:      cni.projectcalico.org/containerID: 13864002653403e75b1ddb3ef661b5665f69e3b97c266b5833042f8dc4a4f39b
                  cni.projectcalico.org/podIP: 10.163.33.135/32
                  cni.projectcalico.org/podIPs: 10.163.33.135/32
                  kuma.io/gateway: enabled
                  kuma.io/service-account-token-volume: kong-green-gateway-token
                  sidecar.istio.io/inject: false
                  traffic.sidecar.istio.io/includeInboundPorts:
Status:           Running
IP:               10.163.33.135
IPs:
  IP:           10.163.33.135
Controlled By:  ReplicaSet/kong-green-gateway-68f467ff98
Init Containers:
  clear-stale-pid:
    Container ID:    containerd://ed0b35719cd87e11e849b42f20f1f328b1e2d63612d004b313ba981eda0bd790
    Image:           kong:3.8.0
    Image ID:        docker.io/library/kong@sha256:616b2ab5a4c7b6c14022e8a1495ff34930ced76f25f3d418e76758717fec335f
    Port:            <none>
    Host Port:       <none>
    SeccompProfile:  RuntimeDefault
    Command:
      rm
      -vrf
      $KONG_PREFIX/pids
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 07 Oct 2024 13:49:20 +0200
      Finished:     Mon, 07 Oct 2024 13:49:21 +0200
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  2G
    Requests:
      cpu:     250m
      memory:  500Mi
    Environment:
      KONG_ADMIN_ACCESS_LOG:         /dev/stdout
      KONG_ADMIN_ERROR_LOG:          /dev/stderr
      KONG_ADMIN_GUI_ACCESS_LOG:     /dev/stdout
      KONG_ADMIN_GUI_ERROR_LOG:      /dev/stderr
      KONG_ADMIN_LISTEN:             0.0.0.0:8444 http2 ssl, [::]:8444 http2 ssl
      KONG_CLUSTER_LISTEN:           off
      KONG_DATABASE:                 off
      KONG_LUA_PACKAGE_PATH:         /opt/?.lua;/opt/?/init.lua;;
      KONG_NGINX_WORKER_PROCESSES:   2
      KONG_PLUGINS:                  ---
      KONG_PORTAL_API_ACCESS_LOG:    /dev/stdout
      KONG_PORTAL_API_ERROR_LOG:     /dev/stderr
      KONG_PORT_MAPS:                80:8000, 443:8443
      KONG_PREFIX:                   /kong_prefix/
      KONG_PROXY_ACCESS_LOG:         /dev/stdout
      KONG_PROXY_ERROR_LOG:          /dev/stderr
      KONG_PROXY_LISTEN:             0.0.0.0:8000, [::]:8000, 0.0.0.0:8443 http2 ssl, [::]:8443 http2 ssl
      KONG_PROXY_STREAM_ACCESS_LOG:  /dev/stdout basic
      KONG_PROXY_STREAM_ERROR_LOG:   /dev/stderr
      KONG_ROLE:                     traditional
      KONG_ROUTER_FLAVOR:            traditional
      KONG_STATUS_ACCESS_LOG:        off
      KONG_STATUS_ERROR_LOG:         /dev/stderr
      KONG_STATUS_LISTEN:            0.0.0.0:8100, [::]:8100
      KONG_STREAM_LISTEN:            off
    Mounts:
      /kong_prefix/ from kong-green-gateway-prefix-dir (rw)
      /opt/kong/plugins/---
      /opt/kong/plugins/---
      /opt/kong/plugins/---
      /opt/kong/plugins/---
      /opt/kong/plugins/---
      /tmp from kong-green-gateway-tmp (rw)
Containers:
  proxy:
    Container ID:    containerd://0ed944478d25423c08c85146ed1528ae668d128f13bddaf6402990701e2ea3a1
    Image:           kong:3.8.0
    Image ID:        docker.io/library/kong@sha256:616b2ab5a4c7b6c14022e8a1495ff34930ced76f25f3d418e76758717fec335f
    Ports:           8444/TCP, 8000/TCP, 8443/TCP, 8100/TCP
    Host Ports:      0/TCP, 0/TCP, 0/TCP, 0/TCP
    SeccompProfile:  RuntimeDefault
    State:           Waiting
      Reason:        CrashLoopBackOff
    Last State:      Terminated
      Reason:        Error
      Exit Code:     1
      Started:       Mon, 07 Oct 2024 13:59:39 +0200
      Finished:      Mon, 07 Oct 2024 13:59:49 +0200
    Ready:           False
    Restart Count:   7
    Limits:
      memory:  2G
    Requests:
      cpu:      250m
      memory:   500Mi
    Liveness:   http-get http://:status/status delay=5s timeout=5s period=10s #success=1 #failure=3
    Readiness:  http-get http://:status/status/ready delay=5s timeout=5s period=10s #success=1 #failure=3
    Environment:
      KONG_ADMIN_ACCESS_LOG:         /dev/stdout
      KONG_ADMIN_ERROR_LOG:          /dev/stderr
      KONG_ADMIN_GUI_ACCESS_LOG:     /dev/stdout
      KONG_ADMIN_GUI_ERROR_LOG:      /dev/stderr
      KONG_ADMIN_LISTEN:             0.0.0.0:8444 http2 ssl, [::]:8444 http2 ssl
      KONG_CLUSTER_LISTEN:           off
      KONG_DATABASE:                 off
      KONG_LUA_PACKAGE_PATH:         /opt/?.lua;/opt/?/init.lua;;
      KONG_NGINX_WORKER_PROCESSES:   2
      KONG_PLUGINS:                 ---
      KONG_PORTAL_API_ACCESS_LOG:    /dev/stdout
      KONG_PORTAL_API_ERROR_LOG:     /dev/stderr
      KONG_PORT_MAPS:                80:8000, 443:8443
      KONG_PREFIX:                   /kong_prefix/
      KONG_PROXY_ACCESS_LOG:         /dev/stdout
      KONG_PROXY_ERROR_LOG:          /dev/stderr
      KONG_PROXY_LISTEN:             0.0.0.0:8000, [::]:8000, 0.0.0.0:8443 http2 ssl, [::]:8443 http2 ssl
      KONG_PROXY_STREAM_ACCESS_LOG:  /dev/stdout basic
      KONG_PROXY_STREAM_ERROR_LOG:   /dev/stderr
      KONG_ROLE:                     traditional
      KONG_ROUTER_FLAVOR:            traditional
      KONG_STATUS_ACCESS_LOG:        off
      KONG_STATUS_ERROR_LOG:         /dev/stderr
      KONG_STATUS_LISTEN:            0.0.0.0:8100, [::]:8100
      KONG_STREAM_LISTEN:            off
      KONG_NGINX_DAEMON:             off
    Mounts:
      /kong_prefix/ from kong-green-gateway-prefix-dir (rw)
      /opt/kong/plugins/---
      /opt/kong/plugins/---
      /opt/kong/plugins/---
      /opt/kong/plugins/---
      /opt/kong/plugins/---
      /tmp from kong-green-gateway-tmp (rw)
Readiness Gates:
  Type                                       Status
  cloud.google.com/load-balancer-neg-ready   True
Conditions:
  Type                                       Status
  cloud.google.com/load-balancer-neg-ready   True
  PodReadyToStartContainers                  True
  Initialized                                True
  Ready                                      False
  ContainersReady                            False
  PodScheduled                               True
Volumes:
  kong-green-gateway-prefix-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  256Mi
  kong-green-gateway-tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  1Gi
  kong-green-gateway-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                             Age                    From                                   Message
  ----     ------                             ----                   ----                                   -------
  Normal   Killing                            11m                    kubelet                                Container proxy failed liveness probe, will be restarted
  Warning  FailedPreStopHook                  10m                    kubelet                                PreStopHook failed
  Normal   Pulled                             10m (x2 over 11m)      kubelet                                Container image "kong:3.8.0" already present on machine
  Normal   Created                            10m (x2 over 11m)      kubelet                                Created container proxy
  Normal   Started                            10m (x2 over 11m)      kubelet                                Started container proxy
  Warning  Unhealthy                          10m (x4 over 11m)      kubelet                                Liveness probe failed: Get "http://10.163.33.135:8100/status": dial tcp 10.163.33.135:8100: connect: connection refused
  Warning  Unhealthy                          10m (x9 over 11m)      kubelet                                Readiness probe failed: Get "http://10.163.33.135:8100/status/ready": dial tcp 10.163.33.135:8100: connect: connection refused
  Warning  BackOff                            114s (x26 over 7m19s)  kubelet                                Back-off restarting failed container proxy in pod kong-green-gateway-68f467ff98-qztm5_kong-dbless(ab152a94-7ef0-4de0-b84c-1eb419327b88)

and logs

2024/10/07 12:05:10 [warn] 1#0: the "user" directive makes sense only if the master process runs with super-user privileges, ignored in /kong_prefix/nginx.conf:7
nginx: [warn] the "user" directive makes sense only if the master process runs with super-user privileges, ignored in /kong_prefix/nginx.conf:7
2024/10/07 12:05:14 [notice] 1#0: [lua] init.lua:791: init(): [request-debug] token for request debugging: ccbb05a0-6e76-4cb7-9e5d-346690a3c69f
2024/10/07 12:05:14 [emerg] 1#0: bind() to unix:/kong_prefix/sockets/we failed (98: Address already in use)
nginx: [emerg] bind() to unix:/kong_prefix/sockets/we failed (98: Address already in use)
2024/10/07 12:05:14 [notice] 1#0: try again to bind() after 500ms
2024/10/07 12:05:14 [emerg] 1#0: bind() to unix:/kong_prefix/sockets/we failed (98: Address already in use)
nginx: [emerg] bind() to unix:/kong_prefix/sockets/we failed (98: Address already in use)
2024/10/07 12:05:14 [notice] 1#0: try again to bind() after 500ms
2024/10/07 12:05:14 [emerg] 1#0: bind() to unix:/kong_prefix/sockets/we failed (98: Address already in use)
nginx: [emerg] bind() to unix:/kong_prefix/sockets/we failed (98: Address already in use)
2024/10/07 12:05:14 [notice] 1#0: try again to bind() after 500ms
2024/10/07 12:05:14 [emerg] 1#0: bind() to unix:/kong_prefix/sockets/we failed (98: Address already in use)
nginx: [emerg] bind() to unix:/kong_prefix/sockets/we failed (98: Address already in use)
2024/10/07 12:05:14 [notice] 1#0: try again to bind() after 500ms
2024/10/07 12:05:14 [emerg] 1#0: bind() to unix:/kong_prefix/sockets/we failed (98: Address already in use)
nginx: [emerg] bind() to unix:/kong_prefix/sockets/we failed (98: Address already in use)
2024/10/07 12:05:14 [notice] 1#0: try again to bind() after 500ms
2024/10/07 12:05:14 [emerg] 1#0: still could not bind()
nginx: [emerg] still could not bind()
joran-fonjallaz commented 1 month ago

FYI: the "deadlock" is removed by restarting the pods manually

kubectl -n kong-dbless delete pods --selector=app.kubernetes.io/instance=kong-green

then kong KIC pods (controller and gateways) restart normally.

StarlightIbuki commented 1 month ago

It seems your issue has been resolved. Feel free to reopen if you have any further concerns.

joran-fonjallaz commented 1 month ago

Thanks @StarlightIbuki for taking this issue. However, your answer doesn't help much. Could you please point me to the resolution? How has this issue been solved? And what's the fix?

Thank you in advance

StarlightIbuki commented 1 month ago

> Thanks @StarlightIbuki for taking this issue. However, your answer doesn't help much. Could you please point me to the resolution? How has this issue been solved? And what's the fix?
>
> Thank you in advance

Sorry, I thought you had found the solution. @randmonkey could you also take a look into this?

joran-fonjallaz commented 1 month ago

hi @randmonkey,

we are getting the issue above multiple times per day and it's getting very frustrating. Do you have any insights to share?

On my side, I've also been searching for solutions, and a closer look at the clear-stale-pid initContainer might reveal a bug: it seems that there are 2 // in the rm command (see screenshot below).

(screenshot of the clear-stale-pid rm command)

Behaviour-wise, the liveness probe fails, which only restarts the container, and restarting the container doesn't help. Kong is able to start only when the pod is deleted manually, which points me towards stale PID cleanup.

joran-fonjallaz commented 1 month ago

the issue seems to be the same as https://github.com/Kong/kubernetes-ingress-controller/issues/5324

I have the following hypothesis about what is happening:

  1. a gateway pod starts (fresh new pod)
  2. the controller fails to push the config (due to discovery and potentially DNS issues in GKE)
  3. the gateway pod is therefore not healthy
  4. the liveness probe fails
  5. GKE restarts the container
  6. the PID is not cleaned (or whatever else)
  7. the container fails with "address already in use"
  8. the container never recovers, unless killed manually with a kubectl delete

randmonkey commented 1 month ago

/kong_prefix/sockets/we is the path of the socket for worker events. The old socket may not get cleared because the clear-stale-pid init container does not touch the path /kong_prefix/sockets/.

> 1. a gateway pod starts (fresh new pod)
> 2. the controller fails to push the config (due to discovery and potentially DNS issues in GKE)
> 3. the gateway pod is therefore not healthy
> 4. the liveness probe fails
> 5. GKE restarts the container
> 6. the PID is not cleaned (or whatever else)
> 7. the container fails with "address already in use"
> 8. the container never recovers, unless killed manually with a kubectl delete

Steps 5-8 would be the likely cause of the issue. As for steps 1-4, KIC failing to push the config will not make the liveness probe fail and restart the gateway pod.

flrgh commented 1 month ago

👋 I think I have some insight on this. In 3.8, we relocated Kong's internal sockets into a subdirectory in the prefix tree (#13409).

There is some code that runs as part of kong start that cleans up dangling sockets that might be left over from an unclean shutdown of Kong.

This logic is unfortunately duplicated in our docker entrypoint script, because the entrypoint circumvents kong start and invokes nginx directly.

The docker entrypoint code was not updated to point to the new socket directory that Kong is using as of 3.8 (an oversight). I've opened a PR to remedy this, which I think should resolve the issue.


For those using the clear-stale-pid init container pattern that I see in some of the comments, it can be updated to remove $KONG_PREFIX/sockets (in addition to $KONG_PREFIX/pids) to mitigate the issue in the meantime.* The docker entrypoint runs kong prepare, so it will recreate the sockets directory as needed.

*In fact, enabling this kind of ops pattern in 3.8 was part of the underlying intent of #13409: establishing more segregation between persistent and transient data so that lifecycle management doesn't require non-trivial amounts of scripting (like what is found in the aforementioned docker entrypoint).
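
Concretely, such an override could look something like this in the chart values (a sketch only; the container name is illustrative and the volume name must match the prefix emptyDir generated by the chart, as seen in the pod spec above):

ingress:
  gateway:
    deployment:
      initContainers:
        - name: clear-stale-sockets            # illustrative name
          image: kong:3.8.0
          # remove both the PID files and the 3.8+ socket directory; a literal
          # path is used because Kubernetes only expands $(VAR)-style
          # references in command/args
          command: ["rm", "-vrf", "/kong_prefix/pids", "/kong_prefix/sockets"]
          volumeMounts:
            - mountPath: /kong_prefix/
              name: kong-green-gateway-prefix-dir   # must match the chart's prefix volume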

joran-fonjallaz commented 3 weeks ago

Hello @flrgh, thanks for looking into it. In my KIC Helm chart, I added the following:

ingress:
  controller:

    # controller config

  gateway:
    enabled: true

    deployment:
      kong:
        enabled: true
      initContainers:
        - command:
            - rm
            - '-vrf'
            - ${KONG_PREFIX}/sockets
          env:
            - name: KONG_PREFIX
              value: /kong_prefix/
          image: kong:3.8.0
          imagePullPolicy: IfNotPresent
          name: clear-stale-pid-custom
          volumeMounts:
            - mountPath: /kong_prefix/
              name: kong-green-gateway-prefix-dir

When our preemptible node got "restarted" just a few minutes ago, Kong was not able to restart properly and crashed again with the errors:

nginx: [emerg] bind() to unix:/kong_prefix/sockets/we failed (98: Address already in use)
nginx: [emerg] still could not bind()
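
One way to narrow this down might be a diagnostic init container that lists the prefix directory when a fresh pod starts (a sketch, reusing the deployment.initContainers hook and the volume name from above; note that init containers only run when the pod itself starts, not when the proxy container is restarted in place):

ingress:
  gateway:
    deployment:
      initContainers:
        # diagnostic only: its logs show what is left in the prefix emptyDir
        # when a new pod comes up; ordering relative to the chart's built-in
        # clear-stale-pid container may vary
        - name: list-prefix-contents           # illustrative name
          image: kong:3.8.0
          command: ["ls", "-laR", "/kong_prefix/"]
          volumeMounts:
            - mountPath: /kong_prefix/
              name: kong-green-gateway-prefix-dir

The listing then shows up in kubectl logs for that init container.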
flrgh commented 3 weeks ago

@joran-fonjallaz that's odd.

My k8s knowledge is minimal, so bear with me a little. If /kong_prefix/sockets still exists and contains sockets when starting the primary Kong container, I feel like these are the probable causes to investigate:

joran-fonjallaz commented 2 weeks ago

Hello @flrgh, not sure about the above, although the list does seem exhaustive. I don't actually have a ton of time to dedicate to this issue at the moment. However, reverting the gateway to 3.7 seems to have solved the issue: we haven't gotten the error above (98: Address already in use) since Friday, whereas it would occur at least once a day with 3.8.

So your feeling that the issue might be linked to 3.8 does seem correct.
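
For reference, the revert was just a matter of pinning the gateway image back to a 3.7.x tag in values.yaml (a sketch; the exact patch version shown here is an assumption):

ingress:
  gateway:
    image:
      repository: kong
      # any 3.7.x tag; per the discussion above, 3.7 predates the move of the
      # sockets into /kong_prefix/sockets/, so the entrypoint's existing
      # cleanup still covers them
      tag: "3.7.1"
      pullPolicy: IfNotPresent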