artemiscloud / activemq-artemis-operator


Why zombie processes are created by the readiness probe on the second instance of the StatefulSet configured as a standby/backup broker #793

Closed pshields2 closed 7 months ago

pshields2 commented 8 months ago

Describe the bug We are running a StatefulSet of two broker pods in a k8s cluster configured as an HA pair. The readinessProbe on the standby/backup StatefulSet instance (ss-1) is creating zombie processes, but the primary broker instance of the StatefulSet does not. The primary broker instance is "live" and its readiness probe produces a 2/2 READY state because it is the live broker, so Istio can route network traffic to it. The standby broker instance is in the "backup announced" state; its readiness probe does not detect any listening TCP acceptors (as expected for a standby) and reports a READY state of 1/2.

Output from the kubectl describe pod command shows an error on the standby pod instance of the StatefulSet (ss-1). Also note the (x2643 over 15h), which reflects the number of zombies created over 15 hours. This will eventually fill the process table on the worker node that is running the standby broker pod.

Events:
  Type     Reason           Age                    From          Message
  ----     ------           ----                   ----          -------
  Warning  PolicyViolation  46m (x14 over 15h)     kyverno-scan  policy disallow-capabilities/adding-capabilities fail: Any capabilities added beyond the allowed list (AUDIT_WRITE, CHOWN, DAC_OVERRIDE, FOWNER, FSETID, KILL, MKNOD, NET_BIND_SERVICE, SETFCAP, SETGID, SETPCAP, SETUID, SYS_CHROOT) are disallowed.
  Warning  Unhealthy        111s (x2643 over 15h)  kubelet       Readiness probe failed:

The primary broker instance does not have this error.
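
For reference, a quick way to watch the defunct count grow on the standby pod (a sketch using plain ps and grep; kubectl exec targets the pod's default container, and the namespace flag is omitted):

# Count processes in the Z (zombie/defunct) state inside the standby broker pod.
kubectl exec cray-dvs-mqtt-ss-1 -- ps -eo stat | grep -c '^Z'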

We edited the deployment and increased the readinessProbe settings to timeoutSeconds=15 and periodSeconds=30 on the recommendation of this RH article: https://access.redhat.com/solutions/6626611. But this only slowed the creation of the zombie processes.
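
The change looked roughly like this on the broker container in the StatefulSet (a sketch; only timeoutSeconds and periodSeconds were changed, and the exec command shown is what the operator rendered, per the kubectl describe output further down in this thread):

readinessProbe:
  exec:
    command:
      - /bin/bash
      - -c
      - /opt/amq/bin/readinessProbe.sh
      - "1"
  timeoutSeconds: 15   # raised per the RH article
  periodSeconds: 30    # raised per the RH article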

This reference https://www.geeksforgeeks.org/zombie-processes-prevention/ explains zombie process creation on Linux.

I found these articles on handling this in Kubernetes by adding a minimal init process to the container to act as a reaper of the zombie processes. It seems that the broker process is started as PID 1 in the container, but it does not act as an init process and clean up the zombie processes.

- This looks promising: https://github.com/kubernetes/kubernetes/issues/81042
- Linked from the above: https://github.com/Mirantis/cri-dockerd/issues/124
- Linked from issue 124: https://github.com/containerd/containerd/issues/4255
- Adding an init container to the pod: https://github.com/kubernetes/kubernetes/issues/84210

A summary from one of the above reference links; this part is interesting since we see the Artemis process running as PID 1 inside the container: "I'm getting zombie processes occasionally as well. From recent reading I understand that it's the job of the init process to reap zombie processes after a failure of the parent process. In our case (in most cases in Kubernetes?), the container's root process (PID 1) is not init - it's whatever we set as the entry command. I assume this means it's probably not looking for and handling reaping of zombie processes (unless Kubernetes has a way to handle zombie processes without using the init process as usual)

  1. If this is the case, then the questions are: Has this consequence been considered in the kubernetes design already?
  2. If there is no init process in a container, where does the responsibility lie for dealing with defunct processes? (Or does the current design simply mean that we should never exec in a container, because there is no way to reap defunct processes?)"

I searched all the issues and did not see a match for our condition. What can we do, other than adding tini to the container, to solve this zombie issue?
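
One pod-level option that comes up in discussions like the ones linked above is letting the pod's pause container act as PID 1 and reap orphaned children. A sketch of that knob (a standard Kubernetes pod-spec field; whether it can be set through the ActiveMQArtemis CR or the operator is something we have not confirmed, and the pod name below is illustrative only):

apiVersion: v1
kind: Pod
metadata:
  name: broker-example        # illustrative name only
spec:
  # With a shared process namespace the pause container is PID 1 for every
  # container in the pod and reaps defunct children, so the broker process
  # does not have to behave like an init process.
  shareProcessNamespace: true
  containers:
    - name: broker
      image: quay.io/artemiscloud/activemq-artemis-broker-kubernetes:1.0.14

The other route discussed in those issues is running a minimal init such as tini as the container entrypoint.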

Also, we are using these docker images:

- quay.io/artemiscloud/activemq-artemis-operator:1.0.10
- quay.io/artemiscloud/activemq-artemis-broker-init:1.0.14
- quay.io/artemiscloud/activemq-artemis-broker-kubernetes:1.0.14

We do create a custom init container to add the HA configuration and jar files for extending the JAAS security model.

[!TIP] Vote this issue reacting with :+1: or :-1:

brusdev commented 8 months ago

@pshields2 can you share your ActiveMQArtemis CR?

pshields2 commented 8 months ago

Here you go:

---
apiVersion: broker.amq.io/v1beta1
kind: ActiveMQArtemis
metadata:
  name: {{ include "cray-dvs-mqtt.fullname" . }}
spec:
  deploymentPlan:
    size: {{ .Values.replicaCount }}
    persistenceEnabled: true
    messageMigration: true
    enableMetricsPlugin: true
    {{- if .Values.init.enabled }}
    initImage: {{ .Values.init.repository }}:{{ .Values.init.tag }}
    {{- end }}
    labels:
      app.kubernetes.io/name: {{ template "cray-dvs-mqtt.name" . }}
    podSecurity:
      serviceAccountName: {{ include "cray-dvs-mqtt.fullname" . }}
    # fsGroup 0 is a work around for a long standing bug when using persistence
    # https://github.com/artemiscloud/activemq-artemis-operator/issues/187
    podSecurityContext:
      fsGroup: 0
      fsGroupChangePolicy: "OnRootMismatch"
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: ActiveMQArtemis
                  operator: In
                  values: 
                  - cray-dvs-mqtt
              topologyKey: kubernetes.io/hostname
  env:
    - name: POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: JWKS_SPIRE_URLS
      value: "{{ .Values.auth.jwks }}"
    - name: JWKS_AUDIENCE
      value: "{{ .Values.auth.audience }}"
    - name: JWKS_ISSUERS
      value: "{{ .Values.auth.issuers }}"
    - name: JWKS_CLOCK_SKEW_SECONDS
      value: "{{ .Values.auth.clock_skew }}"
    - name: JWKS_SPIRE_CONTACT_RATE
      value: "{{ .Values.auth.contact_rate }}"
    - name: JWT_XNAME_POLICY_ENABLED
      value: "{{ .Values.xnamePolicy.enabled }}"
    - name: LOG4J2_PROPERTIES
      value: "{{ .Values.logging.log4j2_properties }}"
    - name: ACTIVEMQ_LOGGING
      value: "{{ .Values.logging.activemq_logging }}"
  acceptors:
  - name: dvs
    protocols: mqtt
    port: 1883
    sslEnabled: false
pshields2 commented 8 months ago

We did some more testing and think we found the bug. The count argument to readinessProbe.sh is not being properly passed to or processed by the script.

[jboss@cray-dvs-mqtt-ss-1 ~]$ ps -elf | grep readiness | grep -v 'grep' | grep -v 'defunct' | cut -f 4
4 S jboss   569839    0 0 80  0 - 4780 do_wai 17:38 ?    00:00:00 /bin/sh /opt/amq/bin/readinessProbe.sh
[jboss@cray-dvs-mqtt-ss-1 ~]$ ps --ppid 569839 -o pid,ppid,cmd
  PID  PPID CMD
 570000 569839 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep

Also note in the output of readinessProbe.sh, which is written to /tmp/readiness-log, that it reports Count: 30. That should be Count: 1 if the command-line argument of 1 were being properly passed to the script.

[jboss@cray-dvs-mqtt-ss-1 ~]$ cat /tmp/readiness-log
Count: 30, sleep: 1
Thu Feb  8 22:47:23 UTC 2024 Connect: 1
========================= OUTPUT ==========================
dvs value tcp://cray-dvs-mqtt-ss-1.cray-dvs-mqtt-hdls-svc.dvs.svc.cluster.local:1883?protocols=MQTT;tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;useEpoll=true;amqpCredits=1000;amqpMinCredits=300
dvs port 1883
    Nothing listening on port 1883, transport not yet running
scaleDown value tcp://cray-dvs-mqtt-ss-1.cray-dvs-mqtt-hdls-svc.dvs.svc.cluster.local:61616?protocols=CORE;tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;useEpoll=true;amqpCredits=1000;amqpMinCredits=300
scaleDown port 61616
    Nothing listening on port 61616, transport not yet running
========================= ERROR ==========================
==========================================================
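
Our working theory for where the count goes (an assumption based on standard bash -c argument handling, not on reading the operator code): with the probe command rendered as /bin/bash -c /opt/amq/bin/readinessProbe.sh 1, the trailing 1 is bound to $0 of the -c command string rather than to $1 of the script, so the script sees no arguments and falls back to its default count of 30. A quick demonstration with a stand-in script (the /tmp/probe.sh path is just for illustration):

# Stand-in for readinessProbe.sh: print the first positional argument.
printf '#!/bin/bash\necho "COUNT arg: ${1:-<none>}"\n' > /tmp/probe.sh && chmod +x /tmp/probe.sh

# Form used by the probe here: the trailing "1" becomes $0, not $1.
/bin/bash -c /tmp/probe.sh 1
# -> COUNT arg: <none>

# Script and argument condensed into a single -c string: $1 reaches the script.
/bin/bash -c '/tmp/probe.sh 1'
# -> COUNT arg: 1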

We modified our deployment again to test this theory, but this time set timeoutSeconds and periodSeconds to values greater than the 1-second sleep multiplied by the count of 30.

   Readiness:      exec [/bin/bash -c /opt/amq/bin/readinessProbe.sh 1] delay=5s timeout=40s period=50s #success=1 #failure=3

There is still the warning about the readiness probe failing, but that is desirable for the standby broker instance, which should not be receiving network connections. Also, we are no longer seeing zombie processes being created.

[jboss@cray-dvs-mqtt-ss-1 ~]$ ps -aef | grep defunct
jboss      41829   41636  0 22:59 pts/1    00:00:00 grep --color=auto defunct
[jboss@cray-dvs-mqtt-ss-1 ~]$ ps -aef | grep defunct
jboss      41864   41636  0 22:59 pts/1    00:00:00 grep --color=auto defunct
[jboss@cray-dvs-mqtt-ss-1 ~]$ 

We have been testing a newer version of the operator and its associated images as a first step toward moving to a newer ActiveMQ Artemis version (currently 2.31.2):

- quay.io/artemiscloud/activemq-artemis-operator:1.0.16
- quay.io/artemiscloud/activemq-artemis-broker-init:1.0.23
- quay.io/artemiscloud/activemq-artemis-broker-kubernetes:1.0.23

I do not see the zombie creation issue on the standby broker pod with these images. A quick diff of the current readinessProbe.sh against the newer one shows a fair amount of change in the probe script, notably the move from /bin/sh to bash, from Python 2 to Python 3, and checking both /proc/net/tcp and /proc/net/tcp6.

 # diff readinessProbe.sh-new  readinessProbe.sh-old
1c1
< #!/usr/bin/env bash
---
> #!/bin/sh
19c19
< EVALUATE_SCRIPT=$(cat <<EOF
---
> EVALUATE_SCRIPT=`cat <<EOF
21c21
< from urllib.parse import urlsplit
---
> from urlparse import urlsplit
26,36c26,32
< tcp_lines = []
< for tcp_file_path in ["/proc/net/tcp", "/proc/net/tcp6"]:
<   try:
<     tcp_file = open(tcp_file_path, "r")
<     lines = tcp_file.readlines()
<     if len(lines) > 0:
<       header = lines.pop(0)
<       tcp_lines.extend(lines)
<     tcp_file.close()
<   except IOError:
<     print("Could not open {}".format(tcp_file_path))
---
> try:
>   tcp_file = open("/proc/net/tcp", "r")
> except IOError:
>   tcp_file = open("/proc/net/tcp6", "r")
> tcp_lines = tcp_file.readlines()
> header = tcp_lines.pop(0)
> tcp_file.close()
58,59c54,55
<   value = acceptor.text.strip()
<   print("{} value {}".format(name, value))
---
>   value = acceptor.text
>   print "{} value {}".format(name, value)
63c59
<   print("{} port {}".format(name, port))
---
>   print "{} port {}".format(name, port)
66,67c62,63
<     print("    {} does not define a port, cannot check acceptor".format(name))
<     result=1
---
>     print "    {} does not define a port, cannot check acceptor".format(name)
>     continue
71,73c67,69
<     print("    Transport is listening on port {}".format(port))
<   except ValueError as e:
<     print("    Nothing listening on port {}, transport not yet running".format(port))
---
>     print "    Transport is listening on port {}".format(port)
>   except ValueError, e:
>     print "    Nothing listening on port {}, transport not yet running".format(port)
76,77c72
< EOF
< )
---
> EOF`
100c95
<         python3 -c "$EVALUATE_SCRIPT" >"${OUTPUT}" 2>"${ERROR}"
---
>         python2 -c "$EVALUATE_SCRIPT" >"${OUTPUT}" 2>"${ERROR}"
122c117
<     COUNT=$((COUNT - 1))
---
>     COUNT=$(expr "$COUNT" - 1)
124c119
<         echo "${PROBE_MESSAGE}"
---
>         echo ${PROBE_MESSAGE}
brusdev commented 8 months ago

@pshields2 I confirm your analysis; this issue was fixed in operator version 1.0.15 by the commit https://github.com/artemiscloud/activemq-artemis-operator/commit/603d8fff4bd26d519e844ce2a346432cb5cd63f3

pshields2 commented 8 months ago

Thanks for the confirmation. This gives us motivation for stepping up to the latest release.

pshields2 commented 7 months ago

I wanted to post a workaround that we tested. We edit the deployment and condense the script and its command-line argument into a single entry in the exec command array. I believe that is the same approach taken in the fix.

    readinessProbe:
      exec:
        command:
          - /bin/bash
          - -c
          - "/opt/amq/bin/readinessProbe.sh 1"
brusdev commented 7 months ago

@pshields2 thanks for sharing your workaround, I confirm that it is the same approach taken in the fix.