StackStorm / stackstorm-k8s

K8s Helm Chart that codifies a StackStorm (aka "IFTTT for Ops", https://stackstorm.com/) High Availability fleet as a simple-to-use, reproducible infrastructure-as-code app
https://helm.stackstorm.com/
Apache License 2.0

StackStorm Redis cache not starting, so the rest of the application cannot start #349

Closed philipphomberger closed 1 year ago

philipphomberger commented 1 year ago

Hi, I am trying to install StackStorm in a Kubernetes cluster with Helm, but stackstorm-ha-1673617629-redis-node-0 keeps crashing with an error that its health probe is not working, and because of that all the StackStorm containers stay stuck in their wait-for-db init step.

Thank you for your help :)
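
For reference, the deployment was presumably done along these lines (a sketch only; the chart repo is the one linked at the top, and --generate-name matches the timestamped release name visible in the pod names below):

helm repo add stackstorm https://helm.stackstorm.com/
helm repo update
helm install --generate-name stackstorm/stackstorm-ha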

NAME                                                           READY   STATUS             RESTARTS        AGE
stackstorm-ha-1673615479-job-st2-apikey-load-xc4r4             0/1     Init:0/3           0               8m45s
stackstorm-ha-1673617629-job-st2-apikey-load-bs4mp             0/1     Init:0/3           0               7m42s
stackstorm-ha-1673617629-mongodb-0                             1/1     Running            0               7m42s
stackstorm-ha-1673617629-mongodb-1                             1/1     Running            0               7m3s
stackstorm-ha-1673617629-mongodb-2                             1/1     Running            0               6m46s
stackstorm-ha-1673617629-rabbitmq-0                            1/1     Running            1 (2m39s ago)   7m42s
stackstorm-ha-1673617629-rabbitmq-1                            0/1     Running            0               2m17s
stackstorm-ha-1673617629-redis-node-0                          1/2     CrashLoopBackOff   5 (16s ago)     7m42s
stackstorm-ha-1673617629-st2actionrunner-547d775cd7-8rxbv      0/1     Init:0/2           0               7m41s
stackstorm-ha-1673617629-st2actionrunner-547d775cd7-gn79r      0/1     Init:0/2           0               7m41s
stackstorm-ha-1673617629-st2actionrunner-547d775cd7-jmnx6      0/1     Init:0/2           0               7m41s
stackstorm-ha-1673617629-st2actionrunner-547d775cd7-mzsh7      0/1     Init:0/2           0               7m41s
stackstorm-ha-1673617629-st2actionrunner-547d775cd7-wfsbv      0/1     Init:0/2           0               7m41s
stackstorm-ha-1673617629-st2api-f7b5cd9b9-68lsx                0/1     Init:0/2           0               7m42s
stackstorm-ha-1673617629-st2api-f7b5cd9b9-pkxjv                0/1     Init:0/2           0               7m42s
stackstorm-ha-1673617629-st2auth-54dd7c7685-lwq2m              0/1     Init:0/3           0               7m42s
stackstorm-ha-1673617629-st2auth-54dd7c7685-s5g22              0/1     Init:0/3           0               7m42s
stackstorm-ha-1673617629-st2client-74bfd87b6-z9hjl             1/1     Running            0               7m42s
stackstorm-ha-1673617629-st2garbagecollector-998cb6856-5vnhc   0/1     Init:0/2           0               7m41s
stackstorm-ha-1673617629-st2notifier-5c74474bb9-b8mt5          0/1     Init:0/2           0               7m42s
stackstorm-ha-1673617629-st2notifier-5c74474bb9-gqdtc          0/1     Init:0/2           0               7m42s
stackstorm-ha-1673617629-st2rulesengine-754bc4f6-7ztbg         0/1     Init:0/2           0               7m42s
stackstorm-ha-1673617629-st2rulesengine-754bc4f6-wc7cw         0/1     Init:0/2           0               7m42s
stackstorm-ha-1673617629-st2scheduler-5f49cc6f9f-h7j42         0/1     Init:0/2           0               7m42s
stackstorm-ha-1673617629-st2scheduler-5f49cc6f9f-ln96q         0/1     Init:0/2           0               7m42s
stackstorm-ha-1673617629-st2sensorcontainer-644c6c45c6-br5jn   0/1     Init:0/2           0               7m41s
stackstorm-ha-1673617629-st2stream-7dbdd9fb96-4xt6d            0/1     Init:0/2           0               7m42s
stackstorm-ha-1673617629-st2stream-7dbdd9fb96-slb45            0/1     Init:0/2           0               7m42s
stackstorm-ha-1673617629-st2timersengine-d97864c9f-vsngg       0/1     Init:0/2           0               7m42s
stackstorm-ha-1673617629-st2web-9ff6bfb85-9m5bn                0/1     CrashLoopBackOff   6 (41s ago)     7m42s
stackstorm-ha-1673617629-st2web-9ff6bfb85-nddwl                0/1     CrashLoopBackOff   6 (44s ago)     7m42s
stackstorm-ha-1673617629-st2workflowengine-7cbb59cc8f-sdqk8    0/1     Init:0/2           0               7m41s
stackstorm-ha-1673617629-st2workflowengine-7cbb59cc8f-sx7qs    0/1     Init:0/2           0               7m41s

Logs:

[eco_adm@cg3a54d9ac-k8m-s301 ~]$ kubectl logs stackstorm-ha-1673617629-redis-node-0
Defaulted container "redis" out of: redis, sentinel
I am master
redis 13:48:54.97 INFO  ==> Starting Redis
1:C 13 Jan 2023 13:48:54.979 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 13 Jan 2023 13:48:54.979 # Redis version=6.0.9, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 13 Jan 2023 13:48:54.979 # Configuration loaded
1:M 13 Jan 2023 13:48:54.982 # Not listening to IPv6: unsupported
1:M 13 Jan 2023 13:48:54.983 Running mode=standalone, port=6379.
1:M 13 Jan 2023 13:48:54.983 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 13 Jan 2023 13:48:54.983 # Server initialized
1:M 13 Jan 2023 13:48:54.983 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo madvise > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled (set to 'madvise' or 'never').
1:M 13 Jan 2023 13:48:54.983 Ready to accept connections

Describe Pod:

[eco_adm@cg3a54d9ac-k8m-s301 ~]$ kubectl describe pod stackstorm-ha-1673617629-redis-node-0
Name:             stackstorm-ha-1673617629-redis-node-0
Namespace:        default
Priority:         0
Service Account:  default
Node:             cg3a54d9ac-k8w-s301.sys.schwarz/10.124.149.35
Start Time:       Fri, 13 Jan 2023 14:47:46 +0100
Labels:           app=redis
                  chart=redis-12.3.2
                  controller-revision-hash=stackstorm-ha-1673617629-redis-node-7b59f7c97b
                  release=stackstorm-ha-1673617629
                  role=node
                  statefulset.kubernetes.io/pod-name=stackstorm-ha-1673617629-redis-node-0
Annotations:      checksum/configmap: 1c645cdf9207e1338f57f37eb3767b6d4111c0ad08636426d8164736561f72a7
                  checksum/health: 6cbc0d3c356accbd47af7290e9e50ff762e7d0977f3c9c0fc93a5dd43654cbeb
                  checksum/secret: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
                  cni.projectcalico.org/containerID: 8249d7addc2803f6246da9e1915b88a443da1cfaf3c99718e17316c751d056d5
                  cni.projectcalico.org/podIP: 192.168.177.217/32
                  cni.projectcalico.org/podIPs: 192.168.177.217/32
Status:           Running
IP:               192.168.177.217
IPs:
  IP:  192.168.177.217
Controlled By:  StatefulSet/stackstorm-ha-1673617629-redis-node
Containers:
  redis:
    Container ID:  containerd://1fc1cdbca607a8e1e1e338eba4b09ec7cac6677bf25c4338f95d724354cbf05b
    Image:         docker.io/bitnami/redis:6.0.9-debian-10-r66
    Image ID:      docker.io/bitnami/redis@sha256:63caa0cd4c73961ae1c2f322d8352510f61036d326147b341b09ca27b3eabd79
    Port:          6379/TCP
    Host Port:     0/TCP
    Command:
      /bin/bash
      -c
      /opt/bitnami/scripts/start-scripts/start-node.sh
    State:          Running
      Started:      Fri, 13 Jan 2023 14:48:34 +0100
    Ready:          True
    Restart Count:  0
    Liveness:       exec [sh -c /health/ping_liveness_local.sh 5] delay=30s timeout=5s period=10s #success=1 #failure=5
    Readiness:      exec [sh -c /health/ping_readiness_local.sh 5] delay=5s timeout=10s period=10s #success=1 #failure=5
    Environment:
      REDIS_MASTER_PORT_NUMBER:  6379
      ALLOW_EMPTY_PASSWORD:      yes
      REDIS_TLS_ENABLED:         no
      REDIS_PORT:                6379
      REDIS_DATA_DIR:            /data
    Mounts:
      /data from redis-data (rw)
      /health from health (rw)
      /opt/bitnami/redis/etc from redis-tmp-conf (rw)
      /opt/bitnami/redis/mounted-etc from config (rw)
      /opt/bitnami/scripts/start-scripts from start-scripts (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dhg2f (ro)
  sentinel:
    Container ID:  containerd://a58af9c647d5012736cdc7e16bde2dd04f365177df8917ee8ce64a065d882337
    Image:         docker.io/bitnami/redis-sentinel:6.0.9-debian-10-r66
    Image ID:      docker.io/bitnami/redis-sentinel@sha256:0b1a099fa2224096f42add10bce05e97935b44612ae20589c0ee5b16ec1d9dcc
    Port:          26379/TCP
    Host Port:     0/TCP
    Command:
      /bin/bash
      -c
      /opt/bitnami/scripts/start-scripts/start-sentinel.sh
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Fri, 13 Jan 2023 15:00:04 +0100
      Finished:     Fri, 13 Jan 2023 15:00:59 +0100
    Ready:          False
    Restart Count:  7
    Liveness:       exec [sh -c /health/ping_sentinel.sh 5] delay=5s timeout=5s period=5s #success=1 #failure=5
    Readiness:      exec [sh -c /health/ping_sentinel.sh 5] delay=5s timeout=1s period=5s #success=1 #failure=5
    Environment:
      ALLOW_EMPTY_PASSWORD:        yes
      REDIS_SENTINEL_TLS_ENABLED:  no
      REDIS_SENTINEL_PORT:         26379
    Mounts:
      /data from redis-data (rw)
      /health from health (rw)
      /opt/bitnami/redis-sentinel/etc from sentinel-tmp-conf (rw)
      /opt/bitnami/redis-sentinel/mounted-etc from config (rw)
      /opt/bitnami/scripts/start-scripts from start-scripts (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dhg2f (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  redis-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  redis-data-stackstorm-ha-1673617629-redis-node-0
    ReadOnly:   false
  start-scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      stackstorm-ha-1673617629-redis-scripts
    Optional:  false
  health:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      stackstorm-ha-1673617629-redis-health
    Optional:  false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      stackstorm-ha-1673617629-redis
    Optional:  false
  sentinel-tmp-conf:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  redis-tmp-conf:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  kube-api-access-dhg2f:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  15m                  default-scheduler  Successfully assigned default/stackstorm-ha-1673617629-redis-node-0 to cg3a54d9ac-k8w-s301.sys.schwarz
  Normal   Pulling    15m                  kubelet            Pulling image "docker.io/bitnami/redis:6.0.9-debian-10-r66"
  Normal   Pulled     14m                  kubelet            Successfully pulled image "docker.io/bitnami/redis:6.0.9-debian-10-r66" in 45.36777825s (45.367795273s including waiting)
  Normal   Created    14m                  kubelet            Created container redis
  Normal   Started    14m                  kubelet            Started container redis
  Normal   Pulling    14m                  kubelet            Pulling image "docker.io/bitnami/redis-sentinel:6.0.9-debian-10-r66"
  Normal   Pulled     13m                  kubelet            Successfully pulled image "docker.io/bitnami/redis-sentinel:6.0.9-debian-10-r66" in 41.417869521s (41.417875397s including waiting)
  Normal   Created    13m                  kubelet            Created container sentinel
  Normal   Started    13m                  kubelet            Started container sentinel
  Warning  Unhealthy  13m (x5 over 13m)    kubelet            Liveness probe failed: Could not connect to Redis at localhost:26379: Connection refused
  Normal   Killing    13m                  kubelet            Container sentinel failed liveness probe, will be restarted
  Warning  Unhealthy  10m (x47 over 13m)   kubelet            Readiness probe failed: Could not connect to Redis at localhost:26379: Connection refused
  Warning  BackOff    6s (x33 over 8m11s)  kubelet            Back-off restarting failed container sentinel in pod stackstorm-ha-1673617629-redis-node-0_default(4bc531f0-9654-45fe-ba96-68e031024682)
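
Note that the pod reports 1/2 because the defaulted "redis" container is healthy; it is the "sentinel" container that keeps failing its probes and being restarted. A useful next step (a generic sketch, using the pod, container, and probe script names from the describe output above) is to inspect that container directly:

kubectl logs stackstorm-ha-1673617629-redis-node-0 -c sentinel --previous
# only works while the sentinel container is currently running, not during back-off:
kubectl exec stackstorm-ha-1673617629-redis-node-0 -c sentinel -- sh -c '/health/ping_sentinel.sh 5'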

philipphomberger commented 1 year ago

For the containers of the StackStorm services themselves:

Events:
  Type     Reason       Age                    From               Message
  ----     ------       ----                   ----               -------
  Normal   Scheduled    5m25s                  default-scheduler  Successfully assigned default/stackstorm-ha-1673617629-st2actionrunner-5489d74785-bj69h to cg3a54d9ac-k8w-s302.sys.schwarz
  Warning  FailedMount  5m23s                  kubelet            MountVolume.SetUp failed for volume "st2-post-start-script-vol" : failed to sync configmap cache: timed out waiting for the condition
  Warning  FailedMount  5m22s (x2 over 5m23s)  kubelet            MountVolume.SetUp failed for volume "st2-encryption-key-vol" : failed to sync secret cache: timed out waiting for the condition
  Warning  FailedMount  5m22s (x2 over 5m23s)  kubelet            MountVolume.SetUp failed for volume "st2-ssh-key-vol" : failed to sync secret cache: timed out waiting for the condition
  Normal   Pulled       5m19s                  kubelet            Container image "busybox:1.28" already present on machine
  Normal   Created      5m19s                  kubelet            Created container wait-for-db
  Normal   Started      5m18s                  kubelet            Started container wait-for-db

[eco_adm@cg3a54d9ac-k8m-s301 linux-amd64]$

I have also tried disabling Redis in the chart and deploying Redis manually, but the StackStorm apps still do not start and show the same message. Thank you :)
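
For running the chart against an externally managed Redis, a rough sketch of the override looks like the following. This is an assumption-laden example: the exact values keys (redis.enabled, st2.config) depend on the chart version and should be checked against the chart's values.yaml, and the Redis endpoint is a placeholder.

cat > external-redis-values.yaml <<'EOF'
redis:
  enabled: false                  # do not deploy the bundled Redis/Sentinel StatefulSet
st2:
  config: |                       # extra st2.conf content pointing coordination at external Redis
    [coordination]
    url = redis://my-external-redis.default.svc.cluster.local:6379
EOF
helm upgrade stackstorm-ha-1673617629 stackstorm/stackstorm-ha -f external-redis-values.yaml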

philipphomberger commented 1 year ago

Some more logs:

[eco_adm@cg3a54d9ac-k8m-s301 linux-amd64]$ kubectl logs -l release=stackstorm-ha-1673617629
Defaulted container "st2-apikey-load" out of: st2-apikey-load, wait-for-db (init), wait-for-api (init), generate-st2client-config (init)
Defaulted container "st2actionrunner" out of: st2actionrunner, wait-for-db (init), wait-for-queue (init)
Defaulted container "st2actionrunner" out of: st2actionrunner, wait-for-db (init), wait-for-queue (init)
Defaulted container "st2actionrunner" out of: st2actionrunner, wait-for-db (init), wait-for-queue (init)
Defaulted container "st2actionrunner" out of: st2actionrunner, wait-for-db (init), wait-for-queue (init)
Defaulted container "st2actionrunner" out of: st2actionrunner, wait-for-db (init), wait-for-queue (init)
Defaulted container "st2api" out of: st2api, wait-for-db (init), wait-for-queue (init)
Defaulted container "st2api" out of: st2api, wait-for-db (init), wait-for-queue (init)
Defaulted container "st2auth" out of: st2auth, wait-for-db (init), wait-for-queue (init), generate-htpasswd (init)
Defaulted container "st2auth" out of: st2auth, wait-for-db (init), wait-for-queue (init), generate-htpasswd (init)
Defaulted container "st2client" out of: st2client, generate-st2client-config (init)
Defaulted container "st2garbagecollector" out of: st2garbagecollector, wait-for-db (init), wait-for-queue (init)
Defaulted container "st2notifier" out of: st2notifier, wait-for-db (init), wait-for-queue (init)
Defaulted container "st2notifier" out of: st2notifier, wait-for-db (init), wait-for-queue (init)
Defaulted container "st2rulesengine" out of: st2rulesengine, wait-for-db (init), wait-for-queue (init)
Defaulted container "st2rulesengine" out of: st2rulesengine, wait-for-db (init), wait-for-queue (init)
Defaulted container "st2scheduler" out of: st2scheduler, wait-for-db (init), wait-for-queue (init)
Defaulted container "st2scheduler" out of: st2scheduler, wait-for-db (init), wait-for-queue (init)
Defaulted container "st2sensorcontainer" out of: st2sensorcontainer, wait-for-db (init), wait-for-queue (init)
Defaulted container "st2stream" out of: st2stream, wait-for-db (init), wait-for-queue (init)
Defaulted container "st2stream" out of: st2stream, wait-for-db (init), wait-for-queue (init)
Defaulted container "st2timersengine" out of: st2timersengine, wait-for-db (init), wait-for-queue (init)
Defaulted container "st2workflowengine" out of: st2workflowengine, wait-for-db (init), wait-for-queue (init)
Defaulted container "st2workflowengine" out of: st2workflowengine, wait-for-db (init), wait-for-queue (init)
Error from server (BadRequest): container "st2actionrunner" in pod "stackstorm-ha-1673617629-st2actionrunner-5489d74785-p274w" is waiting to start: PodInitializing
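
Since every pod is stuck in an init phase, it can also help to read the init containers directly instead of the defaulted app container, e.g. for the actionrunner pod named in the error above (the init container names wait-for-db and wait-for-queue come from the log output):

kubectl logs stackstorm-ha-1673617629-st2actionrunner-5489d74785-p274w -c wait-for-db
kubectl logs stackstorm-ha-1673617629-st2actionrunner-5489d74785-p274w -c wait-for-queue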

philipphomberger commented 1 year ago

It turned out to be a network problem in the Calico network configuration.
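
For anyone landing here with the same symptoms (Sentinel "Connection refused" on localhost:26379, pods stuck in Init), a few generic checks of the Calico layer can help confirm a CNI problem. These are standard kubectl commands, not specific to this cluster; the label selectors assume a default Calico install, and the pod IP is the one from the describe output above:

kubectl get pods -n kube-system -l k8s-app=calico-node -o wide
kubectl -n kube-system logs daemonset/calico-node -c calico-node --tail=100
# quick pod-to-pod reachability check from a throwaway pod (name and image are arbitrary):
kubectl run net-test --rm -it --restart=Never --image=busybox:1.28 -- ping -c 3 192.168.177.217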