apache / openwhisk-deploy-kube

The Apache OpenWhisk Kubernetes Deployment repository supports deploying the Apache OpenWhisk system on Kubernetes and OpenShift clusters.
https://openwhisk.apache.org/
Apache License 2.0

No startup as ZooKeeper pod's readiness probe fails #756

Closed: Xnyle closed this issue 1 year ago

Xnyle commented 1 year ago

OpenWhisk doesn't start up for me; the reason is that ZooKeeper's readiness probe is always failing.

As long as it is failing, no DNS record is created for the headless service, and startup of the other services never continues.

The way ZooKeeper's probe is currently defined, it results in the following statement:

bash '-c' echo ruok | nc -w 1 localhost 2181 | grep imok

How is that supposed to even work?

bash '-c' 'echo ruok' | nc -w 1 localhost 2181 | grep imok

Would work, still ugly.
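
For readers unfamiliar with the check itself: the probe sends ZooKeeper's four-letter `ruok` command to the client port and looks for `imok` in the reply. A minimal Python sketch of the same exchange follows, assuming the server answers four-letter commands on that port; the function name `zookeeper_is_ok` and its defaults are illustrative, not part of the chart.

```python
import socket


def zookeeper_is_ok(host="localhost", port=2181, timeout=1.0):
    """Send 'ruok' and report whether the reply contains 'imok'.

    Hypothetical helper mirroring `echo ruok | nc -w 1 HOST PORT | grep imok`.
    """
    reply = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"ruok")
            # Read until we have the 4-byte answer or the server closes.
            while len(reply) < 4:
                chunk = conn.recv(16)
                if not chunk:
                    break
                reply += chunk
    except OSError:
        # Connection refused, timeout, or reset all count as "not ready".
        return False
    return b"imok" in reply


if __name__ == "__main__":
    print("ready" if zookeeper_is_ok() else "not ready")
```

Like the shell pipeline, this treats any connection error or timeout as a failed check.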

dgrove-oss commented 1 year ago

Note how it is defined in zookeeper-pod.yaml:

        readinessProbe:
          exec:
            command:
            - /bin/bash
            - -c
            - "echo ruok | nc -w 1 localhost {{ .Values.zookeeper.port }} | grep imok"

Xnyle commented 1 year ago

Correct, but if I look at the actual deployment that gets installed, it becomes

- /bin/bash
- '-c'
- echo ruok ...

So the " around the pipe is just YAML syntax, and I guess the quoting of '-c' is as well, so what actually gets executed is probably

bash -c echo ruok | nc -w 1 localhost 2181 | grep imok

And that cannot work.

I just don't understand why nobody else seems to have that problem.

dgrove-oss commented 1 year ago

Each line (element in the array) is a single argument to the command. It works exactly the same way as the first example here: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
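
The difference between an argument array and a flattened command line can be reproduced outside Kubernetes. The sketch below is an illustration only: it uses `sh` and `tr` as stand-ins for `/bin/bash` and `nc` (an assumption, so that it runs without a ZooKeeper server), and Python's `subprocess` rather than the container runtime.

```python
import subprocess

# Exec form: no shell parses this list. Each element is exactly one argument,
# so the whole pipeline string is handed to `sh -c` as a single script.
exec_form = ["sh", "-c", "echo ruok | tr a-z A-Z"]
exec_result = subprocess.run(exec_form, capture_output=True, text=True)

# Flattened form: the same words typed unquoted into a shell. The outer shell
# splits at the pipe, so the inner `sh -c` only receives `echo` as its script
# and `ruok` becomes a positional parameter that is never printed.
flattened = "sh -c echo ruok | tr a-z A-Z"
flat_result = subprocess.run(flattened, shell=True, capture_output=True, text=True)

print("exec form stdout:", repr(exec_result.stdout))
print("flattened stdout:", repr(flat_result.stdout))
```

Only the exec form passes `ruok` through the pipeline, which is the case the probe definition corresponds to.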

Xnyle commented 1 year ago

Except that's not what is actually happening. From the kubelet logs:

"ExecSync cmd from runtime service failed" err="rpc error: code = Unknown desc = command error: EOF, stdout: , stderr: , exit code -1" cmd=[/bin/bash -c echo ruok | nc -w 1 localhost 2181 | grep imok]

pearPLUS commented 1 year ago

I have the same issue, and ZooKeeper cannot start properly.

Events:
  Type     Reason     Age                  From               Message
  Normal   Scheduled  25m                  default-scheduler  Successfully assigned openwhisk/owdev-zookeeper-0 to kevin-virtual-machine
  Normal   Pulled     24m                  kubelet            Container image "zookeeper:3.4" already present on machine
  Normal   Created    24m                  kubelet            Created container zookeeper
  Normal   Started    24m                  kubelet            Started container zookeeper
  Warning  Unhealthy  19m (x2 over 19m)    kubelet            Liveness probe failed: dial tcp 10.244.93.40:2181: i/o timeout
  Warning  Unhealthy  4m6s (x71 over 23m)  kubelet            Readiness probe failed: command "/bin/bash -c echo ruok | nc -w 1 localhost 2181 | grep imok" timed out

Xnyle commented 1 year ago

The kubelet log is misleading, as it does not seem to include all quotation marks. So I guess the statement is correct.

"rpc error: code = Unknown desc = command error: EOF" is related to a CRI-O conmon communication bug/problem, so all probes were failing for me; ZooKeeper's was just the first one. The ticket can be closed.
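
On the quoting question: the `cmd=[...]` text in the log looks like the argument list joined with spaces, which hides where one argument ends and the next begins. The sketch below assumes that is how the line was rendered (the helper `log_style` is hypothetical) and contrasts it with a shell-quoted rendering of the same list.

```python
import shlex

# The probe command as three separate arguments, as in the pod spec.
probe_command = [
    "/bin/bash",
    "-c",
    "echo ruok | nc -w 1 localhost 2181 | grep imok",
]


def log_style(argv):
    """Join arguments with spaces inside brackets (assumed log rendering)."""
    return "[" + " ".join(argv) + "]"


flat = log_style(probe_command)      # argument boundaries are lost
quoted = shlex.join(probe_command)   # argument boundaries are preserved

print(flat)
print(quoted)

# Splitting the quoted form recovers the original three arguments.
assert shlex.split(quoted) == probe_command
```

The space-joined form is ambiguous to read, while the quoted form round-trips to the original three-element list.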

@pearPLUS your problem is a different one, probably network communication, or the pod really is unhealthy.