Open tesshuflower opened 1 year ago
In this particular case, config was:
Liveness: http-get http://:8081/healthz delay=15s timeout=1s period=20s #success=1 #failure=3
Readiness: http-get http://:8081/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
And from the events, the last "failure" event seems to be ~20 seconds in:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m39s default-scheduler Successfully assigned openshift-operators/volsync-controller-manager-676f6bfd-xz67p to ip-10-104-185-93.ap-southeast-2.compute.internal
Normal AddedInterface 5m38s multus Add eth0 [10.128.26.73/23] from openshift-sdn
Normal Pulled 5m38s kubelet Container image "registry.redhat.io/openshift4/ose-kube-rbac-proxy@sha256:6562088dcce7296d70990f52f2ee790c3df8694c937291536e974fe078fc4670" already present on machine
Normal Created 5m37s kubelet Created container kube-rbac-proxy
Normal Started 5m37s kubelet Started container kube-rbac-proxy
Normal Pulled 5m37s kubelet Container image "registry.redhat.io/rhacm2/volsync-rhel8@sha256:7207ea4de4a8bb3a2930b974c2122215cb902ab577e4ef1de6e635fd854b6d0a" already present on machine
Normal Created 5m37s kubelet Created container manager
Normal Started 5m37s kubelet Started container manager
Warning ProbeError 5m29s kubelet Readiness probe error: Get "http://10.128.26.73:8081/readyz": dial tcp 10.128.26.73:8081: connect: connection refused
Warning Unhealthy 5m29s kubelet Readiness probe failed: Get "http://10.128.26.73:8081/readyz": dial tcp 10.128.26.73:8081: connect: connection refused
Warning ProbeError 5m18s kubelet Liveness probe error: Get "http://10.128.26.73:8081/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 5m18s kubelet Liveness probe failed: Get "http://10.128.26.73:8081/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning ProbeError 5m18s kubelet Readiness probe error: Get "http://10.128.26.73:8081/readyz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 5m18s kubelet Readiness probe failed: Get "http://10.128.26.73:8081/readyz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
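To make the "~20 seconds in" estimate concrete, the ages in the events above can be turned into offsets from the moment the manager container started. This is only a rough sketch: the helper name `age_to_seconds` is made up for illustration, and the ages shown by `kubectl describe` are rounded, so the offsets are approximate.

```python
import re


def age_to_seconds(age: str) -> int:
    """Convert a kubectl-style age such as '5m39s' into seconds (hypothetical helper)."""
    units = {"h": 3600, "m": 60, "s": 1}
    parts = re.findall(r"(\d+)([hms])", age)
    if not parts:
        raise ValueError(f"unrecognized age: {age!r}")
    return sum(int(number) * units[unit] for number, unit in parts)


# Ages copied from the events above.
started = age_to_seconds("5m37s")  # "Started container manager"
events = {
    "first readiness failure": "5m29s",
    "last probe failures": "5m18s",
}

# An older event has a larger age, so the offset is start age minus event age.
offsets = {name: started - age_to_seconds(age) for name, age in events.items()}

for name, offset in offsets.items():
    print(f"{name}: ~{offset}s after the manager container started")
```

With the ages listed above, this puts the first readiness failure about 8 seconds after the container started and the last recorded probe failures about 19 seconds after it.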
Some more info: it seems that the liveness and readiness probes are set up and should become available before the k8s client cache is ready or any leader election takes place, so those are likely not the cause.
Note that we do check for and install an SCC at startup on OpenShift before the probes are set up, so if that takes a while (for example, because API access is slow), it could delay the probes from becoming available.
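As a rough way to reason about how much startup delay the current probe settings tolerate, the sketch below estimates when the failure threshold could first be reached if every probe failed from the very first attempt. It is an idealized model and not the kubelet's exact behavior: it assumes the first probe runs exactly at the initial delay and ignores kubelet jitter. The `Probe` class and its method name are hypothetical, introduced only for this illustration; the numbers come from the probe configuration quoted in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Probe settings in seconds, mirroring delay/timeout/period/#failure."""

    delay: int
    timeout: int
    period: int
    failure_threshold: int

    def earliest_threshold_reached(self) -> int:
        """Seconds after container start at which the failure threshold could
        first be reached, assuming every probe times out (idealized)."""
        # Failures start at `delay` and repeat every `period`; the last one
        # needs up to `timeout` seconds before it is recorded as a failure.
        return self.delay + (self.failure_threshold - 1) * self.period + self.timeout


# Values taken from the probe configuration in this issue.
liveness = Probe(delay=15, timeout=1, period=20, failure_threshold=3)
readiness = Probe(delay=5, timeout=1, period=10, failure_threshold=3)

print("liveness threshold reached after ~", liveness.earliest_threshold_reached(), "s")
print("readiness threshold reached after ~", readiness.earliest_threshold_reached(), "s")
```

Under these assumptions, the liveness probe would reach its threshold roughly 56 seconds after the container starts, so anything that holds up the probe endpoints for about that long (such as a slow SCC check) risks a container restart. Raising the initial delay or the failure threshold widens that window.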
Describe the bug
We may need to look into increasing the readiness probe delay to allow for a slower startup.
During startup of the VolSync OLM operator, the following sequence occurred:
InstallCheckFailed: install failed: deployment volsync-controller-manager not ready before timeout: deployment "volsync-controller-manager" exceeded its progress deadline
Note that the system may have a lot of resources (possibly secrets or PVCs) that caused it to use a lot of memory.
Possible causes of the delayed startup:
Steps to reproduce
Expected behavior
Actual results
Additional context