Kong / kong-operator

Kong Operator for Kubernetes and OpenShift
https://konghq.com
Apache License 2.0

Kong ingress readiness/liveness probes failed #72

Closed · nautiam closed this issue 2 years ago

nautiam commented 2 years ago

I ran the Kong example and got this error:

Liveness probe failed: Get "http://:10254/healthz": dial tcp 10.131.0.185:10254: connect: connection refused
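
Probe failures like this also show up under the Pod's events. A quick way to confirm (the kong namespace is an assumption from the example manifests; the pod name is a placeholder):

kubectl get pods -n kong
kubectl describe pod <kong-pod-name> -n kong            # probe failures appear under Events
kubectl get events -n kong --field-selector reason=Unhealthy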
nautiam commented 2 years ago

I created a NetworkPolicy with an Ingress rule that allows traffic from all pods in the same namespace, and it works:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: kong-policy
spec:
  podSelector: {}          # applies to every pod in the namespace
  ingress:
    - from:
        - podSelector: {}  # allow traffic from any pod in the same namespace
  policyTypes:
    - Ingress
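
For reference, applying and checking it in the same namespace as the Kong deployment would look roughly like this (kong is assumed here; use whatever namespace the controller runs in):

kubectl apply -n kong -f kong-policy.yaml
kubectl describe networkpolicy kong-policy -n kong      # confirms the ingress rule is in place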
rainest commented 2 years ago

What Kubernetes provider/distribution and cluster networking implementation do you use?

Do you normally have to do this for your liveness probes to work/do you need to do the same for other applications? We haven't had any other reports indicating this would need a NetworkPolicy, and it seems odd that you'd need to allow Pod-to-Pod traffic; liveness checks are performed by the kubelet AFAIK, so I wouldn't expect them to be subject to that policy.

Do your logs (use -c ingress-controller) show any issue with the controller starting up? There are some operations that happen before the health server starts; it may just have been that, a temporary condition that resolved itself after a few restarts.
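
For example, something along these lines against the controller Pod (the kong namespace is an assumption based on the example manifests; the pod name is a placeholder):

kubectl logs <kong-pod-name> -n kong -c ingress-controller              # current container
kubectl logs <kong-pod-name> -n kong -c ingress-controller --previous   # container before the last restart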

nautiam commented 2 years ago

What Kubernetes provider/distribution and cluster networking implementation do you use?

I got this error on OpenShift 4.9.0. I don't get it on OpenShift 4.8.x.

Do you normally have to do this for your liveness probes to work/do you need to do the same for other applications?

No, I don't. Other applications still work fine with liveness probes. It only happens with the kong-ingress-controller.

Do your logs (use -c ingress-controller) show any issue with the controller starting up? There are some operations that happen before the health server starts; it may just have been that, a temporary condition that resolved itself after a few restarts.

I checked the logs; there is no issue with the controller starting up. I left it for a long time, it restarted about 10 times and still got the error. When I add the NetworkPolicy, all the errors and events disappear, and the ingress-controller works well from the first boot.

rainest commented 2 years ago

I'd advise checking with OpenShift support to see if they know what changed, whether these probes should require additional configuration, and why. I don't see anything obvious in https://docs.openshift.com/container-platform/4.9/release_notes/ocp-4-9-release-notes.html#ocp-4-9-networking that would explain this, or any instructions about NetworkPolicy requirements in their probe documentation.

Do you have other NetworkPolicy resources? Do any of them look like they should handle probe traffic? Elsewhere I can find information indicating that you do need to allow traffic from kubelets to Pods in your policies to make probes work, e.g.

https://groups.google.com/g/kubernetes-users/c/hcokHjaA4mE
https://stackoverflow.com/questions/64378694/kubernetes-health-checks-failing-with-network-policies-enabled

...though nothing specific to OpenShift. A misconfigured NetworkPolicy can definitely interfere with probes, but I'm unsure how that would happen for a single Pod only, unless nothing else is using a network-based probe (unlikely).
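
If it turns out a policy really is required, a narrower rule scoped to the controller's health port would be preferable to allowing all Pod-to-Pod traffic. A rough sketch only; the CIDR is a placeholder for whatever network the probes actually originate from on your cluster, and the port comes from the probe error above:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-probe-traffic
spec:
  podSelector: {}                   # or narrow this to the Kong pods via labels
  ingress:
    - from:
        - ipBlock:
            cidr: 10.128.0.0/14     # placeholder: node/pod network CIDR for your cluster
      ports:
        - protocol: TCP
          port: 10254               # controller health endpoint
  policyTypes:
    - Ingress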

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.