joran-fonjallaz opened this issue 1 day ago
FYI: the "deadlock" is removed by restarting the pods manually:
kubectl -n kong-dbless delete pods --selector=app.kubernetes.io/instance=kong-green
The Kong KIC pods (controller and gateways) then restart normally.
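As a quick check after the delete (a hedged sketch; the namespace and selector are simply copied from the command above), the replacement pods can be watched until they report Ready:
kubectl -n kong-dbless get pods --selector=app.kubernetes.io/instance=kong-green -w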
It seems your issue has been resolved. Feel free to reopen if you have any further concerns.
Thanks @StarlightIbuki for taking this issue. However, your answer doesn't help much. Could you please point me to the resolution? How has this issue been solved, and what is the fix?
Thank you in advance.
Sorry, I thought you had found the solution. @randmonkey could you also take a look at this?
Is there an existing issue for this?
Kong version ($ kong version): 3.8.0
Current Behavior
Hello, we run Kong KIC on GKE clusters. Every night the preemptible nodes are reclaimed in our staging environments, and most of the time this takes down all Kong gateway pods (2 replicas) for hours.
versions:
1.30.4-gke.1348000 (GKE)
0.14.1
3.3.1
3.8.0 (Kong gateway)
Additional info
It seems that the liveness probe responds OK while the readiness probe remains unhealthy, so the gateway pods just linger, unable to process traffic.
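For context, a minimal sketch of how such probes are typically wired for a Kong 3.x gateway container; the port and paths below are assumptions based on common Kong Helm chart defaults (status listener on 8100), not the actual values.yaml of this install. Kong serves /status (process is up) and /status/ready (configuration loaded and the proxy is ready) on the status listener, which would explain liveness passing while readiness stays unhealthy:
# assumed typical probe wiring, for illustration only
livenessProbe:
  httpGet:
    path: /status        # answers as soon as the Kong process is up
    port: 8100           # status listener port (assumed chart default)
readinessProbe:
  httpGet:
    path: /status/ready  # only returns 200 once config is loaded and the proxy can serve traffic
    port: 8100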
Error logs
The controller fails to talk to the gateways, and Kong finds itself in some sort of "deadlock" until the pods are deleted manually. Any insights?
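One way to narrow this down (a hedged debugging sketch; the port and paths assume Kong chart defaults rather than this exact install) is to query a stuck gateway pod's status listener directly and see whether it ever reports ready:
kubectl -n kong-dbless port-forward pod/<stuck-gateway-pod> 8100:8100
# in another shell:
curl -i localhost:8100/status         # liveness-style check, expected to return 200
curl -i localhost:8100/status/ready   # readiness check, returns 503 while the gateway is not ready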
Below is the values.yaml file configuring Kong.
Expected Behavior
Kong gateway pods should come back up and become ready after node preemption, instead of getting stuck with:
bind() to unix:/kong_prefix/sockets/we failed (98: Address already in use)
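"Address already in use" on a socket under /kong_prefix/sockets typically indicates that the socket file already exists, e.g. left over from a previous Kong/nginx process that did not shut down cleanly. Purely as a speculative workaround sketch (not a confirmed fix; the volume and mount names are made up for illustration), stale sockets could be cleared before the gateway starts, for example with an init container:
initContainers:
  - name: clear-stale-sockets
    image: busybox:1.36
    command: ["sh", "-c", "rm -rf /kong_prefix/sockets/*"]   # drop leftover unix socket files
    volumeMounts:
      - name: kong-prefix-dir        # hypothetical volume name; must match the actual prefix volume
        mountPath: /kong_prefix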
Steps To Reproduce
I could reproduce the error by killing the nodes (kubectl delete nodes) on which the Kong pods were running. After killing the nodes, KIC fails to restart as it enters the deadlock situation described above. See the screenshot.
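For reference, a hedged sketch of that reproduction (namespace and selector reused from the workaround command above; the node name is a placeholder): find the nodes hosting the Kong pods, then delete them to simulate preemption:
kubectl -n kong-dbless get pods -o wide --selector=app.kubernetes.io/instance=kong-green   # the NODE column shows where each pod runs
kubectl delete node <node-running-a-kong-pod>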
Anything else?
Dump of a failing gateway pod (kubectl describe) and logs: