Kong / kong

🦍 The Cloud-Native API Gateway and AI Gateway.
https://konghq.com/install/#kong-community
Apache License 2.0

DNS resolution failure issue with kong version 3.5.0 #13712

Open mayank-allen opened 2 weeks ago

mayank-allen commented 2 weeks ago

Is there an existing issue for this?

Kong version ($ kong version)

3.5.0

Current Behavior

We are intermittently seeing DNS resolution failures. The client receives this error:

failed the initial dns/balancer resolve for 'classroom-service.classroom.svc.cluster.local' with: failed to create a resolver: failed to set peer name: resource temporarily unavailable

In the Kong logs we see errors like this:

2024/09/26 07:03:28 [error] 1302#0: *1732918 [lua] init.lua:371: execute(): DNS resolution failed: failed to create a resolver: failed to set peer name: resource temporarily unavailable. Tried: ["(short)node-bff.node-bff.svc.cluster.local:(na) - cache-miss","node-bff.node-bff.svc.cluster.local.kong.svc.cluster.local:1 - cache-miss"], client: 10.1.6.200, server: kong, request: "POST /node-bff/api/v1/sr-management/requests/reissue-cards HTTP/1.1", host: "bff-dev.allen-demo.in", referrer: "http://localhost:3000/", request_id: "d4c524226b983517852ed6539f1c7311"

Expected Behavior

We expected everything to work as it did in the earlier version, 3.4.0.

Steps To Reproduce

Deploy version 3.5.0 (possibly later versions too; I only tested with 3.5.0) in your testing environment and leave it running on that version for some time. Within a few hours you will start seeing this issue: DNS names cannot be resolved and requests fail.

Anything else?

No response

StarlightIbuki commented 2 days ago

@chobits Could you take a look?

chobits commented 2 days ago

It looks like your operating system resources have been exhausted, so the socket's setpeername call failed, similar to https://github.com/openresty/lua-resty-dns/issues/36

The error message is reported by this line: https://github.com/openresty/lua-resty-dns/blob/master/lib/resty/dns/resolver.lua#L162

It might be a resource limitation: the system may have reached its limit for open sockets or other resources.

Could you check the number of open file descriptors (fds) in your OS/container and the limits, e.g. with ulimit -a? Note that we need the limit info for the Kong process itself, not just for your shell.
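
One possible way to collect that information for a specific process is sketched below. It assumes a Linux host or container with /proc mounted and permission to read the target process's entries. The helper names (parse_max_open_files, count_open_fds, report) are made up for this illustration and are not part of Kong or lua-resty-dns. The PID to pass would be that of a Kong nginx worker, as seen from inside the same PID namespace; in the log line above, the "1302" in "1302#0" is the worker PID.

```python
import os
import sys


def parse_max_open_files(limits_text):
    """Return the (soft, hard) 'Max open files' values from the text of
    /proc/<pid>/limits, as strings (they may be 'unlimited').
    Returns None if the row is not present."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # fields: ['Max', 'open', 'files', <soft>, <hard>, 'files']
            return fields[3], fields[4]
    return None


def count_open_fds(pid):
    """Count the entries in /proc/<pid>/fd, i.e. the fds currently open."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def report(pid):
    """Print the open fd count and the open-files limits for one process."""
    with open(f"/proc/{pid}/limits") as f:
        limits = parse_max_open_files(f.read())
    used = count_open_fds(pid)
    print(f"pid {pid}: {used} open fds, max open files (soft, hard) = {limits}")


if __name__ == "__main__":
    # Usage (hypothetical file name): python3 check_fds.py <pid> [<pid> ...]
    for arg in sys.argv[1:]:
        report(int(arg))
```

If the open fd count sits close to the soft limit when the errors appear, that would be consistent with the resource-exhaustion theory; if it does not, other limits would need to be looked at.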