Closed (abustya closed this issue 6 years ago)
Pinging @Tieske
@abustya "dns server error: 3 name error" means the server did send an answer, but the answer was either empty or didn't contain the requested name. "dns server error: 2 server failure" indicates that your dns server ran into an error.
From the logs it appears as if this only happens when looking up the postgres database.
Can you share the resolv.conf file you have on your system?
You seem to have two nameservers (10.1.0.5 and 168.63.129.16); does the error persist if you disable either one of those?
In all honesty, looks like a problem with your nameservers.
Yes, the last line in the log: "name resolution failed for 'exchange-api': dns server error: 3 name error". This is one of the backend APIs configured to be proxied.
None of the dns_* properties are customized. (Actually, I don't even have a kong.conf file, only the kong.conf.default, untouched.)
Env vars:
KONG_DATABASE=postgres
KONG_LUA_SSL_TRUSTED_CERTIFICATE=/etc/pki/tls/certs/ca-bundle.crt
KONG_LUA_SSL_VERIFY_DEPTH=3
KONG_PG_HOST=kong-database
KONG_PG_PASSWORD=kong
KONG_PG_USER=kong
resolv.conf contents:
search ci.svc.cluster.local svc.cluster.local cluster.local ua5hp3m0b0butcqmu5iwql5ykd.ax.internal.cloudapp.net
nameserver 10.1.0.5
nameserver 168.63.129.16
options ndots:5
The first nameserver is responsible for resolving domains inside the cluster, the second for those outside. If I leave only the one for inside resolution, the error no longer occurs. If I leave only the one for outside, the error occurs constantly.
Leaving only the first nameserver actually seems like a viable workaround at the moment, though I think later I will also need to add APIs from outside the cluster.
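To make the effect of the search list and ndots:5 concrete, here is a minimal Python sketch that parses the resolv.conf contents above and lists the names a resolver would try for 'exchange-api'. It assumes the conventional resolv.conf search-list rules (names with fewer dots than ndots are tried against the search domains first); Kong's own dns client may order or filter candidates differently. The helper names `parse_resolv_conf` and `candidate_names` are made up for this illustration.

```python
def parse_resolv_conf(text):
    """Extract nameservers, search domains and ndots from resolv.conf text."""
    config = {"nameservers": [], "search": [], "ndots": 1}  # ndots defaults to 1
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith(("#", ";")):
            continue  # skip blank lines and comments
        if parts[0] == "nameserver" and len(parts) > 1:
            config["nameservers"].append(parts[1])
        elif parts[0] == "search":
            config["search"] = parts[1:]
        elif parts[0] == "options":
            for opt in parts[1:]:
                if opt.startswith("ndots:"):
                    config["ndots"] = int(opt.split(":", 1)[1])
    return config


def candidate_names(name, config):
    """Return the names to try, in order (assumed conventional search rules)."""
    if name.endswith("."):
        return [name]  # already fully qualified, no search expansion
    expanded = [f"{name}.{domain}" for domain in config["search"]]
    if name.count(".") >= config["ndots"]:
        return [name] + expanded  # enough dots: try the name as-is first
    return expanded + [name]


RESOLV_CONF = """\
search ci.svc.cluster.local svc.cluster.local cluster.local ua5hp3m0b0butcqmu5iwql5ykd.ax.internal.cloudapp.net
nameserver 10.1.0.5
nameserver 168.63.129.16
options ndots:5
"""

config = parse_resolv_conf(RESOLV_CONF)
for candidate in candidate_names("exchange-api", config):
    print(candidate)
```

Under these assumptions, the short name 'exchange-api' is first tried as exchange-api.ci.svc.cluster.local, a name that only the in-cluster nameserver can be expected to know.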
You should reconfigure your dns, this will never work.
The error actually surfaces because of a bug in the dns resolver. Had that bug not been there, it might have worked, but only because of the retries being done. That would only have masked the bad configuration, and you'd have had very high dns resolution latency.
The dns client will randomly pick a dns server to resolve names (to spread the load), so in cases where it picks your "outside" server, it will obviously fail to resolve internal names, because they are unknown at that server.
You should always use the internal server, and configure that server to forward lookups to your external server (in a chained fashion).
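The random server selection described above can be illustrated with a small Python simulation. This is a toy model, not the actual Kong dns client: `resolves` and `failure_rate` are hypothetical helpers, the rule that only the internal server knows names ending in ".cluster.local" is an assumption taken from this thread, and caching and retries are ignored entirely.

```python
import random

INTERNAL = "10.1.0.5"       # in-cluster nameserver (from the resolv.conf in this thread)
EXTERNAL = "168.63.129.16"  # nameserver for names outside the cluster


def resolves(name, server):
    """Toy stand-in for a DNS query; no network traffic is involved."""
    if name.endswith(".cluster.local"):
        return server == INTERNAL  # assumption: only the internal server knows these
    return True  # assumption: external names resolve via either server


def failure_rate(servers, name, attempts=10_000, seed=0):
    """Fraction of lookups that fail when a server is picked at random each time."""
    rng = random.Random(seed)
    failures = sum(
        1 for _ in range(attempts) if not resolves(name, rng.choice(servers))
    )
    return failures / attempts


name = "exchange-api.ci.svc.cluster.local"
print("both servers: ", failure_rate([INTERNAL, EXTERNAL], name))
print("internal only:", failure_rate([INTERNAL], name))
```

With both servers listed, roughly half of the uncached lookups in this model land on the external server and fail; with only the internal server listed, none do. Because the model leaves out caching and retries, its numbers are not meant to match the error rate reported in the issue.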
Closing this now. If you think this is not resolved, then please feel free to reopen.
Summary
Opening a new issue after this one has been closed: https://github.com/Kong/kong/issues/2524
I am running kong in an OpenShift cluster, and I am still encountering random DNS resolution errors with version 0.11.1.
Steps To Reproduce
Error occurs when calling either the admin api (e.g. '/apis') or one of the proxied apis.
Additional Details & Logs
The error occurs for about 2% of calls. When running lots of calls in quick succession, I see that the errors mostly occur in batches: for about 0.5-1 second all the calls fail, and then all is well again.
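One way to quantify the batching described above is to record a timestamp and a success flag for each call and then group the failures into bursts. The sketch below is only an illustration: `group_failure_bursts` and its `max_gap` threshold are hypothetical, and the sample data is synthetic, not taken from the logs of this issue.

```python
def group_failure_bursts(samples, max_gap=0.25):
    """Group failed calls into bursts.

    samples: iterable of (timestamp_seconds, succeeded) pairs, sorted by time.
    A burst is a run of failures with no success in between, where
    neighbouring failures are at most max_gap seconds apart.
    Returns a list of (start, end, failure_count) tuples.
    """
    bursts = []
    current = None  # [start, end, count] of the burst being built
    for timestamp, succeeded in samples:
        if succeeded:
            if current:
                bursts.append(tuple(current))  # a success ends the burst
                current = None
            continue
        if current and timestamp - current[1] <= max_gap:
            current[1] = timestamp  # extend the running burst
            current[2] += 1
        else:
            if current:
                bursts.append(tuple(current))  # gap too large: close it
            current = [timestamp, timestamp, 1]
    if current:
        bursts.append(tuple(current))
    return bursts


# Synthetic example data, for illustration only.
samples = [
    (0.0, True), (0.1, False), (0.2, False), (0.3, False),
    (0.4, True), (5.0, False), (5.1, True),
]
for start, end, count in group_failure_bursts(samples):
    print(f"burst from {start}s to {end}s: {count} failed call(s)")
```

Comparing burst durations against DNS record TTLs or cache expiry times may help tell a resolver-side problem from a nameserver-side one, though that correlation is not established in this thread.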