Kong / kong

🦍 The Cloud-Native API Gateway and AI Gateway.
https://konghq.com/install/#kong-community
Apache License 2.0
38.28k stars 4.75k forks source link

Kong DNS resolving not respecting properly configured overrides RES_OPTIONS and LOCALDOMAIN at times. #13301

Closed jeremyjpj0916 closed 1 day ago

jeremyjpj0916 commented 1 week ago

Is there an existing issue for this?

Kong version ($ kong version)

3.7.1

Current Behavior

When I start Kong in debug mode, I set a bunch of DNS override settings:

        - name: KONG_DNS_HOSTSFILE
          value: /etc/hosts
        - name: KONG_DNS_STALE_TTL
          value: "4"
        - name: KONG_DNS_NOT_FOUND_TTL
          value: "10"
        - name: KONG_DNS_RESOLVER
          value: 10.11.12.123,10.11.12.124
        - name: KONG_DNS_ERROR_TTL
          value: "1"
        - name: KONG_DNS_NO_SYNC
          value: "off"
        - name: LOCALDOMAIN
          value: .
        - name: RES_OPTIONS
          value: "ndots:1 attempts:1 timeout:1"
        - name: RES_TIMEOUT
          value: "1"
        - name: RES_RETRY
          value: "1"
        - name: RES_DFLRETRY
          value: "1"

And confirmed in pod at runtime:

ubuntu@kong-d4dfc6949-7wx2j:/$ echo $RES_OPTIONS
ndots:1 attempts:1 timeout:1

Per documentation: https://github.com/Kong/kong/blob/3.7.1/kong.conf.default#L1496

 # By default, the DNS resolver will use the standard configuration files
# `/etc/hosts` and `/etc/resolv.conf`. The settings in the latter file will be
# overridden by the environment variables `LOCALDOMAIN` and `RES_OPTIONS` if
# they have been set.

I am here to report that the RES_OPTIONS ENV variable override sometimes does not get used during Kong's DNS calls on startup as evidenced by the logs.

Portions of Startup logs below(these are good and look like the overrides are taking effect)

2024-06-25T17:12:36.677660238-05:00 nginx: [warn] load balancing method redefined in /usr/local/kong/nginx.conf:129
2024-06-25T17:12:36.793777887-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] globalpatches.lua:10: installing the globalpatches
2024-06-25T17:12:36.817968182-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:476: init(): [dns-client] (re)configuring dns client
2024-06-25T17:12:36.817991275-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:481: init(): [dns-client] staleTtl = 4
2024-06-25T17:12:36.817994732-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:485: init(): [dns-client] noSynchronisation = nil
2024-06-25T17:12:36.818220564-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:504: init(): [dns-client] query order = LAST, SRV, A, AAAA, CNAME
2024-06-25T17:12:36.818719698-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-localnet = [fe00::0]
2024-06-25T17:12:36.818788948-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:544: init(): [dns-client] adding A-record from 'hosts' file: localhost = 127.0.0.1
2024-06-25T17:12:36.818807272-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: localhost = [::1]
2024-06-25T17:12:36.818899455-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:544: init(): [dns-client] adding A-record from 'hosts' file: kong-d4dfc6949-7wx2j = 1xx.xxx.xxx.xx
2024-06-25T17:12:36.819052671-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-mcastprefix = [fe00::0]
2024-06-25T17:12:36.819067299-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-allrouters = [fe00::2]
2024-06-25T17:12:36.819119977-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-localhost = [::1]
2024-06-25T17:12:36.819123865-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-loopback = [::1]
2024-06-25T17:12:36.819218181-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-allnodes = [fe00::1]
2024-06-25T17:12:36.819226076-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:568: init(): [dns-client] validTtl = nil
2024-06-25T17:12:36.819283864-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:609: init(): [dns-client] nameserver 10.xx.x.10
2024-06-25T17:12:36.819303310-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:614: init(): [dns-client] attempts = 1
2024-06-25T17:12:36.819307899-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:621: init(): [dns-client] no_random = true
2024-06-25T17:12:36.819311937-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:631: init(): [dns-client] timeout = 1000 ms
2024-06-25T17:12:36.819315463-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:639: init(): [dns-client] ndots = 1
2024-06-25T17:12:36.819318679-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:641: init(): [dns-client] search = .
2024-06-25T17:12:36.819322286-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:653: init(): [dns-client] badTtl = 1 s
2024-06-25T17:12:36.819326243-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:655: init(): [dns-client] emptyTtl = 30 s
2024-06-25T17:12:36.932163645-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:476: init(): [dns-client] (re)configuring dns client
2024-06-25T17:12:36.932186578-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:481: init(): [dns-client] staleTtl = 4
2024-06-25T17:12:36.932189563-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:485: init(): [dns-client] noSynchronisation = false
2024-06-25T17:12:36.932252391-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:504: init(): [dns-client] query order = LAST, A, SRV, CNAME
2024-06-25T17:12:36.932431065-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-localnet = [fe00::0]
2024-06-25T17:12:36.932434832-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:544: init(): [dns-client] adding A-record from 'hosts' file: localhost = 127.0.0.1
2024-06-25T17:12:36.932440934-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: localhost = [::1]
2024-06-25T17:12:36.932443829-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:544: init(): [dns-client] adding A-record from 'hosts' file: kong-d4dfc6949-7wx2j = 1xx.xxx.xxx.xx
2024-06-25T17:12:36.932539989-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-mcastprefix = [fe00::0]
2024-06-25T17:12:36.932583921-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-allrouters = [fe00::2]
2024-06-25T17:12:36.932589712-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-localhost = [::1]
2024-06-25T17:12:36.932593439-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-loopback = [::1]
2024-06-25T17:12:36.932625769-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-allnodes = [fe00::1]
2024-06-25T17:12:36.932631731-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:568: init(): [dns-client] validTtl = nil
2024-06-25T17:12:36.932703385-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:609: init(): [dns-client] nameserver 10.11.12.123:53
2024-06-25T17:12:36.932708344-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:609: init(): [dns-client] nameserver 10.11.12.124:53
2024-06-25T17:12:36.932711430-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:614: init(): [dns-client] attempts = 1
2024-06-25T17:12:36.932713654-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:621: init(): [dns-client] no_random = true
2024-06-25T17:12:36.932716499-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:631: init(): [dns-client] timeout = 1000 ms
2024-06-25T17:12:36.932728361-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:639: init(): [dns-client] ndots = 1
2024-06-25T17:12:36.932730776-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:641: init(): [dns-client] search = .
2024-06-25T17:12:36.932732950-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:653: init(): [dns-client] badTtl = 1 s
2024-06-25T17:12:36.932735615-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:655: init(): [dns-client] emptyTtl = 10 s
2024-06-25T17:12:36.932752136-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] globalpatches.lua:441: randomseed(): seeding PRNG from OpenSSL RAND_bytes()

Now most of the settings look fine here, it duplicates the (re)configuring dns client block twice here, maybe thats because the diff between the two is the overriding the nameservers seen in resolve.conf to the KONG_DNS_RESOLVER specific ones. This block isn't really the issue but wanted to present it as a block that looks mostly correct as its reconfiguring the DNS client to have the right nameservers I assume(and ndots, attempts, timeout etc. all look like the override is right there).

My problem was as I kept watching the logs like a hawk I noticed the DNS warmup step come up later, which again when running Kong in debug mode tells me what its using for DNS settings here at the runtime, this time the output is not respecting the ENV overrides at all it seems other than the 2 nameserver override field(ndots wrong, search wrong, attempts wrong, timeout wrong). Looks like its falling back to settings from my resolv.conf that I don't want it to follow:

2024-06-25T17:12:37.945790382-05:00 2024/06/25 22:12:37 [notice] 47#0: *1315 [lua] warmup.lua:47: warming up DNS entries ..., context: ngx.timer
2024-06-25T17:12:37.946294565-05:00 2024/06/25 22:12:37 [debug] 47#0: *1318 [lua] client.lua:476: init(): [dns-client] (re)configuring dns client
2024-06-25T17:12:37.946305346-05:00 2024/06/25 22:12:37 [debug] 47#0: *1318 [lua] client.lua:481: init(): [dns-client] staleTtl = 4
2024-06-25T17:12:37.946308111-05:00 2024/06/25 22:12:37 [debug] 47#0: *1318 [lua] client.lua:485: init(): [dns-client] noSynchronisation = false
2024-06-25T17:12:37.946531659-05:00 2024/06/25 22:12:37 [debug] 47#0: *1318 [lua] client.lua:504: init(): [dns-client] query order = LAST, A, SRV, CNAME
2024-06-25T17:12:37.946659308-05:00 2024/06/25 22:12:37 [debug] 47#0: *1318 [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-localnet = [fe00::0]
2024-06-25T17:12:37.946664017-05:00 2024/06/25 22:12:37 [debug] 47#0: *1318 [lua] client.lua:544: init(): [dns-client] adding A-record from 'hosts' file: localhost = 127.0.0.1
2024-06-25T17:12:37.946684966-05:00 2024/06/25 22:12:37 [debug] 47#0: *1318 [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: localhost = [::1]
2024-06-25T17:12:37.947015414-05:00 2024/06/25 22:12:37 [debug] 47#0: *1318 [lua] client.lua:544: init(): [dns-client] adding A-record from 'hosts' file: kong-d4dfc6949-7wx2j = 1xx.xxx.xxx.xx
2024-06-25T17:12:37.947029651-05:00 2024/06/25 22:12:37 [debug] 47#0: *1318 [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-mcastprefix = [fe00::0]
2024-06-25T17:12:37.947033719-05:00 2024/06/25 22:12:37 [debug] 47#0: *1318 [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-allrouters = [fe00::2]
2024-06-25T17:12:37.947050460-05:00 2024/06/25 22:12:37 [debug] 47#0: *1318 [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-localhost = [::1]
2024-06-25T17:12:37.947054347-05:00 2024/06/25 22:12:37 [debug] 47#0: *1318 [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-loopback = [::1]
2024-06-25T17:12:37.947056722-05:00 2024/06/25 22:12:37 [debug] 47#0: *1318 [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-allnodes = [fe00::1]
2024-06-25T17:12:37.947059527-05:00 2024/06/25 22:12:37 [debug] 47#0: *1318 [lua] client.lua:568: init(): [dns-client] validTtl = nil
2024-06-25T17:12:37.947061721-05:00 2024/06/25 22:12:37 [debug] 47#0: *1318 [lua] client.lua:609: init(): [dns-client] nameserver 10.11.12.123:53
2024-06-25T17:12:37.947064025-05:00 2024/06/25 22:12:37 [debug] 47#0: *1318 [lua] client.lua:609: init(): [dns-client] nameserver 10.11.12.124:53
2024-06-25T17:12:37.947066570-05:00 2024/06/25 22:12:37 [debug] 47#0: *1318 [lua] client.lua:614: init(): [dns-client] attempts = 5
2024-06-25T17:12:37.947068764-05:00 2024/06/25 22:12:37 [debug] 47#0: *1318 [lua] client.lua:621: init(): [dns-client] no_random = true
2024-06-25T17:12:37.947070908-05:00 2024/06/25 22:12:37 [debug] 47#0: *1318 [lua] client.lua:631: init(): [dns-client] timeout = 2000 ms
2024-06-25T17:12:37.947073323-05:00 2024/06/25 22:12:37 [debug] 47#0: *1318 [lua] client.lua:639: init(): [dns-client] ndots = 5
2024-06-25T17:12:37.947076098-05:00 2024/06/25 22:12:37 [debug] 47#0: *1318 [lua] client.lua:641: init(): [dns-client] search = my-gateway-dc.svc.cluster.local, svc.cluster.local, cluster.local, company.com
2024-06-25T17:12:37.947078222-05:00 2024/06/25 22:12:37 [debug] 47#0: *1318 [lua] client.lua:653: init(): [dns-client] badTtl = 1 s
2024-06-25T17:12:37.947080526-05:00 2024/06/25 22:12:37 [debug] 47#0: *1318 [lua] client.lua:655: init(): [dns-client] emptyTtl = 10 s

cc @chobits since we were talking DNS on that other issue.

Expected Behavior

I expect Kong to respect all ENV variable DNS overrides and maintain those settings persisting throughout Kong's runtime DNS behavior. That means respecting KONG_DNS_RESOLVER(seems it does), and LOCALDOMAIN and RES_OPTIONS which right now seems Kong does not in debug mode.

Steps To Reproduce

  1. Start kong 3.7.1 in debug with a bunch of DNS override settings with a resolve.conf present that has settings you don't want.
  2. Watch Kong's dns client in debug mode show you cases where its not using the ENV variable overrides in its DNS lookup settings.

Anything else?

No response

jeremyjpj0916 commented 1 week ago

And my resolv.conf default which seems to be what Kong ends up using during parts of the runtime when it should not:

ubuntu@kong-d4dfc6949-7wx2j:/$ cat /etc/resolv.conf
search my-gateway-dc.svc.cluster.local, svc.cluster.local, cluster.local, company.com
nameserver 10.xx.x.10
options ndots:5
ubuntu@kong-d4dfc6949-7wx2j:/$
chobits commented 1 week ago

I'll follow your steps to reproduce it, if finding something, I'll get back to you.

jeremyjpj0916 commented 1 week ago

@chobits sounds good, as long as you see in the warming up dns entries bit its not respecting the environment variable overrides w the dns client debug logs like shown above. Or if I am doing something wrong with the override ENV variables I am all ears there but I believe I used it correctly, as there are parts to Kong’s startup in dns logs that respected them(like the first portion of logs I included).

chobits commented 6 days ago

hi @jeremyjpj0916

reason for the problem

Portions of Startup logs below(these are good and look like the overrides are taking effect) ... 2024-06-25T17:12:36.932716499-05:00 2024/06/25 22:12:36 [debug] 30#0: [lua] client.lua:631: init(): [dns-client] timeout = 1000 ms ...

The first part of your provided log is generated by nginx master, see its pid 30.

My problem was as I kept watching the logs like a hawk I noticed the DNS warmup step come up later, which again when running Kong in debug mode tells me what its using for DNS settings here at the runtime, this time the output is not respecting the ENV overrides at all it seems other than the 2 nameserver override field(ndots wrong, search wrong, attempts wrong, timeout wrong). Looks like its falling back to settings from my resolv.conf that I don't want it to follow: ... 2024-06-25T17:12:37.947070908-05:00 2024/06/25 22:12:37 [debug] 47#0: *1318 [lua] client.lua:631: init(): [dns-client] timeout = 2000 ms ...

The second part of your provided log is generated by nginx worker, see its pid 47.

Why couldn't the nginx worker get DNS options from $RES_OPTION? It's because the nginx workers' environment variables are cleared before the nginx master starts them, see nginx env doc as following:

By default, nginx removes all environment variables inherited from its parent process except the TZ variable.

how to mitigate this problem temporarily?

configure your /etc/kong/kong.conf with the following options, let nginx workers inherit $RES_OPTIONS :

nginx_main_env = RES_OPTIONS

BTW, I recommend configuring /etc/resolv.conf directly. $RES_OPTIONS doesn't work for anyone right now, and using it is very rare. Trying to fix it might introduce more logic to the DNS library, adding unnecessary risk.

jeremyjpj0916 commented 6 days ago

@chobits in our k8s environment in my company seems like /etc/resolv.conf gets set by the platform on pod/container startup. Its nice to be able to see the difference in performance and behavior native with the companies provided resolv.conf vs the optimizations via the ENV variables. I am not even sure if I can override the resolv.conf easily via maybe a config map file mount to the container or if k8s would still override that file again on container start? Not totally sure but thats why I use the ENV variable overrides.

Would need to add:

        - name: KONG_NGINX_MAIN_ENV
          value: "RES_OPTIONS; env LOCALDOMAIN"

To accomplish the full DNS override settings as expected. Still seems a worthy bug to me or need to update documentation that Kong no longer supports DNS override settings. And Kong DNS lookups are super slow with our default DNS settings. Should not need to wait sometimes as much as 30-60-90 seconds for DNS cache warmups to be complete in our instances on Kong startup, taking way too long with our default configs and we have some users complaining that in the hotpath of Kong runtime when it does do DNS lookups its causing larger latency spikes.

jeremyjpj0916 commented 6 days ago

Wow, yep not running in debug mode just yet again, but just by the time in dev it went for DNS warmup times from 20-30 seconds down to 1.4 seconds. The power of DNS optimizations is very real:

2024-06-27T10:50:38.890120388-05:00 2024/06/27 15:50:38 [notice] 47#0: *1300 [lua] warmup.lua:47: warming up DNS entries ..., context: ngx.timer
2024-06-27T10:50:40.383691815-05:00 2024/06/27 15:50:40 [notice] 47#0: *1300 [lua] warmup.lua:81: finished warming up DNS entries' into the cache (in 1493ms), context: ngx.timer

Relevant logs of proof of wayyy better performance. Running in debug mode now just to capture the dns client debug logs to confirm the better settings.

jeremyjpj0916 commented 6 days ago

And lastly the debug logs post fix confirming the proper behavior carried through to the workers:

2024-06-27T10:55:35.642893875-05:00 2024/06/27 15:55:35 [notice] 48#0: *1300 [lua] warmup.lua:47: warming up DNS entries ..., context: ngx.timer
2024-06-27T10:55:35.643278754-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:476: init(): [dns-client] (re)configuring dns client
2024-06-27T10:55:35.643286248-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:481: init(): [dns-client] staleTtl = 4
2024-06-27T10:55:35.643288672-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:485: init(): [dns-client] noSynchronisation = false
2024-06-27T10:55:35.643477099-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:504: init(): [dns-client] query order = LAST, A, SRV, CNAME
2024-06-27T10:55:35.643636752-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-mcastprefix = [fe00::0]
2024-06-27T10:55:35.643640559-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:544: init(): [dns-client] adding A-record from 'hosts' file: kong-5bdb5448fd-w4pgk = 1xx.xxx.xx.xx
2024-06-27T10:55:35.643643414-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-localhost = [::1]
2024-06-27T10:55:35.643645418-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-loopback = [::1]
2024-06-27T10:55:35.643647512-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-allnodes = [fe00::1]
2024-06-27T10:55:35.643667309-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-localnet = [fe00::0]
2024-06-27T10:55:35.643698559-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-allrouters = [fe00::2]
2024-06-27T10:55:35.643709449-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:544: init(): [dns-client] adding A-record from 'hosts' file: localhost = 127.0.0.1
2024-06-27T10:55:35.643724528-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: localhost = [::1]
2024-06-27T10:55:35.643731210-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:568: init(): [dns-client] validTtl = nil
2024-06-27T10:55:35.643920859-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:609: init(): [dns-client] nameserver 10.xx.xx.xxx:53
2024-06-27T10:55:35.643932201-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:609: init(): [dns-client] nameserver 10.x.xxx.xxx:53
2024-06-27T10:55:35.643937110-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:614: init(): [dns-client] attempts = 1
2024-06-27T10:55:35.643939835-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:621: init(): [dns-client] no_random = true
2024-06-27T10:55:35.643941849-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:631: init(): [dns-client] timeout = 1000 ms
2024-06-27T10:55:35.643944134-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:639: init(): [dns-client] ndots = 1
2024-06-27T10:55:35.643946107-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:641: init(): [dns-client] search = .
2024-06-27T10:55:35.643948091-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:653: init(): [dns-client] badTtl = 1 s
2024-06-27T10:55:35.643950195-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:655: init(): [dns-client] emptyTtl = 10 s
2024-06-27T10:55:36.764782956-05:00 2024/06/27 15:55:36 [notice] 48#0: *1300 [lua] warmup.lua:81: finished warming up DNS entries' into the cache (in 1121ms), context: ngx.timer

@chobits The ENV variable patch approach worked as evidenced above.

jeremyjpj0916 commented 5 days ago

Even my stage environment with the fixes is super improved:

2024-06-27T12:04:07.707289028-05:00 2024/06/27 17:04:07 [notice] 48#0: *1992 [lua] warmup.lua:81: finished warming up DNS entries' into the cache (in 3802ms), context: ngx.timer

3.8 seconds when it used to take a full 1+ minute 🚀 . Thousands of service records needing FQDN resolved in those environments.

chobits commented 5 days ago

And lastly the debug logs post fix confirming the proper behavior carried through to the workers:

2024-06-27T10:55:35.642893875-05:00 2024/06/27 15:55:35 [notice] 48#0: *1300 [lua] warmup.lua:47: warming up DNS entries ..., context: ngx.timer
2024-06-27T10:55:35.643278754-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:476: init(): [dns-client] (re)configuring dns client
2024-06-27T10:55:35.643286248-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:481: init(): [dns-client] staleTtl = 4
2024-06-27T10:55:35.643288672-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:485: init(): [dns-client] noSynchronisation = false
2024-06-27T10:55:35.643477099-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:504: init(): [dns-client] query order = LAST, A, SRV, CNAME
2024-06-27T10:55:35.643636752-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-mcastprefix = [fe00::0]
2024-06-27T10:55:35.643640559-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:544: init(): [dns-client] adding A-record from 'hosts' file: kong-5bdb5448fd-w4pgk = 1xx.xxx.xx.xx
2024-06-27T10:55:35.643643414-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-localhost = [::1]
2024-06-27T10:55:35.643645418-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-loopback = [::1]
2024-06-27T10:55:35.643647512-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-allnodes = [fe00::1]
2024-06-27T10:55:35.643667309-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-localnet = [fe00::0]
2024-06-27T10:55:35.643698559-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-allrouters = [fe00::2]
2024-06-27T10:55:35.643709449-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:544: init(): [dns-client] adding A-record from 'hosts' file: localhost = 127.0.0.1
2024-06-27T10:55:35.643724528-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:559: init(): [dns-client] adding AAAA-record from 'hosts' file: localhost = [::1]
2024-06-27T10:55:35.643731210-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:568: init(): [dns-client] validTtl = nil
2024-06-27T10:55:35.643920859-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:609: init(): [dns-client] nameserver 10.xx.xx.xxx:53
2024-06-27T10:55:35.643932201-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:609: init(): [dns-client] nameserver 10.x.xxx.xxx:53
2024-06-27T10:55:35.643937110-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:614: init(): [dns-client] attempts = 1
2024-06-27T10:55:35.643939835-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:621: init(): [dns-client] no_random = true
2024-06-27T10:55:35.643941849-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:631: init(): [dns-client] timeout = 1000 ms
2024-06-27T10:55:35.643944134-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:639: init(): [dns-client] ndots = 1
2024-06-27T10:55:35.643946107-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:641: init(): [dns-client] search = .
2024-06-27T10:55:35.643948091-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:653: init(): [dns-client] badTtl = 1 s
2024-06-27T10:55:35.643950195-05:00 2024/06/27 15:55:35 [debug] 48#0: *1303 [lua] client.lua:655: init(): [dns-client] emptyTtl = 10 s
2024-06-27T10:55:36.764782956-05:00 2024/06/27 15:55:36 [notice] 48#0: *1300 [lua] warmup.lua:81: finished warming up DNS entries' into the cache (in 1121ms), context: ngx.timer

@chobits The ENV variable patch approach worked as evidenced above.

Cool! great to hear that the ENV method worked.

chobits commented 1 day ago

I'm closing this issue, feel free to reopen or left a comment if you still have the problem with this topic.

jeremyjpj0916 commented 1 day ago

@chobits Why would this be closed? Kong nginx workers do not respect the ENV variable settings mentioned in the conf ootb until Kong includes those ENV vars either in your default nginx template or in the conf itself?

https://github.com/Kong/kong/blob/3.7.1/kong.conf.default#L1496

chobits commented 1 day ago

@chobits Why would this be closed? Kong nginx workers do not respect the ENV variable settings mentioned in the conf ootb until Kong includes those ENV vars either in your default nginx template or in the conf itself?

I have filed an internal ticket to track this, currently I think we should modify the documentation to mention the ENV method. /KAG-4845

jeremyjpj0916 commented 1 day ago

@chobits ah gotcha, internal ticket and the option of updating documentation on how to achieve the settings seems reasonable too.