containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

dns resolver doesn't work for nginx #14356

Closed: heidricha closed this issue 2 years ago

heidricha commented 2 years ago

/kind bug

Description

Steps to reproduce the issue:

  1. create two containers in the same network, at least one of them running nginx

  2. check DNS name resolution inside the containers

  3. use the verified resolver IP address and DNS names in the "resolver" directive and "set" variables in nginx.conf for proxy_pass (see the sketch below)
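A minimal command-line sketch of those steps (network and container names here are illustrative, not taken from the report):

$ podman network create test
$ podman run -d --name back --network test nginx:stable-alpine
$ podman run -d --name front --network test nginx:stable-alpine
$ podman exec -it back nslookup front          # step 2: confirm the network DNS answers
$ podman exec -it back cat /etc/resolv.conf    # note the resolver IP for nginx.conf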

Describe the results you received: DNS works; I can use commands like ping, getent, nslookup, and dig. Everything looks OK, yet nginx still fails with this error:

[emerg] 1#1: host not found in resolver "set" in /etc/nginx/conf.d/default.conf:4

Describe the results you expected: it should work, as it does with docker, where the resolver is 127.0.0.11.
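For comparison, under Docker a container on a user-defined network gets Docker's embedded DNS server in its /etc/resolv.conf, which matches the resolver value mentioned for Docker at the end of this report:

nameserver 127.0.0.11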

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Client:       Podman Engine
Version:      4.1.1-dev
API Version:  4.1.1-dev
Go Version:   devel go1.19-016d755213 Thu May 12 22:32:42 2022 +0000
Git Commit:   12d30e63f055d904277c647a4ddbf406e28883f9
Built:        Thu May 19 12:56:28 2022
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.26.1
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: cgroupfs
  cgroupVersion: v2
  conmon:
    package: Unknown
    path: /usr/local/libexec/podman/conmon
    version: 'conmon version 2.1.1, commit: 546e4dbe78c72754053d4029593e7d9fc59adff1'
  cpuUtilization:
    idlePercent: 95.28
    systemPercent: 1.86
    userPercent: 2.85
  cpus: 8
  distribution:
    codename: bullseye
    distribution: debian
    version: "11"
  eventLogger: file
  hostname: prod-test
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.10.0-10-amd64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 3195027456
  memTotal: 33671778304
  networkBackend: cni
  ociRuntime:
    name: crun
    package: crun_0.17+dfsg-1_amd64
    path: /usr/bin/crun
    version: |-
      crun version 0.17
      commit: 0e9229ae34caaebcb86f1fde18de3acaf18c6d9a
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: ""
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns_1.0.1-2_amd64
    version: |-
      slirp4netns version 1.0.1
      commit: 6a7b16babc95b6a3056b33fb45b74a6f62262dd4
      libslirp: 4.4.0
  swapFree: 15922728960
  swapTotal: 15997071360
  uptime: 1534h 39m 28.06s (Approximately 63.92 days)
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  gitlab:8012:
    Blocked: false
    Insecure: true
    Location: gitlab:8012
    MirrorByDigestOnly: false
    Mirrors: []
    Prefix: gitlab:8012
    PullFromMirror: ""
  gitlab:8013:
    Blocked: false
    Insecure: true
    Location: gitlab:8013
    MirrorByDigestOnly: false
    Mirrors: []
    Prefix: gitlab:8013
    PullFromMirror: ""
  search:
  - docker.io
  - gitlab:8013
  - gcr.io
  - gitlab:8012
store:
  configFile: /home/sagemcom/.config/containers/storage.conf
  containerStore:
    number: 19
    paused: 0
    running: 18
    stopped: 1
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/sagemcom/.local/share/containers/storage
  graphRootAllocated: 152278147072
  graphRootUsed: 79715364864
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 47
  runRoot: /run/user/1000/containers
  volumePath: /home/sagemcom/.local/share/containers/storage/volumes
version:
  APIVersion: 4.1.1-dev
  Built: 1652957788
  BuiltTime: Thu May 19 12:56:28 2022
  GitCommit: 12d30e63f055d904277c647a4ddbf406e28883f9
  GoVersion: devel go1.19-016d755213 Thu May 12 22:32:42 2022 +0000
  Os: linux
  OsArch: linux/amd64
  Version: 4.1.1-dev

Package info (e.g. output of rpm -q podman or apt list podman):

(paste your output here)

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

I use docker-compose with the podman UDS (Unix domain socket API)

I use an external network so that I can provide the gateway (DNS) info to the containers:

$ cat docker-compose.yaml 
services:
  front:
    image: nginx:stable-alpine
    dns_search: dns.podman
    environment:
      DNS_SUFFIX: .dns.podman
      GATEWAY: 10.89.4.1
    ports:
      - 8081:80

  back:
    image: nginx:stable-alpine
    dns_search: dns.podman
    environment:
      DNS_SUFFIX: .dns.podman
      GATEWAY: 10.89.4.1
    ports:
      - 8082:80

networks:
  default:
    name: test
    external: true
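Since the network is declared external, it has to exist before docker-compose up is run; a minimal sketch of creating it with podman (the subnet is illustrative, chosen only to match the 10.89.4.x addresses seen in this report):

$ podman network create --subnet 10.89.4.0/24 test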

$ podman exec -it test_back_1 ash
/ # ping -c 1 front
PING front (10.89.4.41): 56 data bytes
64 bytes from 10.89.4.41: seq=0 ttl=42 time=0.174 ms

--- front ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.174/0.174/0.174 ms
/ # ping -c 1 front.dns.podman
PING front.dns.podman (10.89.4.41): 56 data bytes
64 bytes from 10.89.4.41: seq=0 ttl=42 time=0.040 ms

--- front.dns.podman ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.040/0.040/0.040 ms
/ # cat /etc/resolv.conf 
search dns.podman
nameserver 10.89.4.1
nameserver 8.8.8.8
nameserver 8.8.4.4
/ # cat /etc/hosts 
127.0.0.1   localhost
10.254.138.215  prod-test.bp.ginop.scom.local prod-test
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
10.254.138.215  host.containers.internal
10.89.4.42  dbabd47d4772 test_back_1
/ # nslookup front
Server:     10.89.4.1
Address:    10.89.4.1#53

Name:   front.dns.podman
Address: 10.89.4.41

Everything looks fine to me... but...

$ cat docker-compose.yaml 
services:
  front:
    image: nginx:stable-alpine
    dns_search: dns.podman
    environment:
      DNS_SUFFIX: .dns.podman
      GATEWAY: 10.89.4.1
    volumes:
      - ./templates:/etc/nginx/templates
      - ./conf.d.front:/etc/nginx/conf.d
    ports:
      - 8081:80

  back:
    image: nginx:stable-alpine
    dns_search: dns.podman
    environment:
      DNS_SUFFIX: .dns.podman
      GATEWAY: 10.89.4.1
    volumes:
      - ./templates:/etc/nginx/templates
      - ./conf.d.back:/etc/nginx/conf.d
    ports:
      - 8082:80

networks:
  default:
    name: test
    external: true

$ cat templates/default.conf.template 
server {
    resolver ${GATEWAY}

    set $front http://front${DNS_SUFFIX};
    set $back http://back${DNS_SUFFIX};

    location ~ ^/front/ {
        proxy_pass $front;
    }

    location ~ ^/back/ {
        proxy_pass $back;
    }
}

$ cat conf.d.front/default.conf 
server {
    resolver 10.89.4.1

    set $front http://front.dns.podman;
    set $back http://back.dns.podman;

    location ~ ^/front/ {
        proxy_pass $front;
    }

    location ~ ^/back/ {
        proxy_pass $back;
    }
}
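A side note on the nginx syntax, independent of podman: nginx directives must end with a semicolon, and the resolver lines above do not, so nginx presumably parses the following set token as an additional resolver address, which would match the host not found in resolver "set" error in the log below. A corrected template would look roughly like this (sketch only, not taken from the issue):

server {
    resolver ${GATEWAY};    # note the trailing semicolon

    set $front http://front${DNS_SUFFIX};
    set $back http://back${DNS_SUFFIX};

    location ~ ^/front/ {
        proxy_pass $front;
    }

    location ~ ^/back/ {
        proxy_pass $back;
    }
}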

...

sagemcom@prod-test:~/test$ docker-compose up 
Recreating test_back_1  ... done
Recreating test_front_1 ... done
Attaching to test_back_1, test_front_1
front_1  | /docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
front_1  | /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
back_1   | /docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
back_1   | /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
front_1  | /docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
back_1   | /docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
back_1   | 10-listen-on-ipv6-by-default.sh: info: /etc/nginx/conf.d/default.conf is not a file or does not exist
front_1  | 10-listen-on-ipv6-by-default.sh: info: /etc/nginx/conf.d/default.conf is not a file or does not exist
back_1   | /docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
front_1  | /docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
front_1  | 20-envsubst-on-templates.sh: Running envsubst on /etc/nginx/templates/default.conf.template to /etc/nginx/conf.d/default.conf
front_1  | /docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
back_1   | 20-envsubst-on-templates.sh: Running envsubst on /etc/nginx/templates/default.conf.template to /etc/nginx/conf.d/default.conf
back_1   | /docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
back_1   | /docker-entrypoint.sh: Configuration complete; ready for start up
front_1  | /docker-entrypoint.sh: Configuration complete; ready for start up
back_1   | 2022/05/25 09:34:13 [emerg] 1#1: host not found in resolver "set" in /etc/nginx/conf.d/default.conf:4
back_1   | nginx: [emerg] host not found in resolver "set" in /etc/nginx/conf.d/default.conf:4
test_back_1 exited with code 1
front_1  | 2022/05/25 09:34:13 [emerg] 1#1: host not found in resolver "set" in /etc/nginx/conf.d/default.conf:4
front_1  | nginx: [emerg] host not found in resolver "set" in /etc/nginx/conf.d/default.conf:4
test_front_1 exited with code 1

I have no idea what the difference is between my manual tests and the way nginx uses the DNS resolver. Maybe it's not a podman issue, but it works with docker. The env var DNS_SUFFIX is empty for docker, and GATEWAY is 127.0.0.11.

Luap99 commented 2 years ago

I have no idea about the nginx options. If normal DNS works, then this is not an issue with podman.

I see you are using CNI; you may want to test with netavark/aardvark-dns instead, which should have better DNS support.
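For reference, the backend in use is shown by podman info (networkBackend: cni in the output above). Switching a rootless setup from CNI to netavark is typically done by setting the backend in containers.conf and running a system reset, roughly like this (sketch only; podman system reset wipes existing containers, images, and networks, so back up first):

$ podman info --format '{{.Host.NetworkBackend}}'
cni
$ cat ~/.config/containers/containers.conf
[network]
network_backend = "netavark"
$ podman system reset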

ikreymer commented 2 years ago

I think I have a similar issue, where the nginx lookup fails intermittently with podman 4.1.0 and netavark.

For example:

10.89.0.80 - - [14/Jun/2022:02:35:06 +0000] "GET /api/settings HTTP/1.1" 200 53 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.61 Safari/537.36" "-"
10.89.0.80 - - [14/Jun/2022:02:35:07 +0000] "GET /api/settings HTTP/1.1" 200 53 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.61 Safari/537.36" "-"
2022/06/14 02:35:11 [error] 27#27: *60 connect() failed (113: No route to host) while connecting to upstream, client: 10.89.0.80, server: _, request: "GET /api/settings HTTP/1.1", upstream: "http://10.89.0.15:8000/api/settings", host: "AA.BB.CC.DD:PORT"
2022/06/14 02:35:11 [warn] 27#27: *60 upstream server temporarily disabled while connecting to upstream, client: 10.89.0.80, server: _, request: "GET /api/settings HTTP/1.1", upstream: "http://10.89.0.15:8000/api/settings", host: "AA.BB.CC.DD:PORT"
2022/06/14 02:35:14 [error] 27#27: *60 connect() failed (113: No route to host) while connecting to upstream, client: 10.89.0.80, server: _, request: "GET /api/settings HTTP/1.1", upstream: "http://10.89.0.15:8000/api/settings", host: "AA.BB.CC.DD:PORT"
2022/06/14 02:35:14 [warn] 27#27: *60 upstream server temporarily disabled while connecting to upstream, client: 10.89.0.80, server: _, request: "GET /api/settings HTTP/1.1", upstream: "http://10.89.0.15:8000/api/settings", host: "AA.BB.CC.DD:PORT"
10.89.0.80 - - [14/Jun/2022:02:35:14 +0000] "GET /api/settings HTTP/1.1" 200 53 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.61 Safari/537.36" "-"
10.89.0.80 - - [14/Jun/2022:02:35:23 +0000] "GET /api/settings HTTP/1.1" 200 53 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.61 Safari/537.36" "-"
10.89.0.80 - - [14/Jun/2022:02:35:24 +0000] "GET /api/settings HTTP/1.1" 200 53 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.61 Safari/537.36" "-"

nginx is configured to use the resolver settings from /etc/resolv.conf, and the resolver directive looks like this:

resolver 10.89.0.1 valid=30s ipv6=off;
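One common way to wire that up (purely illustrative; not necessarily how this particular setup does it) is an entrypoint step that copies the first nameserver from /etc/resolv.conf into the nginx config before startup, where __RESOLVER__ is a hypothetical placeholder in the config:

RESOLVER=$(awk '/^nameserver/ {print $2; exit}' /etc/resolv.conf)
sed -i "s/__RESOLVER__/${RESOLVER}/" /etc/nginx/conf.d/default.conf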

Edit: result from dig, showing two IPs:


; <<>> DiG 9.16.27-Debian <<>> backend
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24774
;; flags: qr rd ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 4a3030cc69feafb5 (echoed)
;; QUESTION SECTION:
;backend.           IN  A

;; ANSWER SECTION:
backend.        86400   IN  A   10.89.0.15
backend.        86400   IN  A   10.89.0.79

;; Query time: 0 msec
;; SERVER: 10.89.0.1#53(10.89.0.1)
;; WHEN: Tue Jun 14 02:43:29 UTC 2022
;; MSG SIZE  rcvd: 80

In case anyone else ends up in a similar situation: I think I got into a state where an old DNS record was not removed, causing the nginx resolver to fail occasionally.

I will open an issue if I can reproduce this consistently.

sirnuke commented 1 year ago

Definitely seeing this. I'm working with a pretty basic Nginx reverse proxy plus standalone apps on CentOS Stream 9. The setup is structurally similar to a few Docker Compose apps and a FreeBSD jail setup I've run, and I've never seen anything even vaguely like this. podman network reload sometimes helps some hostnames resolve properly, but it also does weird things like resolving internal container names to my external IP address (!).
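As a usage note, the reload command mentioned above can be run against a single container or against all of them (the container name below is a placeholder):

$ podman network reload <container-name>
$ podman network reload --all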

This is on top of all sorts of other weird problems like port forwarding breaking if you attach a container to two networks, which makes me think podman networks are just broken. When I have some time I'm going to reproduce this issue in a VM and reopen this.

JustinTArthur commented 1 year ago

If you encounter this issue while using aardvark-dns, it can be caused by containers/aardvark-dns#203, fixed in aardvark-dns 1.2.0.
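A quick way to check which aardvark-dns version is installed (the exact command depends on the distribution; both forms shown for illustration):

$ rpm -q aardvark-dns
$ dpkg -l aardvark-dns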