coreos / bugs

Issue tracker for CoreOS Container Linux
https://coreos.com/os/eol/

Container Linux `hostname` binary is from net-tools and broken #1983

Open maikzumstrull opened 7 years ago

maikzumstrull commented 7 years ago

Issue Report

Bug

Container Linux Version

NAME="Container Linux by CoreOS"
VERSION=1409.1.0

Environment

GCE

Expected Behavior

After setting the kernel hostname to the short hostname, hostname -f should display the FQDN.

Actual Behavior

mzumstrull@kube-mzumstrull-master-wknr ~ $ hostname
kube-mzumstrull-master-wknr
mzumstrull@kube-mzumstrull-master-wknr ~ $ hostname -f
kube-mzumstrull-master-wknr
mzumstrull@kube-mzumstrull-master-wknr ~ $ ./hostname -f
kube-mzumstrull-master-wknr.c.kubernetes-staging.internal
mzumstrull@kube-mzumstrull-master-wknr ~ $ getent ahosts `hostname`
172.16.219.5    STREAM kube-mzumstrull-master-wknr.c.kubernetes-staging.internal
172.16.219.5    DGRAM
172.16.219.5    RAW

In this case, ./hostname is a copy of /usr/bin/hostname from a random CentOS 7 machine. CentOS (like Debian/Ubuntu and…basically everyone) ignores the ancient, broken hostname binary from the deprecated net-tools package and ships something else.

Reproduction Steps

  1. Set kernel hostname to a short hostname
  2. Make sure DNS resolution setup is such that hostname -f should work
  3. Run hostname -f
  4. Run getent ahosts $(hostname) to verify name resolution works and it's the hostname binary's fault
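Step 4 can also be checked programmatically. The following is a minimal Python sketch, not part of the original report, that asks the resolver for the canonical name of a host through `getaddrinfo` with `AI_CANONNAME`. The function name `canonical_name` and its injectable `resolver` parameter are assumptions made for illustration.

```python
import socket


def canonical_name(name, resolver=socket.getaddrinfo):
    """Return the canonical name the resolver reports for `name`, or None.

    `resolver` is injectable so the logic can be exercised without DNS
    (an assumption for this sketch); by default it is socket.getaddrinfo.
    """
    try:
        results = resolver(name, None, flags=socket.AI_CANONNAME)
    except socket.gaierror:
        # Name resolution failed entirely.
        return None
    # Each entry is (family, type, proto, canonname, sockaddr); the
    # canonical name is normally only filled in on the first entry.
    for _family, _type, _proto, canonname, _sockaddr in results:
        if canonname:
            return canonname
    return None
```

Calling `canonical_name(socket.gethostname())` on an affected machine would be expected to return the FQDN if name resolution is set up as in step 2, even while `hostname -f` prints only the short name.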
lucab commented 7 years ago

Thanks for the report. Would you mind sharing:

  1. how did you set the hostname
  2. what do hostnamectl and uname -n say
  3. how/where is the dnsdomainname part configured

For reference, Debian is using a custom hostname source, while another (disabled) binary is in coreutils.

maikzumstrull commented 7 years ago
  1. hostnamectl set-hostname
  2. (On a new instance, we rotate them out pretty quickly during development)
    mzumstrull@kube-mzumstrull-master-ls7t ~ $ hostnamectl
    Static hostname: kube-mzumstrull-master-ls7t
         Icon name: computer-vm
           Chassis: vm
        Machine ID: f4ccc377b98cb3ec7548a9acd940bc95
           Boot ID: a5ef00d994714509bf7a9330bc4e38f9
    Virtualization: kvm
    Operating System: Container Linux by CoreOS 1409.1.0 (Ladybug)
            Kernel: Linux 4.11.2-coreos
      Architecture: x86-64
    mzumstrull@kube-mzumstrull-master-ls7t ~ $ uname -n
    kube-mzumstrull-master-ls7t
  3. The search domain is correctly set in resolv.conf. This works fine, as confirmed by getent ahosts $(hostname). dnsdomain is not explicitly set, but derived from the FQDN in a working hostname binary.

The kernel.domainname setting is a relic from YP/NIS days and should not be set on modern systems. Accordingly, it is kernel.domainname = (none).
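For anyone who wants to confirm the same thing on their own machine, here is a small Python sketch that reads the sysctl from `/proc/sys/kernel/domainname` and treats the kernel's `(none)` placeholder as unset. The helper names `nis_domain` and `read_nis_domain` are made up for this example.

```python
from pathlib import Path

# Linux exposes the kernel.domainname sysctl at this path.
DOMAINNAME_PATH = Path("/proc/sys/kernel/domainname")


def nis_domain(raw):
    """Return the NIS/YP domain from the raw sysctl text, or None if unset."""
    value = raw.strip()
    # The kernel reports the literal string "(none)" when nothing was set.
    if value in ("", "(none)"):
        return None
    return value


def read_nis_domain(path=DOMAINNAME_PATH):
    """Read and interpret kernel.domainname; a missing file also yields None."""
    try:
        return nis_domain(Path(path).read_text())
    except OSError:
        return None
```

A result of `None` from `read_nis_domain()` matches the `kernel.domainname = (none)` state described above.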

maikzumstrull commented 7 years ago

I am told there is yet another hostname implementation in GNU inetutils. busybox and toybox probably each have one as well. I don't know which is best, nor whether LSB or a similar standard specifies how hostname -f should derive the FQDN.

thetuxkeeper commented 7 years ago

We have the same issue. I can see that hostname -f/libnss_dns only sends AAAA DNS requests (why?). Since we only have A records in DNS, the lookups for all of the search domains fail. We use consul, so coreos-tst1-001.node.dc1.tst.example.com would be the correct FQDN.

core@coreos-tst1-001 ~ $ dig coreos-tst1-001.node.dc1.tst.example.com

; <<>> DiG 9.10.2-P4 <<>> coreos-tst1-001.node.dc1.tst.example.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63491
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;coreos-tst1-001.node.dc1.tst.example.com. IN A

;; ANSWER SECTION:
coreos-tst1-001.node.dc1.tst.example.com. 0 IN A 10.30.8.20

;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Aug 07 14:15:26 CEST 2017
;; MSG SIZE  rcvd: 86
13:49:54.913344 IP (tos 0x0, ttl 64, id 25153, offset 0, flags [DF], proto UDP (17), length 90)
    127.0.0.1.15232 > 127.0.0.1.domain: [bad udp cksum 0xfe59 -> 0x46dd!] 40136+ AAAA? coreos-tst1-001.service.dc1.tst.example.com. (62)
13:49:54.914432 IP (tos 0x0, ttl 64, id 25154, offset 0, flags [none], proto UDP (17), length 140)
    127.0.0.1.domain > 127.0.0.1.15232: [bad udp cksum 0xfe8b -> 0x6dbd!] 40136 NXDomain q: AAAA? coreos-tst1-001.service.dc1.tst.example.com. 0/1/0 ns: tst.example.com. [0s] SOA ns.tst.example.com. postmaster.tst.example.com. 1502106594 3600 600 86400 0 (112)
13:49:54.914513 IP (tos 0x0, ttl 64, id 25155, offset 0, flags [DF], proto UDP (17), length 87)
    127.0.0.1.39431 > 127.0.0.1.domain: [bad udp cksum 0xfe56 -> 0x682f!] 45232+ AAAA? coreos-tst1-001.node.dc1.tst.example.com. (59)
13:49:54.914565 IP (tos 0x0, ttl 64, id 25156, offset 0, flags [none], proto UDP (17), length 87)
    127.0.0.1.domain > 127.0.0.1.39431: [bad udp cksum 0xfe56 -> 0xe7ae!] 45232 q: AAAA? coreos-tst1-001.node.dc1.tst.example.com. 0/0/0 (59)
13:49:54.914668 IP (tos 0x0, ttl 64, id 25157, offset 0, flags [DF], proto UDP (17), length 82)
    127.0.0.1.30123 > 127.0.0.1.domain: [bad udp cksum 0xfe51 -> 0xf884!] 63417+ AAAA? coreos-tst1-001.dc1.tst.example.com. (54)
13:49:54.914999 IP (tos 0x0, ttl 64, id 25158, offset 0, flags [none], proto UDP (17), length 132)
    127.0.0.1.domain > 127.0.0.1.30123: [bad udp cksum 0xfe83 -> 0x2775!] 63417 NXDomain q: AAAA? coreos-tst1-001.dc1.tst.example.com. 0/1/0 ns: tst.example.com. [0s] SOA ns.tst.example.com. postmaster.tst.example.com. 1502106594 3600 600 86400 0 (104)
13:49:54.915054 IP (tos 0x0, ttl 64, id 25159, offset 0, flags [DF], proto UDP (17), length 78)
    127.0.0.1.23293 > 127.0.0.1.domain: [bad udp cksum 0xfe4d -> 0x4e65!] 8997+ AAAA? coreos-tst1-001.tst.example.com. (50)
13:49:54.915396 IP (tos 0x0, ttl 64, id 25160, offset 0, flags [none], proto UDP (17), length 128)
    127.0.0.1.domain > 127.0.0.1.23293: [bad udp cksum 0xfe7f -> 0x815d!] 8997 NXDomain q: AAAA? coreos-tst1-001.tst.example.com. 0/1/0 ns: tst.example.com. [0s] SOA ns.tst.example.com. postmaster.tst.example.com. 1502106594 3600 600 86400 0 (100)
13:49:54.915449 IP (tos 0x0, ttl 64, id 25161, offset 0, flags [DF], proto UDP (17), length 74)
    127.0.0.1.38623 > 127.0.0.1.domain: [bad udp cksum 0xfe49 -> 0x23ca!] 35022+ AAAA? coreos-tst1-001.example.com. (46)
13:49:54.928905 IP (tos 0x0, ttl 64, id 25173, offset 0, flags [none], proto UDP (17), length 140)
    127.0.0.1.domain > 127.0.0.1.38623: [bad udp cksum 0xfe8b -> 0x209f!] 35022 NXDomain q: AAAA? coreos-tst1-001.example.com. 0/1/0 ns: example.com. [42m53s] SOA ns1.first-ns.de. postmaster.robot.first-ns.de. 2017062903 3600 1800 604800 3600 (112)
13:49:54.929027 IP (tos 0x0, ttl 64, id 25174, offset 0, flags [DF], proto UDP (17), length 61)
    127.0.0.1.20904 > 127.0.0.1.domain: [bad udp cksum 0xfe3c -> 0x7270!] 33234+ AAAA? coreos-tst1-001. (33)
13:49:54.954772 IP (tos 0x0, ttl 64, id 25192, offset 0, flags [none], proto UDP (17), length 136)
    127.0.0.1.domain > 127.0.0.1.20904: [bad udp cksum 0xfe87 -> 0xc96c!] 33234 NXDomain q: AAAA? coreos-tst1-001. 0/1/0 ns: . [1h] SOA a.root-servers.net. nstld.verisign-grs.com. 2017080700 1800 900 604800 86400 (108)
thetuxkeeper commented 7 years ago

I did some debugging. It seems that the net-tools hostname uses gethostbyname, which has an IPv4 fallback for each attempt to resolve a hostname. But the IPv6 DNS queries don't seem to fail hard enough to trigger the IPv4 fallback, so no search domain returns a valid IP and the FQDN is not resolved as it should be.

hostname in Debian uses getaddrinfo, which seems to be the modern (and better) way to handle IPv4+IPv6 queries. The Debian hostname also looks like it could be a drop-in replacement for the net-tools hostname (all arguments are supported and more, though I don't know about its behavior beyond resolving the FQDN).

This is just my impression after trying to understand those two glibc functions and the hostname implementations. I don't have any experience with them other than using hostname from time to time.
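The failure mode described above can be illustrated with a toy model. The Python sketch below is not a real resolver and makes no claim about glibc internals: it only walks a search list in the order seen in the tcpdump capture and checks a small in-memory table that, like the DNS setup in the dig output, contains a single A record. The names `candidates`, `first_match`, `SEARCH` and `RECORDS` are hypothetical.

```python
def candidates(short_name, search_domains):
    """Yield the names tried for `short_name`: each search domain, then the bare name."""
    for domain in search_domains:
        yield f"{short_name}.{domain}"
    yield short_name


def first_match(short_name, search_domains, records, query_types):
    """Return the first candidate that has a record of one of `query_types`.

    `records` maps (name, record_type) to an address and stands in for
    the DNS server in this toy model.
    """
    for name in candidates(short_name, search_domains):
        if any((name, qtype) in records for qtype in query_types):
            return name
    return None


# Search order as seen in the tcpdump capture.
SEARCH = [
    "service.dc1.tst.example.com",
    "node.dc1.tst.example.com",
    "dc1.tst.example.com",
    "tst.example.com",
    "example.com",
]
# Only an A record exists, as in the dig output.
RECORDS = {("coreos-tst1-001.node.dc1.tst.example.com", "A"): "10.30.8.20"}

aaaa_only = first_match("coreos-tst1-001", SEARCH, RECORDS, ["AAAA"])
a_and_aaaa = first_match("coreos-tst1-001", SEARCH, RECORDS, ["A", "AAAA"])
print(aaaa_only)   # None
print(a_and_aaaa)  # coreos-tst1-001.node.dc1.tst.example.com
```

With AAAA-only queries no candidate ever matches, which corresponds to hostname -f falling back to the short name. Once A records are also queried, the second search domain yields the expected FQDN.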