Vagrant's 127.0.0.1 hostname alias overrides hostmanager's IP

karlkfi commented 8 years ago

Recent versions of Vagrant add the VM name and hostname as aliases for 127.0.0.1 at the top of /etc/hosts on CentOS. This seems to change how the Name Service Switch (NSS) resolves the hostname, so that 127.0.0.1 is returned instead of the IP added by vagrant-hostmanager.

Example:

In this dcos-vagrant example, there are several VMs, one of which is named a1 with the hostname a1.dcos.

$ cat /etc/hosts
127.0.0.1   a1.dcos a1
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

## vagrant-hostmanager-start
192.168.65.111  a1.dcos
192.168.65.50   boot.dcos
192.168.65.90   m1.dcos
## vagrant-hostmanager-end

Hostname resolution works fine for tools that query DNS directly, like host, nslookup and dig, but fails for tools that resolve names through NSS (and therefore consult /etc/hosts), like ping and curl.

ping hits 127.0.0.1 instead of 192.168.65.111:

$ ping a1.dcos
PING a1.dcos (127.0.0.1) 56(84) bytes of data.
64 bytes from a1.dcos (127.0.0.1): icmp_seq=1 ttl=64 time=0.022 ms
64 bytes from a1.dcos (127.0.0.1): icmp_seq=2 ttl=64 time=0.047 ms
^C
--- a1.dcos ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1027ms
rtt min/avg/max/mdev = 0.022/0.034/0.047/0.013 ms

curl a1.dcos tries 127.0.0.1 first and only tries 192.168.65.111 if 127.0.0.1 is refused:

$ curl -v a1.dcos
* About to connect() to a1.dcos port 80 (#0)
*   Trying 127.0.0.1...
* Connection refused
*   Trying 192.168.65.111...
* Connection refused
* Failed connect to a1.dcos:80; Connection refused
* Closing connection 0
curl: (7) Failed connect to a1.dcos:80; Connection refused
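
The ping and curl output above fits a simple model of how /etc/hosts is read: a lookup that wants a single address (as ping appears to) takes the first matching line, while a lookup that collects every address (as curl appears to) gets all matching lines in file order. The Ruby sketch below models only that, using the first /etc/hosts listing; `parse_hosts`, `first_address` and `all_addresses` are hypothetical helpers, not glibc, NSS or Vagrant APIs.

```ruby
# Simplified model of /etc/hosts lookup (NOT glibc's actual implementation).
# Requires Ruby 2.7+ for filter_map.
HOSTS = <<~EOS
  127.0.0.1   a1.dcos a1
  127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
  ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

  ## vagrant-hostmanager-start
  192.168.65.111  a1.dcos
  192.168.65.50   boot.dcos
  192.168.65.90   m1.dcos
  ## vagrant-hostmanager-end
EOS

# Returns [address, [names...]] pairs in file order, skipping comments and blank lines.
def parse_hosts(text)
  text.each_line.filter_map do |line|
    fields = line.sub(/#.*/, "").split
    next if fields.size < 2
    [fields.first, fields.drop(1)]
  end
end

# First matching line wins (what ping appears to see).
def first_address(entries, name)
  entry = entries.find { |_addr, names| names.include?(name) }
  entry && entry.first
end

# Every matching line, in file order (what curl appears to receive).
def all_addresses(entries, name)
  entries.select { |_addr, names| names.include?(name) }.map(&:first)
end

entries = parse_hosts(HOSTS)
puts first_address(entries, "a1.dcos")          # 127.0.0.1
puts all_addresses(entries, "a1.dcos").inspect  # ["127.0.0.1", "192.168.65.111"]
```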

Rearranging /etc/hosts to put hostmanager aliases on top fixes ping:

$ cat /etc/hosts
## vagrant-hostmanager-start
192.168.65.111  a1.dcos
192.168.65.50   boot.dcos
192.168.65.90   m1.dcos
## vagrant-hostmanager-end

127.0.0.1   a1.dcos a1
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

$ ping a1.dcos
PING a1.dcos (192.168.65.111) 56(84) bytes of data.
64 bytes from a1.dcos (192.168.65.111): icmp_seq=1 ttl=64 time=0.023 ms
64 bytes from a1.dcos (192.168.65.111): icmp_seq=2 ttl=64 time=0.036 ms
^C
--- a1.dcos ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1045ms
rtt min/avg/max/mdev = 0.023/0.029/0.036/0.008 ms

But curl still tries 127.0.0.1 first:

$ curl -v a1.dcos
* About to connect() to a1.dcos port 80 (#0)
*   Trying 127.0.0.1...
* Connection refused
*   Trying 192.168.65.111...
* Connection refused
* Failed connect to a1.dcos:80; Connection refused
* Closing connection 0
curl: (7) Failed connect to a1.dcos:80; Connection refused
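
Reordering the file changes what a first-match lookup returns, which matches the new ping result, yet curl still tries 127.0.0.1 first. One possible explanation, assumed here and not confirmed in this thread, is that glibc's `getaddrinfo()` sorts the addresses it returns using RFC 3484 destination address selection rules (tunable via /etc/gai.conf), and that those rules rank the loopback address ahead of 192.168.65.111 whatever the file order. The Ruby sketch below stands in for that assumed outcome with a stable "loopback first" sort; `order_like_curl` is a hypothetical helper and is far simpler than the real rules.

```ruby
require "ipaddr"

# Assumed model: addresses arrive in /etc/hosts file order, then a stable sort
# moves loopback addresses to the front. This only approximates the outcome of
# getaddrinfo()'s real address-selection rules, which are more involved.
def order_like_curl(addresses)
  addresses.each_with_index
           .sort_by { |addr, index| [IPAddr.new(addr).loopback? ? 0 : 1, index] }
           .map(&:first)
end

# File order after moving the hostmanager block to the top.
reordered = ["192.168.65.111", "127.0.0.1"]
p reordered.first             # "192.168.65.111" (first-match lookup, like ping)
p order_like_curl(reordered)  # ["127.0.0.1", "192.168.65.111"]
```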

karlkfi commented 8 years ago

The above report is using a CentOS 7.2 guest on VirtualBox.

karlkfi commented 8 years ago

Curl version:

$ curl --version
curl 7.29.0 (x86_64-redhat-linux-gnu) libcurl/7.29.0 NSS/3.19.1 Basic ECC zlib/1.2.7 libidn/1.28 libssh2/1.4.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smtp smtps telnet tftp
Features: AsynchDNS GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz

karlkfi commented 8 years ago

Removing the Vagrant-added alias line results in the desired behavior:

$ curl -v a1.dcos
* About to connect() to a1.dcos port 80 (#0)
*   Trying 192.168.65.111...
* Connection refused
* Failed connect to a1.dcos:80; Connection refused
* Closing connection 0
curl: (7) Failed connect to a1.dcos:80; Connection refused

ggascoigne commented 8 years ago

FYI, I've run into the same issue. I work around it by automatically stripping out the offending entries with a shell provisioner, something like this:


hostfix = 'sed "s/\\(127.0.0.1.*\\)$(hostname)\\(.*\\)/\\1\\2/" < /etc/hosts > /tmp/hosts && mv /tmp/hosts /etc/hosts'

...

    node.vm.provision :shell, inline: hostfix
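
The sed one-liner above deletes the machine's hostname from the 127.0.0.1 line(s). The same transformation can be sketched in plain Ruby, which avoids nesting shell and regex quoting inside a Ruby string. `strip_loopback_aliases` is a hypothetical helper; unlike the sed version it removes whole name tokens (so it can also drop the short name `a1` if asked) and deletes a 127.0.0.1 line entirely once no names are left. Note that the Vagrantfile itself runs on the host, so applying this to the guest's /etc/hosts would still need a provisioner or a script run inside the VM.

```ruby
# Removes the given names from every 127.0.0.1 line of a hosts file.
# Other lines are kept verbatim; a 127.0.0.1 line with no names left is dropped.
# Requires Ruby 2.7+ for filter_map.
def strip_loopback_aliases(hosts_text, names_to_remove)
  hosts_text.each_line.filter_map do |line|
    addr, *names = line.split
    next line unless addr == "127.0.0.1"

    kept = names - names_to_remove
    next nil if kept.empty?           # nothing left: drop the whole line
    next line if kept == names        # nothing removed: keep original spacing

    "#{addr}   #{kept.join(' ')}\n"
  end.join
end

hosts = <<~EOS
  127.0.0.1   a1.dcos a1
  127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
  192.168.65.111  a1.dcos
EOS

print strip_loopback_aliases(hosts, ["a1.dcos", "a1"])
# 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
# 192.168.65.111  a1.dcos
```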

karlkfi commented 8 years ago

Yeah, I've had to use something similar. Would be nice if vagrant-hostmanager did it for us tho.

robinbowes commented 8 years ago

I lost all of yesterday to this issue; I couldn't figure out why Rancher/Docker weren't working properly!

IMO, the correct fix is for Vagrant not to add that line to /etc/hosts in the first place. I wonder whether it can be disabled.

Jaraxal commented 8 years ago

I've been running into the same issue. I'm using Vagrant 1.8.6 and CentOS 7.2 based vagrant boxes.

karlkfi commented 8 years ago

FWIW, I use a provisioning step to work around this issue:

machine_types.each do |name, machine_type|
  config.vm.define name do |machine|
    # Delete the "127.0.0.1<TAB>hostname<TAB>name" line that Vagrant adds to /etc/hosts
    machine.vm.provision :shell, inline: "sed -i'' '/^127.0.0.1\\t#{machine.vm.hostname}\\t#{name}$/d' /etc/hosts"
  end
end
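
The pattern above only matches if the fields are separated by single tabs and the hostname comes before the machine name, exactly as Vagrant wrote them here. A slightly more tolerant variant is sketched below in Ruby, assuming the guest has GNU sed (as CentOS does, since `\+` is a GNU extension to basic regexps): it accepts any run of spaces or tabs and escapes the dots. `loopback_alias_sed` is a hypothetical helper name.

```ruby
# Builds a GNU sed command that deletes the Vagrant-added loopback line for one
# machine, tolerating any run of spaces/tabs between fields and trailing blanks.
def loopback_alias_sed(hostname, name)
  escape = ->(s) { s.gsub(".") { "\\." } }  # make dots literal in the regex
  ws = "[[:space:]]\\+"                      # one or more spaces/tabs (GNU BRE)
  pattern = "^127\\.0\\.0\\.1#{ws}#{escape.(hostname)}#{ws}#{escape.(name)}[[:space:]]*$"
  "sed -i '/#{pattern}/d' /etc/hosts"
end

puts loopback_alias_sed("a1.dcos", "a1")
# sed -i '/^127\.0\.0\.1[[:space:]]\+a1\.dcos[[:space:]]\+a1[[:space:]]*$/d' /etc/hosts
```

Inside the loop above, the provisioner line would then read `machine.vm.provision :shell, inline: loopback_alias_sed(machine.vm.hostname, name.to_s)`, assuming `machine.vm.hostname` is set as in the original snippet.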

Jaraxal commented 8 years ago

Handy work around. Thank you.

GastonGonzalez commented 6 years ago

I ran into this problem and wasted nearly a day trying to figure out why a couple of my services (e.g., Spark and ZooKeeper) weren't working. In my case, I use Vagrant with Ansible as my provisioner, so here's the equivalent workaround for Ansible. Simply add this to one of your Ansible roles:

- name: prevent hostname from binding to the loopback address
  command: sed -i '/127.0.0.1\t{{ansible_hostname}}\t{{ansible_hostname}}/d' /etc/hosts
  ignore_errors: true
  changed_when: true

devopsgroup-io / vagrant-hostmanager issue #203