devopsgroup-io / vagrant-hostmanager

:pencil: A Vagrant plugin that manages hosts files within a multi-machine environment.
Mozilla Public License 2.0

Machines fail to come back up cleanly with hostmanager installed #121

Open rthomas opened 9 years ago

rthomas commented 9 years ago

I have the following block in my Vagrantfile to only run hostmanager if it is installed:

  if defined? VagrantPlugins::HostManager
    config.hostmanager.enabled = true
    config.hostmanager.manage_host = true
    config.hostmanager.include_offline = false
    # Custom IP resolver is used to pull the IP out of the private DHCP'd
    # interface eth1 as the hostmanager plugin does not support DHCP'd IPs
    config.hostmanager.ip_resolver = proc do |machine|
      result = ""
      machine.communicate.execute("ifconfig eth1") do |type, data|
        result << data if type == :stdout
      end
      (ip = /inet addr:(\d+\.\d+\.\d+\.\d+)/.match(result)) && ip[1]
    end
  end

With the plugin installed, provisioning my boxes works fine. However, if I halt them and then run vagrant up, each box fails with the message below. Removing hostmanager allows them to start up cleanly.

I have 5 boxes in this config and each one fails at the hostmanager stage.

==> apt-cache: Clearing any previously set forwarded ports...
==> apt-cache: Clearing any previously set network interfaces...
==> apt-cache: Preparing network interfaces based on configuration...
    apt-cache: Adapter 1: nat
    apt-cache: Adapter 2: hostonly
==> apt-cache: Forwarding ports...
    apt-cache: 22 => 2222 (adapter 1)
==> apt-cache: Running 'pre-boot' VM customizations...
==> apt-cache: Booting VM...
==> apt-cache: Waiting for machine to boot. This may take a few minutes...
    apt-cache: SSH address: 127.0.0.1:2222
    apt-cache: SSH username: vagrant
    apt-cache: SSH auth method: private key
    apt-cache: Warning: Connection timeout. Retrying...
==> apt-cache: Machine booted and ready!
==> apt-cache: Checking for guest additions in VM...
==> apt-cache: Setting hostname...
==> apt-cache: Configuring and enabling network interfaces...
==> apt-cache: Mounting shared folders...
    apt-cache: /vagrant => /Users/ryan/src/conex.io/infra
==> apt-cache: Updating /etc/hosts file on active guest machines...
The provider for this Vagrant-managed machine is reporting that it
is not yet ready for SSH. Depending on your provider this can carry
different meanings. Make sure your machine is created and running and
try again. Additionally, check the output of `vagrant status` to verify
that the machine is in the state that you expect. If you continue to
get this error message, please view the documentation for the provider
you're using.

The box is left up and running; it is just annoying to have to run vagrant up five times to bring up all of my boxes after a halt.

rthomas commented 9 years ago

I believe I have found the root cause, which is the use of active_machines here: https://github.com/smdahlen/vagrant-hostmanager/blob/8ec6108143a6cbf9f9ac839ad9124b92a9b9d881/lib/vagrant-hostmanager/action/update_all.rb#L31

From the Vagrant docs, active_machines is:

Returns a list of machines that this environment is currently managing that physically have been created.

An "active" machine is a machine that Vagrant manages that has been created. The machine itself may be in any state such as running, suspended, etc. but if a machine is "active" then it exists.

So this returns the set of machines that have been created, including those in the poweroff state.
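To illustrate the distinction, here is a minimal sketch that narrows a list of created machines down to the ones that are actually running before anything tries to reach them over SSH. It is not the plugin's code. `FakeState`, `FakeMachine`, `running_machines`, and the machine name `web` are hypothetical stand-ins so the snippet runs outside Vagrant; inside Vagrant the equivalent check would be on `machine.state.id`, assuming the provider reports `:running` for a booted VM, as VirtualBox does.

```ruby
# Hypothetical stand-ins for Vagrant's machine objects, so the sketch runs on its own.
FakeState   = Struct.new(:id)
FakeMachine = Struct.new(:name, :state)

# Keep only machines whose provider state is :running.
# Assumption: the provider uses :running for a booted VM (VirtualBox does).
def running_machines(machines)
  machines.select { |m| m.state.id == :running }
end

# "Created" machines, one of which is halted (hypothetical names and states).
created = [
  FakeMachine.new("apt-cache", FakeState.new(:running)),
  FakeMachine.new("web",       FakeState.new(:poweroff)),
]

puts running_machines(created).map(&:name).inspect
```

A created-but-halted machine passes an "is it active?" check but fails an "is it running?" check, which matches the failure seen after a halt.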

dincho commented 9 years ago

Same problem here.

pykler commented 9 years ago

Same here

pykler commented 9 years ago

I believe the problem shows up if you have a custom ip_resolver, which most people would, since the default ip_resolver just looks at the ssh_config. With a custom ip_resolver, you just have to catch the exception raised when trying to SSH into a machine that is down. Here is my ip_resolver for reference:

$logger = Log4r::Logger.new('vagrantfile')
def read_ip_address(machine)
  command = "LANG=en ifconfig  | grep 'inet addr:'| grep -v '127.0.0.1' | cut -d: -f2 | awk '{ print $1 }'"
  result  = ""

  $logger.info "Processing #{ machine.name } ... "

  begin
    # sudo is needed for ifconfig
    machine.communicate.sudo(command) do |type, data|
      result << data if type == :stdout
    end
    $logger.info "Processing #{ machine.name } ... success"
  rescue
    result = "# NOT-UP"
    $logger.info "Processing #{ machine.name } ... not running"
  end

  # the second inet is more accurate
  result.chomp.split("\n").last
end

Vagrant.configure("2") do |config|
    # ...
    if Vagrant.has_plugin?("HostManager")
        # ...
        config.hostmanager.ip_resolver = proc do |vm, resolving_vm|
          read_ip_address(vm)
        end
    end
end

rthomas commented 9 years ago

Thanks @pykler that worked for me.

dincho commented 9 years ago

Yeah, wrapping it in begin/rescue works for me too. Thanks @pykler.
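
For reference, the begin/rescue workaround can be combined with a readiness check so that halted machines are skipped without raising at all. The sketch below is one possible shape, not the plugin's behavior. `StubCommunicator`, `StubMachine`, and `resolve_ip` are hypothetical names used so it runs without Vagrant; in a Vagrantfile the same method body would use the real `machine.communicate`, which provides `ready?` and `execute`. Returning `nil` for an unreachable machine is an assumption; how hostmanager treats a `nil` result is not covered in this thread, so verify that before relying on it.

```ruby
# Hypothetical stand-in for machine.communicate, so the sketch runs without Vagrant.
class StubCommunicator
  def initialize(ready, output)
    @ready = ready
    @output = output
  end

  def ready?
    @ready
  end

  # Mimics the block form used in the thread: yields (type, data) pairs.
  def execute(_command)
    raise "not ready for SSH" unless @ready
    yield :stdout, @output
    0
  end
end

StubMachine = Struct.new(:name, :communicate)

# Returns the eth1 address as a string, or nil if the machine is unreachable.
def resolve_ip(machine)
  return nil unless machine.communicate.ready?

  result = String.new
  machine.communicate.execute("ifconfig eth1") do |type, data|
    result << data if type == :stdout
  end
  match = /inet addr:(\d+\.\d+\.\d+\.\d+)/.match(result)
  match && match[1]
rescue StandardError
  # The machine may go away between the readiness check and the command.
  nil
end

up   = StubMachine.new("apt-cache", StubCommunicator.new(true, "inet addr:172.28.128.3  Bcast:172.28.128.255"))
down = StubMachine.new("web", StubCommunicator.new(false, ""))

puts resolve_ip(up).inspect
puts resolve_ip(down).inspect
```

The rescue clause is kept as a fallback because a readiness check alone still leaves a window in which the machine can stop responding.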