BerlinVagrant / vagrant-dns

A plugin to manage DNS records for vagrant environments
MIT License

Vagrant VMs are destroyed during `vagrant up` #60

Closed amikheychik closed 3 years ago

amikheychik commented 6 years ago

The Vagrant VM is destroyed right after it is started with vagrant up. It took me a while to figure out what exactly causes it and to work out reproduction steps confirming that the issue occurs when vagrant-dns is installed.

Software versions:

The following commands were run with a freshly installed Vagrant (after using uninstall.tool and running rm -rf ~/.vagrant.d).

Steps to reproduce (lines starting with ➜ are the input):

  1. The first run demonstrates that a basic installation, without vagrant-dns, works fine.
➜ vagrant -v
Vagrant 2.0.3
➜ vagrant plugin list
No plugins installed.
➜ vagrant plugin install vagrant-vbguest
Installing the 'vagrant-vbguest' plugin. This can take a few minutes...
Fetching: micromachine-2.0.0.gem (100%)
Fetching: vagrant-vbguest-0.15.1.gem (100%)
Installed the plugin 'vagrant-vbguest (0.15.1)'!
➜ vagrant plugin list
vagrant-vbguest (0.15.1)
➜ vagrant init ubuntu/xenial64
A `Vagrantfile` has been placed in this directory. You are now
ready to `vagrant up` your first virtual environment! Please read
the comments in the Vagrantfile as well as documentation on
`vagrantup.com` for more information on using Vagrant.
➜ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
...
➜ vagrant ssh
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.4.0-119-generic x86_64)
...
vagrant@ubuntu-xenial:~$ exit
➜ vagrant destroy -f
==> default: Forcing shutdown of VM...
==> default: Destroying VM and associated drives...
  2. Install vagrant-dns and run vagrant up again. This time the machine is removed in the process.
➜ vagrant plugin install vagrant-dns 
Installing the 'vagrant-dns' plugin. This can take a few minutes...
Fetching: daemons-1.2.6.gem (100%)
Fetching: nio4r-2.3.0.gem (100%)
Building native extensions.  This could take a while...
Fetching: hitimes-1.2.6.gem (100%)
Building native extensions.  This could take a while...
Fetching: timers-4.1.2.gem (100%)
Fetching: async-1.5.0.gem (100%)
Fetching: async-io-1.7.0.gem (100%)
Fetching: async-dns-1.1.0.gem (100%)
Fetching: rubydns-2.0.1.gem (100%)
Fetching: vagrant-dns-2.1.0.gem (100%)
Installed the plugin 'vagrant-dns (2.1.0)'!
➜ vagrant up                                               
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'ubuntu/xenial64'...
...
==> default: Checking for guest additions in VM...
==> default: [vagrant-dns] TLD but no host_name given. No patterns will be configured.
vagrant-dns: process with pid 58848 started.
==> default: Restarted DNS Service
==> default: Mounting shared folders...
    default: /vagrant => /Users/amikheychik/Vagrant/vagrant
➜ vagrant ssh
VM must be created before running this command. Run `vagrant up` first.
  3. Remove vagrant-dns to install the previous version.
➜ vagrant plugin uninstall vagrant-dns
Uninstalling the 'vagrant-dns' plugin...
Successfully uninstalled async-1.5.0
Successfully uninstalled vagrant-dns-2.1.0
Successfully uninstalled hitimes-1.2.6
Successfully uninstalled daemons-1.2.6
Successfully uninstalled timers-4.1.2
Successfully uninstalled async-io-1.7.0
Successfully uninstalled async-dns-1.1.0
Successfully uninstalled nio4r-2.3.0
Removing rubydns-check
Successfully uninstalled rubydns-2.0.1
➜ vagrant plugin install vagrant-dns --plugin-version 1.1.0
Installing the 'vagrant-dns --version '1.1.0'' plugin. This can take a few minutes...
Fetching: daemons-1.2.6.gem (100%)
Fetching: hitimes-1.2.6.gem (100%)
Building native extensions.  This could take a while...
Fetching: timers-4.0.4.gem (100%)
Fetching: celluloid-0.16.0.gem (100%)
Fetching: nio4r-2.3.0.gem (100%)
Building native extensions.  This could take a while...
Fetching: celluloid-io-0.16.2.gem (100%)
Fetching: rubydns-1.0.3.gem (100%)
Fetching: vagrant-dns-1.1.0.gem (100%)
Installed the plugin 'vagrant-dns (1.1.0)'!
➜ vagrant up                                               
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'ubuntu/xenial64'...
...
==> default: Checking for guest additions in VM...
==> default: [vagrant-dns] TLD but no host_name given. No patterns will be configured.
pid-file for killed process 75986 found (/Users/amikheychik/.vagrant.d/tmp/dns/daemon/vagrant-dns.pid), deleting.
vagrant-dns: process with pid 78619 started.
==> default: Restarted DNS Service
==> default: Mounting shared folders...
    default: /vagrant => /Users/amikheychik/Vagrant/vagrant
  4. The most bizarre part: if the plugin was not installed during vagrant up and is installed only after the machine is running, the machine can be stopped and started again without a problem.

  5. If the plugin is installed but stopped with vagrant dns --stop, the problem remains. vagrant up works fine only if the plugin is removed.

Finally, the good news is that when the steps are taken in the right order:

vagrant up && \
vagrant plugin install vagrant-dns && \
vagrant reload && \
vagrant dns --install

everything works fine.

mpdude commented 6 years ago

Have you seen #59 - is that related?

amikheychik commented 6 years ago

@mpdude it might be related, but the solution described in #59 (installing a pre-2.0 version) doesn't help. (I tried that precisely because of that issue.)

Some additional info I forgot to mention: I couldn't reproduce this issue on Vagrant 2.0.1, which I had for a long time, even with the latest vagrant-dns version. When I upgraded to Vagrant 2.0.2 and then 2.0.3, I had this issue, but once I downgraded, the issue was gone. Yet once I fully cleaned up Vagrant (removed ~/.vagrant.d and used uninstall.tool), the issue appeared on 2.0.1 too.

I barely know Ruby, but if I had to guess, I'd suspect a problem with an underlying dependency. Some code from the existing installation stuck around and let it work until I cleaned up the directory and everything got reinstalled.

ioquatix commented 6 years ago

If this is something wrong with vagrant-dns let me know if I can help.

ioquatix commented 6 years ago

We just released the updated async-dns to fix the compatibility issues in #59

fnordfish commented 6 years ago

Vagrant has that (weird?) feature that it destroys a box if anything goes wrong during the initial "up from not created". Since vagrant-dns by default tries to (re)start itself when a box is started, the dependency problem described in #59 could lead to such an error. The "fix" described in #59 is to reinstall the vagrant-dns plugin (vagrant plugin uninstall vagrant-dns && vagrant plugin install vagrant-dns).

amikheychik commented 6 years ago

@fnordfish reinstallation by itself doesn't help. The only thing that works is bringing the machine up without the plugin, then installing the plugin on top and reloading the machine.

fnordfish commented 6 years ago

Just to be sure, have you tried to re-install after my comment? (the fix in a dependency vagrant-dns uses was released just a few hours after you opened this issue)

If it still doesn't work, can you please run a debug session (vagrant up --debug &> vagrant.log, see https://www.vagrantup.com/docs/other/debugging.html) and upload the log as a gist? (The log might include sensitive information such as auth tokens stored in environment variables, so make sure to remove those.)

amikheychik commented 6 years ago

@fnordfish just double-checked: uninstalling and reinstalling the plugin doesn't help (vagrant up still ends with a removed VM).

Uninstalling the 'vagrant-dns' plugin...
Successfully uninstalled async-1.5.0
Successfully uninstalled vagrant-dns-2.1.0
Successfully uninstalled hitimes-1.2.6
Successfully uninstalled daemons-1.2.6
Successfully uninstalled timers-4.1.2
Successfully uninstalled async-io-1.7.0
Successfully uninstalled async-dns-1.1.0
Successfully uninstalled nio4r-2.3.0
Removing rubydns-check
Successfully uninstalled rubydns-2.0.1
Installing the 'vagrant-dns' plugin. This can take a few minutes...
Fetching: daemons-1.2.6.gem (100%)
Fetching: nio4r-2.3.0.gem (100%)
Building native extensions.  This could take a while...
Fetching: hitimes-1.2.6.gem (100%)
Building native extensions.  This could take a while...
Fetching: timers-4.1.2.gem (100%)
Fetching: async-1.6.0.gem (100%)
Fetching: async-io-1.7.0.gem (100%)
Fetching: async-dns-1.1.0.gem (100%)
Fetching: rubydns-2.0.1.gem (100%)
Fetching: vagrant-dns-2.1.0.gem (100%)
Installed the plugin 'vagrant-dns (2.1.0)'!

Here's the log for vagrant up --debug: https://gist.github.com/amikheychik/a806ee0f40b69deb18d2fe06c334efeb

amikheychik commented 6 years ago

Note: the vagrant-dns plugin has currently stopped working. It still creates files in /etc/resolver, but doesn't resolve any DNS names (checked on two machines).

Installing the 'vagrant-dns' plugin. This can take a few minutes...
Fetching: hitimes-1.2.6.gem (100%)
Building native extensions.  This could take a while...
Fetching: timers-4.1.2.gem (100%)
Fetching: nio4r-2.3.1.gem (100%)
Building native extensions.  This could take a while...
Fetching: async-1.8.0.gem (100%)
Fetching: async-io-1.12.0.gem (100%)
Fetching: async-dns-1.1.1.gem (100%)
Fetching: rubydns-2.0.1.gem (100%)
Fetching: daemons-1.2.6.gem (100%)
Fetching: vagrant-dns-2.1.0.gem (100%)
Installed the plugin 'vagrant-dns (2.1.0)'!

This is the current dependency list.

ioquatix commented 6 years ago

Odd, all specs are passing on my end... at least last time I checked.

amikheychik commented 6 years ago

@ioquatix I tried wiping out Vagrant completely. I originally thought it was related to the v2.0.4 upgrade, but it's not working on v2.0.3 either.

ioquatix commented 6 years ago

I just confirmed all specs passing for async-dns. So there must be some other issue. The tests are fairly comprehensive. The recent changes to async-io and nio4r were minimal.

ioquatix commented 6 years ago

Can you tell me what version of Ruby you are using?

ioquatix commented 6 years ago

Can you give me a log file that includes the information about the daemons starting up?

Feel free to add RUBYOPT=-d; it will generate a lot of output from the async-dns daemon.

amikheychik commented 6 years ago

@ioquatix any idea what I can check/test to narrow it down? E.g. scutil --dns returns

resolver #8
  domain   : xd
  nameserver[0] : 127.0.0.1
  port     : 5300

as expected, but dscacheutil -q host -a name test.xd fails: it returns nothing.

ioquatix commented 6 years ago

dig @127.0.0.1 -p 5300 test.xd ?

ioquatix commented 6 years ago

It's running on port 5300 BTW.

amikheychik commented 6 years ago

@ioquatix

dig @127.0.0.1 -p 5300 test.xd

; <<>> DiG 9.10.6 <<>> @127.0.0.1 -p 5300 test.xd
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 62261
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;test.xd.           IN  A

;; AUTHORITY SECTION:
.           86395   IN  SOA a.root-servers.net. nstld.verisign-grs.com. 2018050300 1800 900 604800 86400

;; Query time: 22 msec
;; SERVER: 127.0.0.1#5300(127.0.0.1)
;; WHEN: Thu May 03 10:08:47 EDT 2018
;; MSG SIZE  rcvd: 100

ioquatix commented 6 years ago

@amikheychik I don't know enough about why it's running on port 5300, but it does appear to be working.

It returns ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 62261. NXDOMAIN means the query succeeded but the domain does not exist (otherwise it would fail to connect or return SERVFAIL).

So, either the names you are using aren't right, or DNS is not hooked up in your Mac system correctly.

First try to get dig to return an actual result.

Once you know it's working and returning your local IP, try to figure out why the Mac resolver isn't using it.

amikheychik commented 6 years ago

@ioquatix Ok, I'll try to figure it out. Thank you for the help!

fnordfish commented 6 years ago

I've just tried with VirtualBox v5.2.8 and Vagrant v2.0.3 (ruby 2.4.3), but on macOS 10.12.6. Works for me :( I'll try to spin up macOS 10.13 later and try to reproduce again.

@amikheychik Is there anything special in your Vagrantfile? (could you share it?)

@ioquatix, @amikheychik We are running on port 5300 because we don't want to run on a privileged port, which would require you to "sudo" every time you (re)start the service. Also, it might conflict with other legitimate DNS use cases (local proxy etc.).

ioquatix commented 6 years ago

That all makes sense. It's fine to run on port 5300.

amikheychik commented 6 years ago

@fnordfish I have 10.13.4. Actually, that's another suspect: I've already installed their recent update from April 27, so it might be related, but I need to confirm it's also installed on my laptop.

Either way, here is my Vagrantfile: https://gist.github.com/amikheychik/de150cbbbda2d5310749ab2a119b678c

It reads configuration file: https://gist.github.com/amikheychik/37ab3a0b7c7b572e21f1cbf997e43f0f

fnordfish commented 6 years ago

So, I've tried your Vagrantfile on a VM running a fresh install of macOS 10.13.4 and vagrant 2.1.1 (yah that's very fresh - will retry with an older version). What I "needed" to do was:

I did notice that the way you are defining DNS patterns won't work. You'll end up using Strings that look like Regexps. A quick fix would be to exclude the slashes from your yaml config like:

    dns:
      tld: 'xd'
      patterns:
        - '^[\w-]+.xd$'

And compile them into proper Regexp in your Vagrantfile:

machine.dns.patterns = config['patterns'].map { |e| Regexp.new(e) }
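To see why the slashes matter, here is a minimal sketch in plain Ruby, outside of Vagrant. The constant names and the `compile_patterns` helper are made up for illustration, and the YAML layout is assumed to mirror the snippet above; the real Vagrantfile may read its config differently.

```ruby
require "yaml"

# Assumed config layout, copied from the snippet above. The first variant keeps
# the slashes around the pattern, the second one drops them.
YAML_WITH_SLASHES = <<~'YAML'
  dns:
    tld: 'xd'
    patterns:
      - '/^[\w-]+.xd$/'
YAML

YAML_WITHOUT_SLASHES = <<~'YAML'
  dns:
    tld: 'xd'
    patterns:
      - '^[\w-]+.xd$'
YAML

# Hypothetical helper: load the YAML text and compile every pattern String.
def compile_patterns(yaml_text)
  config = YAML.safe_load(yaml_text)
  config.fetch("dns").fetch("patterns").map { |source| Regexp.new(source) }
end

with_slashes = compile_patterns(YAML_WITH_SLASHES).first
without_slashes = compile_patterns(YAML_WITHOUT_SLASHES).first

# The slashes become literal characters inside the compiled pattern, so a
# plain host name can no longer match it.
puts with_slashes.match?("test.xd")
puts without_slashes.match?("test.xd")
```

In the first variant the `/` characters are part of the String, so `Regexp.new` treats them as text to match rather than as delimiters.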

Another way, though not the safest, would be to use a Ruby-specific YAML tag and define the Regexp directly in the yaml like this:

    dns:
      tld: 'xd'
      patterns:
        - !ruby/regexp /^[\w-]+.xd$/
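One likely reason this is "not the safest" route is that the tag asks the YAML loader to build a Ruby object. A rough sketch, assuming a Ruby version whose `YAML.safe_load` accepts the `permitted_classes:` keyword; the constant and the `load_tagged_patterns` helper are hypothetical and not part of vagrant-dns:

```ruby
require "yaml"

# Same config as above, using the Ruby-specific regexp tag.
TAGGED_YAML = <<~'YAML'
  dns:
    tld: 'xd'
    patterns:
      - !ruby/regexp /^[\w-]+.xd$/
YAML

# Hypothetical helper: returns the loaded patterns, or nil when the safe
# loader refuses to instantiate the tagged class.
def load_tagged_patterns(yaml_text, permitted_classes: [])
  config = YAML.safe_load(yaml_text, permitted_classes: permitted_classes)
  config.fetch("dns").fetch("patterns")
rescue Psych::DisallowedClass
  nil
end

# Without an explicit allow-list the safe loader rejects the tag.
p load_tagged_patterns(TAGGED_YAML)

# With Regexp permitted, the pattern arrives already compiled.
p load_tagged_patterns(TAGGED_YAML, permitted_classes: [Regexp])
```

Whether the tag loads at all therefore depends on how the Vagrantfile reads the file, which is an extra thing to get right compared to compiling plain Strings.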

But, it works:

Roberts-Mac:vdns_60 fnordfish$ vagrant destroy -f
==> xdruple: Forcing shutdown of VM...
==> xdruple: Destroying VM and associated drives...
==> xdruple: [vagrant-dns] Removing pattern: /^[\w-]+.xd$/ for ip: 192.168.33.10
vagrant-dns: trying to stop process with pid 2272...
vagrant-dns: process with pid 2272 successfully stopped.
vagrant-dns: process with pid 2308 started.
==> xdruple: Restarted DNS Service
Roberts-Mac:vdns_60 fnordfish$ vagrant dns --install

Roberts-Mac:vdns_60 fnordfish$ vagrant dns --stop
vagrant-dns: trying to stop process with pid 2308...
vagrant-dns: process with pid 2308 successfully stopped.
Roberts-Mac:vdns_60 fnordfish$ vagrant up
Bringing machine 'xdruple' up with 'virtualbox' provider...
==> xdruple: Importing base box 'ubuntu/xenial64'...
==> xdruple: Matching MAC address for NAT networking...
==> xdruple: Checking if box 'ubuntu/xenial64' is up to date...
==> xdruple: Setting the name of the VM: xdruple
==> xdruple: Clearing any previously set network interfaces...
==> xdruple: Preparing network interfaces based on configuration...
    xdruple: Adapter 1: nat
    xdruple: Adapter 2: hostonly
==> xdruple: Forwarding ports...
    xdruple: 22 (guest) => 2222 (host) (adapter 1)
==> xdruple: Running 'pre-boot' VM customizations...
==> xdruple: Booting VM...
==> xdruple: Waiting for machine to boot. This may take a few minutes...
    xdruple: SSH address: 127.0.0.1:2222
    xdruple: SSH username: vagrant
    xdruple: SSH auth method: private key
    xdruple: Warning: Connection reset. Retrying...
    xdruple: 
    xdruple: Vagrant insecure key detected. Vagrant will automatically replace
    xdruple: this with a newly generated keypair for better security.
    xdruple: 
    xdruple: Inserting generated public key within guest...
    xdruple: Removing insecure key from the guest if it's present...
    xdruple: Key inserted! Disconnecting and reconnecting using new SSH key...
==> xdruple: Machine booted and ready!
==> xdruple: Checking for guest additions in VM...
    xdruple: The guest additions on this VM do not match the installed version of
    xdruple: VirtualBox! In most cases this is fine, but in rare cases it can
    xdruple: prevent things such as shared folders from working properly. If you see
    xdruple: shared folder errors, please make sure the guest additions within the
    xdruple: virtual machine match the version of VirtualBox you have installed on
    xdruple: your host and reload your VM.
    xdruple: 
    xdruple: Guest Additions Version: 5.1.34
    xdruple: VirtualBox Version: 5.2
vagrant-dns: process with pid 2462 started.
==> xdruple: Restarted DNS Service
==> xdruple: Setting hostname...
==> xdruple: Configuring and enabling network interfaces...
==> xdruple: Mounting shared folders...
    xdruple: /vagrant => /Users/fnordfish/vdns_60
Roberts-Mac:vdns_60 fnordfish$ vagrant status
Current machine states:

xdruple                   running (virtualbox)

The VM is running. To stop this VM, you can run `vagrant halt` to
shut it down forcefully, or you can run `vagrant suspend` to simply
suspend the virtual machine. In either case, to restart it again,
simply run `vagrant up`.

amikheychik commented 6 years ago

@fnordfish your fix with machine.dns.patterns = config['patterns'].map { |e| Regexp.new(e) } indeed works. Thank you!

The only bizarre part is that I've been using the config.yaml form since February, so it seems to be a recent problem. (Plus the form with slashes / is taken from the README.md)

Also, the original problem with destroyed VM seems to remain, but I need to re-test it on a clean setup with Vagrant 2.1.1

fnordfish commented 6 years ago

Well, the Readme shows a Vagrantfile, which is Ruby, like this:

config.dns.patterns = [/^.*mysite.dev$/, /^.*myothersite.dev$/]

Whereas your yaml defines Strings which will basically expand to something like this:

config.dns.patterns = ['/^.*mysite.dev$/', '/^.*myothersite.dev$/']

Anyhow, I've just checked again because I remembered some changes, but the part where we write and read the config file has pretty much never changed :) But you can use simple Strings if you skip the slashes.

Back to the original problem:
Can you try (on that failing setup) commenting out the sshfs shares and provisioning to see whether either one is the problem?
I do remember getting my boxes destroyed on the first "up" when the provisioning failed.

--
Robert Schulze

amikheychik commented 6 years ago

@fnordfish I just wiped out Vagrant and VirtualBox and reinstalled the latest versions (2.1.1 and 5.2.12). I installed only the vagrant-vbguest and vagrant-dns plugins, then ran vagrant init ubuntu/xenial64 and vagrant up (so it runs on a standard Vagrantfile, with no provisioning or DNS setup). vagrant up ends with:

Unmounting Virtualbox Guest Additions ISO from: /mnt
==> default: Checking for guest additions in VM...
==> default: [vagrant-dns] TLD but no host_name given. No patterns will be configured.
vagrant-dns: process with pid 3804 started.
==> default: Restarted DNS Service
==> default: Mounting shared folders...
    default: /vagrant => /Users/amikheychik/Vagrant/test

And after that vagrant ssh fails: VM must be created before running this command. Run `vagrant up` first.

amikheychik commented 6 years ago

This is odd: I did the same on my second Mac, and this time it works fine. Even odder, last night my regular machine with the full Vagrantfile and provisioning worked fine too, but I didn't double-check.

So for now I guess we can consider this issue, if not resolved, then at least under strong suspicion of being resolved.

fnordfish commented 6 years ago

You won't believe it. I found a way to reproduce a destroy during up... not yet sure what causes it. Currently, the only way to reproduce it is with config.vm.network :private_network, type: :dhcp

fnordfish commented 6 years ago

So, what happens is that Daemons decides to exit (here: https://github.com/thuehlinger/daemons/blob/c024cf01571fb9d21f5359656fe0cfc87ddca91f/lib/daemons/daemonize.rb#L67). This causes a SystemExit exception to bubble up until it is finally caught by Vagrant, which swallows it but destroys the box.

While I still have no idea what's going on, I'm going to catch errors while starting up the actual dns server and print an error message (so that at least we know what happened and vagrant won't destroy the box).
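For readers less familiar with Ruby, the mechanics can be sketched in isolation. This is not the plugin's or Vagrant's code; `start_daemon` and the two wrapper methods are hypothetical stand-ins. The point is only that `exit` raises SystemExit, which is not a StandardError, so an ordinary rescue does not stop it from bubbling up:

```ruby
# Hypothetical stand-in for a library call that decides to terminate the
# process. Kernel#exit raises SystemExit instead of stopping immediately.
def start_daemon
  exit(0)
end

# A plain StandardError rescue does not intercept SystemExit, so the
# exception keeps travelling up to the caller.
def run_with_standard_rescue
  start_daemon
  :finished
rescue StandardError
  :standard_error
end

# SystemExit has to be named explicitly to be caught at this level.
def run_with_explicit_rescue
  start_daemon
  :finished
rescue SystemExit => e
  [:system_exit, e.status]
end

outcome =
  begin
    run_with_standard_rescue
  rescue SystemExit
    :escaped # reached because the inner rescue let SystemExit through
  end

p outcome
p run_with_explicit_rescue
```

An outer frame that does rescue SystemExit (here the `begin` block, in the real case presumably Vagrant) gets to decide what happens next.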

xyr115 commented 6 years ago

Any more updates on this one?

fnordfish commented 6 years ago

hi. sorry, nothing much. What I thought would be a viable solution wasn't. The SystemExit exception mentioned earlier seems to be important: catching it leaves Vagrant in a weird state and nothing really works.

Again, I'm kind of puzzled what's going on here.