BerlinVagrant / vagrant-dns

A plugin to manage DNS records for vagrant environments
MIT License

Answer NOTIMPL on AAAA instead of dropping #76

Closed mattiasb closed 1 year ago

mattiasb commented 1 year ago

TL;DR

Saying that we don't support AAAA queries by returning NOTIMPL on such queries will make interoperability with other software (notably systemd-resolved) work better. :)

The longer story

When looking for AAAA records from the DNS server in vagrant-dns I get this:

$ dig AAAA mgmt.test @127.0.0.153 -p 5300
;; communications error to 127.0.0.153#5300: timed out
;; communications error to 127.0.0.153#5300: timed out
;; communications error to 127.0.0.153#5300: timed out

; <<>> DiG 9.18.13 <<>> AAAA mgmt.test @127.0.0.153 -p 5300
;; global options: +cmd
;; no servers could be reached

NOTE: I've set VagrantDNS::Config.listen = [[:udp, '127.0.0.153', 5300]] in my Vagrantfile.

Since systemd-resolved (and hence also nss-resolve¹) queries both A and AAAA when resolving a name, every name resolution against the vagrant-dns server ends in a timeout on Linux. It usually takes around 10s to resolve a domain name from vagrant-dns for me. I've been ignoring this for a while since I was happy just to have something working (and we had the same issue with landrush as well).

Example session with resolvectl query and ping:

$ resolvectl query mgmt.test
mgmt.test: 192.168.122.46

-- Information acquired via protocol DNS in 10.0308s.
-- Data is authenticated: no; Data was acquired via local or encrypted transport: no
-- Data from: network

$ time ping -c 1 mgmt.test
PING mgmt.test (192.168.122.46) 56(84) bytes of data.
64 bytes from 192.168.122.46 (192.168.122.46): icmp_seq=1 ttl=64 time=0.302 ms

--- mgmt.test ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.302/0.302/0.302/0.000 ms

real    0m10,106s
user    0m0,000s
sys     0m0,002s

From reading issue #22575 of systemd, and specifically this comment, it seems that a nicer way to handle not supporting AAAA records would be to return a NOTIMPL response code. Name resolution would then take milliseconds and just return the A answer.
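To make the idea concrete, here is a minimal sketch of what such a reply looks like on the wire, built only with Ruby's stdlib `resolv`. The helper name `notimp_reply_for` is made up for illustration; vagrant-dns itself goes through RubyDNS and would not build messages by hand like this.

```ruby
require "resolv"

# Hypothetical helper (not part of vagrant-dns or RubyDNS):
# takes the raw bytes of a DNS query and returns the raw bytes of a reply
# that echoes the question and sets RCODE 4 ("Not Implemented").
def notimp_reply_for(query_bytes)
  query = Resolv::DNS::Message.decode(query_bytes)

  reply = Resolv::DNS::Message.new(query.id) # same ID so the client can match it
  reply.qr = 1                               # this is a response, not a query
  reply.opcode = query.opcode
  reply.rd = query.rd                        # copy "recursion desired" from the query
  reply.rcode = Resolv::DNS::RCode::NotImp   # RCODE 4

  # Echo the question section back unchanged, with no answer records.
  query.each_question do |name, typeclass|
    reply.add_question(name, typeclass)
  end

  reply.encode
end
```

The point is only that the client gets an immediate, well-formed reply instead of silence, so it has no reason to wait for a timeout.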

My guess is that one might need to do work upstream in rubydns to get this working.

NOTE: I'm not suggesting adding support for AAAA (and hence IPv6) btw. :) Just to respond a little bit nicer! :)

1: nss-resolve is the resolved backend for nss that is in turn used by glibc for getaddrinfo etc.

mattiasb commented 1 year ago

I'll try to make some time to look into this.

fnordfish commented 1 year ago

should be as simple as:

diff --git a/lib/vagrant-dns/service.rb b/lib/vagrant-dns/service.rb
index 47f5b82..b721d2c 100644
--- a/lib/vagrant-dns/service.rb
+++ b/lib/vagrant-dns/service.rb
@@ -35,6 +35,9 @@ module VagrantDNS
           match(pattern, Resolv::DNS::Resource::IN::A) do |transaction, match_data|
             transaction.respond!(ip, ttl: ttl)
           end
+          match(proc { |name, resource_class| resource_class != Resolv::DNS::Resource::IN::A }) do |transaction, match_data|
+            transaction.fail!(:NotImp)
+          end
         end

         otherwise do |transaction|

EDIT: No, that's not quite it, we still need to match the pattern

mattiasb commented 1 year ago

So I got to looking at this immediately.

One thing I noticed is that there's a risk of a DNS loop on Linux here.

On Ubuntu and Fedora /etc/resolv.conf looks something like this:

nameserver 127.0.0.53
options edns0 trust-ad
search <DOMAIN>.<TLD>

127.0.0.53 is, in turn, systemd-resolved. I believe (from looking at the Async DNS code) that Async::DNS::System.nameservers ends up being 127.0.0.53 on Ubuntu and Fedora.

Given this code:

        otherwise do |transaction|
          transaction.passthrough!(std_resolver) do |reply, reply_name|
            puts reply
            puts reply_name
          end
        end

... if a query is not answered by vagrant-dns itself (for example, an AAAA query for mgmt.test), then vagrant-dns will forward the query to systemd-resolved, which will forward it back to vagrant-dns.
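The suspected loop can be illustrated with a toy model. This is plain Ruby with no real DNS involved; the routing table, the `trace` helper and the hop limit are all invented for illustration and only encode the forwarding rules described above.

```ruby
# Toy model of the forwarding rules (assumption: no real network traffic).
# Each "server" either answers or forwards to another server.
ROUTES = {
  # vagrant-dns answers A queries itself and passes everything else through
  # to the system nameserver, which on Ubuntu/Fedora is systemd-resolved.
  "vagrant-dns" => lambda { |_name, type|
    type == :A ? [:answer, "192.168.122.46"] : [:forward, "systemd-resolved"]
  },
  # systemd-resolved routes the ~test domain to vagrant-dns.
  "systemd-resolved" => lambda { |name, _type|
    name.end_with?(".test") ? [:forward, "vagrant-dns"] : [:answer, :upstream]
  }
}.freeze

# Follows forwards from `start` and returns [hops, result].
# Gives up with :loop_suspected after max_hops forwards.
def trace(start, name, type, max_hops: 6)
  hops = [start]
  current = start
  max_hops.times do
    action, value = ROUTES.fetch(current).call(name, type)
    return [hops, value] if action == :answer

    current = value
    hops << current
  end
  [hops, :loop_suspected]
end

hops, result = trace("systemd-resolved", "mgmt.test", :AAAA)
puts "#{hops.join(' -> ')} => #{result}"
```

In this model an A query terminates after one forward, while an AAAA query bounces between the two servers until the hop limit is hit.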

My thinking is that we don't need to forward any requests at all. This will work on Linux since the other DNS servers will be bound to the respective interfaces they are on. Like this:

$ resolvectl 
Global
         Protocols: LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
  resolv.conf mode: stub
Current DNS Server: 127.0.0.153:5300  ← vagrant-dns
       DNS Servers: 127.0.0.153:5300  ← vagrant-dns
        DNS Domain: ~test

Link 2 (eno1)
Current Scopes: none
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 3 (enp11s0u1u2)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
         Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 172.31.32.101                   ← Link specific DNS server
       DNS Servers: 172.31.32.100 172.31.32.101     ← Link specific DNS server
        DNS Domain: example.com

Link 4 (wlp61s0)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
         Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 172.31.32.101                   ← Link specific DNS server
       DNS Servers: 172.31.32.100 172.31.32.101     ← Link specific DNS server
        DNS Domain: example.com

Link 6 (virbr0)
Current Scopes: LLMNR/IPv4
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 7 (tap0)
Current Scopes: LLMNR/IPv6
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

And I believe that on macOS, since you just map a TLD to a specific DNS server, you should be fine there as well. Well, unless you're using a TLD like .com and expect to be able to reach my-local-machine.com via vagrant-dns and all other .com addresses via passthrough. Is that a setup you want to support?

The reason I'm asking is that this little diff solves the issue with the 10s timeout for me:

diff --git a/lib/vagrant-dns/service.rb b/lib/vagrant-dns/service.rb
index 47f5b82..26c9faf 100644
--- a/lib/vagrant-dns/service.rb
+++ b/lib/vagrant-dns/service.rb
@@ -27,7 +27,6 @@ module VagrantDNS
       end

       registry = Registry.new(tmp_path).to_hash
-      std_resolver = RubyDNS::Resolver.new(Async::DNS::System.nameservers)
       ttl = VagrantDNS::Config.ttl

       RubyDNS::run_server(VagrantDNS::Config.listen) do
@@ -37,11 +36,12 @@ module VagrantDNS
           end
         end

+        match(//, Resolv::DNS::Resource::IN::A) do |transaction|
+          transaction.fail!(:NXDomain)
+        end
+
         otherwise do |transaction|
-          transaction.passthrough!(std_resolver) do |reply, reply_name|
-            puts reply
-            puts reply_name
-          end
+          transaction.fail!(:NotImp)
         end
       end
     end

mattiasb commented 1 year ago

To add to my already long comment:

My initial thought was that systemd-resolved would keep asking vagrant-dns for an AAAA record and eventually time out because we didn't send a response.

I now believe that the issue was actually a DNS loop. That makes sense: if the passthrough had forwarded to the system's upstream DNS server instead of back to systemd-resolved, the query would have gotten an NXDOMAIN from there.

fnordfish commented 1 year ago

vagrant-dns allows hooking into public TLDs. And while I have no clue whether that is still in use, I wouldn't feel comfortable removing that feature[^1].

So here's my proposal:

  1. we make resolver configurable:
    • false: Disable passthrough
    • nil, :system: Use system nameservers (default)
    • [[proto, ip, port], ["udp", "1.1.1.1", 53]]: List of servers to use
  2. we match all queries twice: first against A for our positive match, then again without class restriction for NOTIMP
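As a rough sketch of how the three accepted shapes of the resolver setting could be told apart: the helper name `normalize_resolver` and its return values are assumptions made for illustration, not the code that shipped in vagrant-dns.

```ruby
# Hypothetical helper, for illustration only. Maps the proposed setting to:
#   nil      -> passthrough disabled
#   :system  -> use the system nameservers (the default)
#   array    -> explicit upstream servers as [proto, ip, port] triples
def normalize_resolver(value)
  case value
  when false
    nil
  when nil, :system
    :system
  when Array
    value.map do |proto, ip, port|
      valid = proto.respond_to?(:to_sym) && ip.is_a?(String) && port.is_a?(Integer)
      raise ArgumentError, "bad server entry: #{[proto, ip, port].inspect}" unless valid

      # Accept "udp" as well as :udp for the protocol.
      [proto.to_sym, ip, port]
    end
  else
    raise ArgumentError, "unsupported resolver setting: #{value.inspect}"
  end
end
```

Normalizing early like this would let the server code branch on just three cases, regardless of how the value was written in the Vagrantfile.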

Providing a custom upstream DNS server should be helpful in any case.
Non-A-queries for configured patterns will return NOTIMP, while you can still passthrough everything else.
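The resulting decision for each query can be sketched as a small pure function. The name `decide`, the constant `IN_A` and the symbols it returns are invented for illustration; in vagrant-dns the same logic would be expressed through RubyDNS `match` and `otherwise` rules.

```ruby
require "resolv"

IN_A = Resolv::DNS::Resource::IN::A

# Hypothetical decision function, for illustration only.
# patterns: the Regexps configured for the vagrant machines.
# Returns what the server should do with the query:
#   :respond     -> configured name, A query: answer with the machine's IP
#   :notimp      -> configured name, any other record type
#   :passthrough -> everything else goes to the upstream resolver
def decide(name, resource_class, patterns)
  if patterns.any? { |pattern| pattern.match?(name) }
    resource_class == IN_A ? :respond : :notimp
  else
    :passthrough
  end
end

patterns = [/\.test\z/]
p decide("mgmt.test", Resolv::DNS::Resource::IN::A, patterns)
p decide("mgmt.test", Resolv::DNS::Resource::IN::AAAA, patterns)
p decide("example.com", Resolv::DNS::Resource::IN::AAAA, patterns)
```

Checking the pattern first and the record type second is what keeps NOTIMP scoped to the configured names, so queries for unrelated names can still be passed through.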

[^1]: Mind that .DEV was used for quite some time, until it became a public TLD, and I've seen people still use that.

fnordfish commented 1 year ago

released in v2.4.0