NixOS / nixpkgs

Nix Packages collection & NixOS

wireguard: Peer name resolution fails with dnscrypt-proxy2 enabled #171079

Open anpandey opened 2 years ago

anpandey commented 2 years ago

Describe the bug

I'm using dnscrypt-proxy2, listening on localhost port 53, as my system DNS server. I also have networking.wireguard enabled, with the interface in a network namespace and a domain name as the peer endpoint. However, wg fails to resolve the endpoint address when the generated systemd service runs.

Apr 30 14:55:22 thinkpad-x1 wireguard-wg1-peer-<snip>-refresh-start[3270]: Name or service not known: `example.tld:50000'

This is what the relevant part of my configuration.nix looks like:

  networking = {
    nameservers = [ "127.0.0.1" "::1" ];
    networkmanager = {
      enable = true;
      dns = "none";
    };
    wireguard.interfaces = {
      wg1 = {
        preSetup = "${pkgs.iproute}/bin/ip netns add sn";
        postShutdown = "${pkgs.iproute}/bin/ip netns del sn";
        ips = [ "10.100.0.2/32" ];
        interfaceNamespace = "sn";
        listenPort = 50000;
        peers = [
          {
            allowedIPs = [ "0.0.0.0/0" ];
            endpoint = "example.tld:50000";
            dynamicEndpointRefreshSeconds = 14400;
          }
        ];
      };
    };
  };

Additional context

I can confirm with dnscrypt-proxy2 disabled and NetworkManager DNS enabled, name resolution for the endpoint works. My guess is that the wg invocations are run in the specified network namespace (where dnscrypt-proxy2 is not reachable), so DNS resolution fails.

Also interesting: when I use the endpoint's IP address directly (so that the connection is usable), curl run inside the namespace is able to reach the locally running dnscrypt-proxy2 instance:

$ sudo -E ip netns exec sn curl http://areallylongdomain.com/
curl: (6) Could not resolve host: areallylongdomain.com

and running tcpdump:

$ sudo tcpdump -i lo 'port 53'
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
16:32:30.720305 IP localhost.40461 > localhost.domain: 24251+ [1au] A? areallylongdomain.com. (50)
16:32:30.737990 IP localhost.domain > localhost.40461: 24251 NXDomain 0/1/1 (123)

My guess is that wg is doing something different for name resolution (but it looks like it uses getaddrinfo()), or that the endpoint needs to be fully set up for name resolution to work.

A fix for this might be similar to what's needed in #169128: resolve the endpoint's domain name before all other configuration (e.g. before moving the wg1 interface to its own namespace).

Notify maintainers

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 5.17.3, NixOS, 21.11 (Porcupine)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.3.16`
 - channels(root): `"nixos-21.11.337193.5fb3a179605, nixos-unstable, nixpkgs"`

anpandey commented 2 years ago

After a bit of investigation, wg isn't actually doing anything different from programs like curl for DNS resolution.

This is happening because dnscrypt-proxy2 starts listening for requests only after it detects that the network is up. If wireguard-wg1-peer-xxxx.service starts before that, my guess is that wg quits entirely (probably because it gets a connection refused) rather than retrying, as it would in cases such as the network being down.
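
If the start order is the cause, one option might be to declare the ordering explicitly. This is an untested sketch, not a confirmed fix. It assumes the resolver's unit is named dnscrypt-proxy2.service, and "xxxxx" stands in for the elided peer identifier. Since dnscrypt-proxy2 may only begin listening some time after its unit has started, ordering alone may not close the race.

```nix
# Sketch only: the unit names below are assumptions, and "xxxxx" is a
# placeholder for the peer identifier that is elided in the log output.
systemd.services."wireguard-wg1-peer-xxxxx-refresh" = {
  # Order the refresh unit after the local resolver and after the network
  # is reported online. This does not guarantee that dnscrypt-proxy2 is
  # already accepting queries on port 53.
  after = [ "dnscrypt-proxy2.service" "network-online.target" ];
  wants = [ "dnscrypt-proxy2.service" "network-online.target" ];
};
```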

anpandey commented 2 years ago

I found a workaround to have systemd do the retries since wg thinks it's an unrecoverable error:

systemd.services."wireguard-wg1-peer-xxxxx-refresh" = {
  serviceConfig = {
    Restart = "on-failure";
    RestartSec = 60;
  };
};