daeuniverse / dae

eBPF-based Linux high-performance transparent proxy solution.
GNU Affero General Public License v3.0

[Bug Report] Failed to send udp dns request #462

Open fancl20 opened 4 months ago

fancl20 commented 4 months ago

Current Behavior

dae can't send udp dns request to upstream.

Feb 21 13:03:22 cm4router dae[1019]: level=trace msg="Choose DNS path" choose="udp+4" ipversions=[4 6] l4protos=[udp] upstream="udp://dns.alidns.com:53" use="223.6.6.6:53"
Feb 21 13:03:22 cm4router dae[1019]: level=trace msg="Received UDP(DNS) 192.168.1.150:10677 <-> 8.8.8.8:53: ssl.gstatic.com. A"
Feb 21 13:03:22 cm4router dae[1019]: level=debug msg="Failed to write UDP(DNS) packet request." err="write udp 0.0.0.0:45443->[::ffff:223.6.6.6]:53: address ::ffff:223.6.6.6: non-IPv4 address" from="192.168.1.150:48311" mac="ca:21:89:fe:e2:8b" network="udp4(DNS)" pid=0 pname= to="223.6.6.6:53"
Feb 21 13:03:22 cm4router dae[1019]: level=warning msg="handlePkt: failed to read from: 223.6.6.6:53 (dialer: direct): read udp 0.0.0.0:60382: i/o timeout"
Feb 21 13:03:22 cm4router dae[1019]: level=trace msg="Request to DNS upstream" question=[{mail.google.com. 1 1}] upstream="udp://dns.alidns.com:53"
Feb 21 13:03:22 cm4router dae[1019]: level=trace msg="Choose DNS path" choose="udp+4" ipversions=[4 6] l4protos=[udp] upstream="udp://dns.alidns.com:53" use="223.6.6.6:53"

Expected Behavior

Succeeded

Steps to Reproduce

dae runs inside a netns and binds veth0 (netns) as the LAN interface.

Packet forward path: end0 --> br0 --> veth0 --> veth0 (netns) --> (dae) --> veth0 --> br0 --> end0

Outgoing path from netns: veth0 (netns) --> veth0 --> br0 --> end0 (snat)

global {
  log_level: trace
  lan_interface: veth0
  auto_config_kernel_parameter: true
}
dns {
  upstream {
    googledns: 'tcp+udp://dns.google.com:53'
    alidns: 'udp://dns.alidns.com:53'
  }
  routing {
    request {
      fallback: alidns
    }
    response {
      upstream(googledns) -> accept

      ip(geoip:private) && !qname(geosite:cn) -> googledns
      fallback: accept
    }
  }
}
routing {
  fallback: direct
}

Anything else?

  1. No nft rules are set in the netns.
  2. tcpdump -nn -i veth0 shows no outgoing UDP packets.
  3. TCP DNS works fine.
  4. nslookup xxx 223.6.6.6 works fine inside the netns (on veth0). As expected, this query does not go through dae because no WAN interface is bound.
  5. No daens is created, and 0.4.0 has the same issue.

fancl20 commented 4 months ago

I have a theory of what happened here:

  1. The direct dialer calls LookupNetIP for the remote address, which uses netip.AddrFromSlice.
  2. There is a known issue:
     a. lookupIPAddr always returns the 16-byte form.
     b. netip.AddrFromSlice does not unmap a 4in6 address given in 16-byte form.
     c. Is4() returns false for a 4in6 IP.
  3. Unfortunately, UDPConn uses Is4() to check whether the destination is a v4 IP, so the check fails for a 4in6 IP.

My guess is that if IPv6 is set up properly in the netns, UDPConn will use AF_INET6, which would mitigate the issue. Alternatively, we can check Is4In6() here and convert the address to a v4 IP. I will update my tests later.
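
To make the theory above concrete, here is a minimal sketch that only exercises the standard net and net/netip packages; it does not touch dae's code. The helper name classify is hypothetical and exists only for illustration. It shows that a 16-byte IPv4 slice becomes a 4in6 netip.Addr for which Is4() is false, and that Unmap() recovers the plain v4 address.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
)

// classify is a hypothetical helper (not part of dae). It reports how
// net/netip sees a raw IP byte slice, and what the address looks like
// after Unmap.
func classify(raw []byte) (is4, is4In6 bool, unmapped string) {
	addr, ok := netip.AddrFromSlice(raw)
	if !ok {
		// The slice was neither 4 nor 16 bytes long.
		return false, false, ""
	}
	return addr.Is4(), addr.Is4In6(), addr.Unmap().String()
}

func main() {
	// net.ParseIP returns the 16-byte form even for an IPv4 literal.
	ip := net.ParseIP("223.6.6.6")
	fmt.Println("length of parsed IP:", len(ip))

	// 16-byte form: treated as IPv4-mapped IPv6, so Is4() is false.
	is4, is4In6, unmapped := classify(ip)
	fmt.Println("16-byte form:", is4, is4In6, unmapped)

	// 4-byte form: treated as plain IPv4.
	is4, is4In6, unmapped = classify(ip.To4())
	fmt.Println("4-byte form: ", is4, is4In6, unmapped)
}
```

If this matches what happens inside the resolver path, it would explain why the log prints the destination as [::ffff:223.6.6.6]:53 while the socket itself is udp4.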

jschwinger233 commented 4 months ago

If you're talking about dae -> upstream DNS, it has nothing to do with the netns: the packets originate from the host.

jschwinger233 commented 4 months ago

Never mind, it's a different netns according to your repro steps. Just ignore my last comment.

fancl20 commented 4 months ago

Having IPv6 enabled is not enough. I also believe the lo interface must be UP, because otherwise ip -6 route add local default dev lo table 2023 fails with Error: Nexthop device is not up. I didn't trace how this affects the UDP listen address, though.

Based on my theory, dae won't work in an IPv4-only environment. This may or may not be a real issue, but we can either fix the code or fix the docs and make IPv6 required.

(Personally, I don't have time to create a PR for this in the short term. Alternatively, we can hope Go fixes the standard library someday.)
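
For reference, the code-side fix discussed in this thread could look roughly like the sketch below. It is an assumption about the shape of a fix, not dae's actual patch: the helper names unmapAddrPort and normalizeTarget are hypothetical. The idea is to unmap the destination before it is handed to a udp4 socket, while leaving genuine IPv6 destinations untouched.

```go
package main

import (
	"fmt"
	"net/netip"
)

// unmapAddrPort is a hypothetical helper (not dae's API). It converts an
// IPv4-mapped IPv6 destination such as [::ffff:223.6.6.6]:53 into the
// plain IPv4 form 223.6.6.6:53. Other addresses are returned unchanged,
// because Unmap is a no-op for them.
func unmapAddrPort(ap netip.AddrPort) netip.AddrPort {
	return netip.AddrPortFrom(ap.Addr().Unmap(), ap.Port())
}

// normalizeTarget parses an "ip:port" string and returns its unmapped form.
// It is a convenience wrapper for demonstration only.
func normalizeTarget(s string) (string, error) {
	ap, err := netip.ParseAddrPort(s)
	if err != nil {
		return "", err
	}
	return unmapAddrPort(ap).String(), nil
}

func main() {
	for _, target := range []string{
		"[::ffff:223.6.6.6]:53", // 4in6 form seen in the error log
		"223.6.6.6:53",          // already plain IPv4
		"[2400:3200::1]:53",     // genuine IPv6, must stay as-is
	} {
		out, err := normalizeTarget(target)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", target, out)
	}
}
```

A caller would apply unmapAddrPort to the resolved upstream address right before writing to the UDP socket, so that the Is4() check described earlier in the thread sees a plain v4 address.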