Open krombel opened 5 years ago
For dnsdist, can you tell us if the interface option at https://dnsdist.org/reference/config.html#addLocal covers your needs?
You are right. Adding "interface=external" worked as expected. I had missed that option. Thanks for clearing things up :slightly_smiling_face:
I just realized that this option is only available for dnsdist. How does this work for the authoritative server and the recursor?
I don't think they can do it right now.
It also seems that the newServer call cannot handle VRFs.
If I add, for example:
newServer({address="2001:608:a01::40", name="gw01", source="vrf_external"}) -- downstream servers for recursion
It still picks the outgoing IP of the master VRF. And if I define an IP with ip@vrf_external, it cannot bind anymore :/.
Oct 02 12:00:16 webfrontend01 dnsdist[27956]: Fatal Lua error: [string "chunk"]:22: Caught exception: binding socket to [2001:608:a01::3]:0: Cannot assign requested address
If you only specify the interface, we ask the kernel to use that interface and leave the choice of the IP address to it.
Specifying both the address and interface works here, your message means that the bind() call returned -1, setting errno to EADDRNOTAVAIL, which implies that 2001:608:a01::3 does not exist on the system.
The IP exists on the interface and on the system, just in a different VRF, and thus does not seem to be seen by dnsdist at startup.
The output of ip addr show and perhaps a strace of the dnsdist process during startup would be useful, as I believe bind() should succeed here if the address exists.
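The bind() behaviour discussed here can be checked outside dnsdist. A minimal Python sketch, not taken from dnsdist itself: the helper name try_bind is made up for illustration. It binds a UDP socket to a given source address and reports the errno name when the kernel refuses.

```python
import errno
import socket
from typing import Optional


def try_bind(address: str, port: int = 0) -> Optional[str]:
    """Bind a UDP socket to `address`.

    Returns None on success, otherwise the symbolic errno name
    (for example "EADDRNOTAVAIL") of the failed bind() call.
    """
    # Crude family detection, good enough for an illustration.
    family = socket.AF_INET6 if ":" in address else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((address, port))
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


if __name__ == "__main__":
    # An address that is configured and visible to the process binds fine;
    # one the kernel cannot see from the current context is refused.
    for candidate in ("127.0.0.1", "2001:608:a01::3"):
        print(candidate, "->", try_bind(candidate) or "ok")
```

Running it both directly and under ip vrf exec should show whether the address is visible only from inside the VRF.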
Here you go:
webfrontend01.in.ffmuc.net:~# ip -br link show master vrf_external
vlan3 UP 52:54:00:c7:81:c1 <BROADCAST,MULTICAST,UP,LOWER_UP>
webfrontend01.in.ffmuc.net:~# ip vrf
Name Table
-----------------------
vrf_external 1023
webfrontend01.in.ffmuc.net:~# ip -br a
lo UNKNOWN 127.0.0.1/8 10.80.255.19/32 2001:608:a01:ffff::19/128 ::1/128
vlan1000 UP 10.80.248.7/27 2001:608:a01:ff02:5054:ff:febe:5dbc/64 fe80::5054:ff:febe:5dbc/64
vlan3 UP 195.30.94.28/29 2001:608:a01::3/64 2001:608:a01::27/64 fe80::5054:ff:fec7:81c1/64
vrf_external UP
Either with:
newServer({address="2001:608:a01::40", name="gw01", source="2001:608:a01::3@vrf_external"})
or
newServer({address="2001:608:a01::40", name="gw01", source="2001:608:a01::3@vlan3"})
setsockopt(7, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "2001:608:a01::3", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = -1 EADDRNOTAVAIL (Cannot assign requested address)
close(6) = 0
close(5) = 0
close(4)
So neither specifying the VRF nor the interface in the VRF helps.
Would you by any chance be able to test after setting these two settings?
sysctl -w net.ipv4.tcp_l3mdev_accept=1
sysctl -w net.ipv4.udp_l3mdev_accept=1
This might be needed to be able to bind a socket to a different VRF context than the one the program executes in, i.e. the default one.
Apparently another option would be to run dnsdist in the right VRF context via ip vrf exec....
> Would you by any chance be able to test after setting these two settings?
> sysctl -w net.ipv4.tcp_l3mdev_accept=1
> sysctl -w net.ipv4.udp_l3mdev_accept=1
Those settings are default for all our VMs with VRF :). So it's already enabled.
> This might be needed to be able to bind a socket to a different VRF context than the one the program executes in, i.e. the default one.
Nope, it's not needed for binding; it's needed to get the correct interface to answer on if you are bound to :: or 0.0.0.0, i.e. a global listen socket.
tcp_l3mdev_accept - BOOLEAN Enables child sockets to inherit the L3 master device index. Enabling this option allows a "global" listen socket to work across L3 master domains (e.g., VRFs) with connected sockets derived from the listen socket to be bound to the L3 domain in which the packets originated. Only valid when the kernel was compiled with CONFIG_NET_L3_MASTER_DEV. Default: 0 (disabled)
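Whether these sysctls are enabled can be read back from /proc. A small Python sketch, assuming the usual /proc/sys layout; the helper name read_l3mdev_accept is made up for illustration. It returns None when the entry is absent, which would be expected on a kernel built without CONFIG_NET_L3_MASTER_DEV.

```python
from pathlib import Path
from typing import Optional


def read_l3mdev_accept(proto: str, sysctl_root: str = "/proc/sys") -> Optional[int]:
    """Return the value of net.ipv4.<proto>_l3mdev_accept, or None if absent.

    `proto` is "tcp" or "udp"; `sysctl_root` is only a parameter so the
    function can be pointed at a different tree when testing.
    """
    path = Path(sysctl_root) / "net" / "ipv4" / f"{proto}_l3mdev_accept"
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    for proto in ("tcp", "udp"):
        print(f"net.ipv4.{proto}_l3mdev_accept =", read_l3mdev_accept(proto))
```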
> Apparently another option would be to run dnsdist in the right VRF context via ip vrf exec....
Yeah, we did that before but that breaks the possibility to use it outside of the VRF :). So it should just set the correct SOCKOPTS to bind :). Like it does for the listen contexts.
addTLSLocal("0.0.0.0", ssl_cert, ssl_key, { doTCP=true, reusePort=true, interface="vrf_external" })
Well, "just set the correct SOCKOPTS to bind" is easy to say but not very well documented, unfortunately.
The documentation at [1] states that specifying the output interface using cmsg and IP_PKTINFO like we already do should be enough, but I guess it's a lie. That's too bad because it's the only portable way of doing it.
Is there any chance you could test the patch at https://github.com/PowerDNS/pdns/pull/8372 ? It tries to do it the Linux way (SO_BINDTODEVICE) in addition to the existing one but I don't have a setup to test it right now.
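For readers unfamiliar with the "Linux way", here is a minimal Python sketch of the general technique. It is not the code from the pull request, and the helper name bind_via_device is made up for illustration. It pins a UDP socket to an interface with SO_BINDTODEVICE and only then binds the source address, so the address is looked up in that interface's routing domain. It assumes Linux and Python 3.8 or later, where socket.SO_BINDTODEVICE is exposed.

```python
import errno
import socket
from typing import Optional


def bind_via_device(ifname: str, address: str) -> Optional[str]:
    """Pin a UDP socket to `ifname`, then bind it to `address`.

    Returns None on success, otherwise the symbolic errno name of the
    call that failed (for example "EPERM" or "EADDRNOTAVAIL").
    """
    family = socket.AF_INET6 if ":" in address else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        try:
            # Select the device first; bind() then resolves the address
            # in the context of that device.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE,
                            ifname.encode())
            sock.bind((address, 0))
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


if __name__ == "__main__":
    # Interface and address taken from the report above.
    print(bind_via_device("vlan3", "2001:608:a01::3") or "ok")
```

Depending on the kernel version, the setsockopt() call may need elevated privileges, so an "EPERM" result does not by itself say anything about the address.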
I will try to compile it on one of our servers in the next few days and report back.
If I am not wrong I am using the version from your Draft now. But it doesn't seem to work yet.
socket(AF_INET6, SOCK_DGRAM, IPPROTO_IP) = 7
setsockopt(7, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "2001:608:a01::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = -1 EADDRNOTAVAIL (Cannot assign requested address)
close(6) = 0
close(5) = 0
close(4) = 0
fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0), ...}) = 0
write(1, "Fatal Lua error: [string \"chunk\"]:22: Caught exception: binding socket to [2001:608:a01::1]:0: Cannot assign requested address\n", 127Fatal Lua error: [string "chunk"]:22: Caught exception: binding socket to [2001:608:a01::1]:0: Cannot assign requested address
) = 127
exit_group(1) = ?
+++ exited with 1 +++
legacygw.in.ffmuc.net:~# /usr/local/bin/dnsdist --version
dnsdist 0.0.17744.0.ddistvrfitf.g70b0d0e296 (Lua 5.3.3)
Enabled features: ebpf ipcipher recvmmsg/sendmmsg
Config looks like this:
newServer({address="2001:608:a01::40", name="gw01", source="2001:608:a01::1@vlan3"}) -- downstream servers for recursion
ip is there:
vlan3 UP 195.30.94.27/29 2001:608:a01::53/64 2001:608:a01::1/64 fe80::5054:ff:fe02:a677/64
That looks like setsockopt (SO_BINDTODEV..) isn't called. What does ltrace say about that?
Maybe run ltrace -e setsockopt for more readability :-)
Looking at the patch linked above, I'm wondering where the SO_BINDTODEVICE macro tested by the #ifdef gets defined and whether it is defined on your system. Maybe remove the #ifdef and try again (just a hunch).
I just found a bug in the patch (a duplicate variable name leading to a variable being shadowed, not sure why it's not reported by the compiler). I'll update it soon. Sorry about that, I tested a previous version and introduced the bug when cleaning it up.
Updated.
Ok, that looks better at startup:
setsockopt(7, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(7, SOL_SOCKET, SO_BINDTODEVICE, "vlan3", 5) = 0
bind(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "2001:608:a01::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = 0
connect(7, {sa_family=AF_INET6, sin6_port=htons(53), inet_pton(AF_INET6, "2001:608:a01::40", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = 0
mmap(NULL, 17825792, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4eb78e0000
fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0x1), ...}) = 0
write(1, "Added downstream server [2001:60"..., 46Added downstream server [2001:608:a01::40]:53
) = 46
close(5)
But it seems it does a rebind(?) every once in a while which does not call that function?
[pid 14073] setsockopt(17, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid 14073] setsockopt(17, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid 14073] bind(17, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "2001:608:a01::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = -1 EADDRNOTAVAIL (Cannot assign requested address)
[pid 14073] close(17) = 0
[pid 14073] nanosleep({tv_sec=1, tv_nsec=0}, <unfinished ...>
Additional ltrace output:
legacygw.in.ffmuc.net:~/pdns/pdns/dnsdistdist# ltrace -e setsockopt /usr/local/bin/dnsdist --supervised --disable-syslog -u _dnsdist -g _dnsdist -C /etc/dnsdist/dnsdist.conf
dnsdist->setsockopt(7, 1, 2, 0x7fff5c28af94) = 0
dnsdist->setsockopt(7, 1, 25, 0x55f9cd41b490) = 0
Added downstream server [2001:608:a01::40]:53
dnsdist->setsockopt(5, 1, 2, 0x7fff5c28b584) = 0
dnsdist->setsockopt(8, 1, 2, 0x7fff5c28b8a4) = 0
Calling setKey() while libsodium support has not been enabled is not secure, and will result in cleartext communications
dnsdist->setsockopt(4, 0, 15, 0x7fff5c28c51c) = 0
dnsdist->setsockopt(4, 0, 10, 0x7fff5c28c2b4) = 0
dnsdist->setsockopt(9, 1, 2, 0x7fff5c28c2c4) = 0
dnsdist->setsockopt(9, 6, 9, 0x7fff5c28c2c4) = 0
dnsdist->setsockopt(9, 0, 15, 0x7fff5c28c51c) = 0
Listening on 127.0.0.1:53
dnsdist 0.0.17744.0.ddistvrfitf.g70b0d0e296 comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it according to the terms of the GPL version 2
ACL allowing queries from: 0.0.0.0/0, ::/0
Console ACL allowing connections from: ::1/128, 127.0.0.1/8
Warning, this configuration can use more than 5013 file descriptors, web server and console connections not included, and the current limit is 1024.
You can increase this value by using ulimit.
Webserver launched on 10.80.248.21:8127
dnsdist->setsockopt(13, 1, 2, 0x7fff5c28c18c) = 0
dnsdist->setsockopt(13, 1, 2, 0x7fff5c28c1ac) = 0
Marking downstream gw01 ([2001:608:a01::40]:53) as 'down'
Accepting control connections on 127.0.0.1:5199
Error while retrieving the security update for version dnsdist-0.0.17744.0.ddistvrfitf.g70b0d0e296: Unable to get a valid Security Status update
Not validating response for security status update, this is a non-release version.
> But it seems it does a rebind(?) every once in a while which does not call that function?
It's the health check, which creates a new socket every time. Of course we need to apply the new socket option there as well. I'm on it!
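One way to avoid this kind of omission is to create every backend socket through a single helper, so that the regular socket and each fresh health-check socket get the same options. A Python sketch of that idea, assuming Linux and Python 3.8 or later; make_socket_factory is a hypothetical name and this is not how dnsdist is structured.

```python
import socket
from typing import Callable, Optional


def make_socket_factory(address: str,
                        ifname: Optional[str] = None) -> Callable[[], socket.socket]:
    """Return a function that creates identically configured UDP sockets."""
    family = socket.AF_INET6 if ":" in address else socket.AF_INET

    def new_socket() -> socket.socket:
        sock = socket.socket(family, socket.SOCK_DGRAM)
        try:
            if ifname is not None:
                # Applied on every new socket, before bind().
                sock.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE,
                                ifname.encode())
            sock.bind((address, 0))
        except OSError:
            sock.close()
            raise
        return sock

    return new_socket


if __name__ == "__main__":
    factory = make_socket_factory("127.0.0.1")
    # A health check would call factory() once per probe.
    with factory() as probe:
        print(probe.getsockname())
```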
Pushed! Thanks a lot for the feedback, this is very much appreciated!
No problem :). I want the feature so it's my responsibility to help to test it if someone has the time and mood to implement it :).
Thank you very much for this!
It seems it doesn't work yet; the health check is still not using SO_BINDTODEVICE.
[pid 2475] setsockopt(18, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid 2475] setsockopt(18, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid 2475] bind(18, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "2001:608:a01::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = -1 EADDRNOTAVAIL (Cannot assign requested address)
[pid 2475] close(18)
Should be better now, the option needs to be applied before we attempt to bind!
It seems the setting now gets overwritten right after setting it. And we get an "Operation not permitted" error ... maybe because privileges get dropped before?
[pid 10800] setsockopt(18, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid 10800] setsockopt(18, SOL_SOCKET, SO_BINDTODEVICE, "vlan3", 5) = -1 EPERM (Operation not permitted)
[pid 10800] setsockopt(18, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid 10800] bind(18, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "2001:608:a01::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = -1 EADDRNOTAVAIL (Cannot assign requested address)
[pid 10800] close(18) = 0
I'm not sure why you think it's overwritten?
But yes, you are right, we have an issue here because setting SO_BINDTODEVICE requires CAP_NET_RAW which we don't have anymore at this point. I'm actually not sure how we should fix that..
> I'm not sure why you think it's overwritten?
Actually, I was thinking about it the wrong way, sorry :). I was about to edit it but you were faster ;).
> But yes, you are right, we have an issue here because setting SO_BINDTODEVICE requires CAP_NET_RAW which we don't have anymore at this point. I'm actually not sure how we should fix that..
Is it possible to bind the healthcheck socket once at startup and keep it? But I think that would miss the point of the feature :/.
It would be possible to do so, but that might prevent us from detecting some network issues in the health check. I think our best option would be to keep CAP_NET_RAW when at least one backend has specified a source interface. It would require a bit of work but that seems doable.
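The decision rule proposed here can be sketched in a few lines of Python. This is an illustration only, not dnsdist's parser: the helper names split_source and needs_cap_net_raw are hypothetical, and the sketch assumes a source value is an address, an interface name, or address@interface, as in the examples in this thread.

```python
import ipaddress
from typing import Iterable, Optional, Tuple


def split_source(source: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a source value into (address, interface); either may be None."""
    address, sep, ifname = source.partition("@")
    if sep:
        return address or None, ifname or None
    try:
        ipaddress.ip_address(source)
    except ValueError:
        # Not an IP address, so treat it as an interface name.
        return None, source
    return source, None


def needs_cap_net_raw(sources: Iterable[str]) -> bool:
    """True if at least one backend names a source interface."""
    return any(split_source(source)[1] is not None for source in sources)


if __name__ == "__main__":
    print(needs_cap_net_raw(["2001:608:a01::1@vlan3"]))
    print(needs_cap_net_raw(["2001:608:a01::3"]))
```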
That sounds like a good solution.
I pushed a commit to do just that, note that it is quite experimental and might require updating CapabilityBoundingSet if you use our systemd unit file.
At the moment I am just running it as root directly from the shell. Is that enough or am I missing something? It seems the capability is not getting passed on.
[pid 5478] setsockopt(18, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid 5478] setsockopt(18, SOL_SOCKET, SO_BINDTODEVICE, "vlan3", 5) = -1 EPERM (Operation not permitted)
[pid 5478] setsockopt(18, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid 5478] bind(18, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "2001:608:a01::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = -1 EADDRNOTAVAIL (Cannot assign requested address)
[pid 5478] close(18)
That's weird, it does work for me™. Could you strace from the beginning? I'm interested in the capget and capset syscalls. Could you also run it without -u or -g, just to be sure?
Right, with -u or -g we lose all capabilities as soon as we call setuid(). The code to keep the capability around only works when started under systemd with User= and CapabilityBoundingSet= correctly set, or if you start as root without -u or -g.
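Besides strace, the effective capability set of a running process can be inspected through /proc. A short Python sketch, assuming Linux's /proc/<pid>/status format; the helper names are made up for illustration. CAP_NET_RAW is capability number 13 in linux/capability.h.

```python
from typing import Optional

CAP_NET_RAW = 13  # bit number from <linux/capability.h>


def has_capability(cap_mask: int, cap: int = CAP_NET_RAW) -> bool:
    """Check whether bit `cap` is set in a capability bitmask."""
    return bool((cap_mask >> cap) & 1)


def effective_capabilities(status_path: str = "/proc/self/status") -> Optional[int]:
    """Return the CapEff bitmask from a /proc status file, or None if unavailable."""
    try:
        with open(status_path) as status:
            for line in status:
                if line.startswith("CapEff:"):
                    return int(line.split()[1], 16)
    except FileNotFoundError:
        return None
    return None


if __name__ == "__main__":
    mask = effective_capabilities()
    if mask is None:
        print("CapEff not available")
    else:
        print("CAP_NET_RAW effective:", has_capability(mask))
```

Pointing effective_capabilities() at /proc/<pid>/status of the dnsdist process, with and without -u and -g, should show whether the capability survived the privilege drop.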
Yep, I had -u and -g set. Without them it works now!
[pid 10916] setsockopt(12, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid 10916] setsockopt(12, SOL_SOCKET, SO_BINDTODEVICE, "vlan3", 5) = 0
[pid 10916] setsockopt(12, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid 10916] bind(12, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "2001:608:a01::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = 0
[pid 10916] connect(12, {sa_family=AF_INET6, sin6_port=htons(53), inet_pton(AF_INET6, "2001:608:a01::40", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = 0
[pid 10916] sendmsg(12, {msg_name={sa_family=AF_INET6, sin6_port=htons(53), inet_pton(AF_INET6, "2001:608:a01::40", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, msg_namelen=28, msg_iov=[{iov_base="\20\34\1\0\0\1\0\0\0\0\0\0\1a\froot-servers\3net\0"..., iov_len=36}], msg_iovlen=1, msg_control=[{cmsg_len=36, cmsg_level=SOL_IPV6, cmsg_type=0x32}], msg_controllen=40, msg_flags=0}, 0) = 36
[pid 10916] poll([{fd=12, events=POLLIN}], 1, 1000) = 1 ([{fd=12, revents=POLLIN}])
[pid 10916] recvfrom(12, "\20\34\201\200\0\1\0\1\0\0\0\0\1a\froot-servers\3net\0"..., 4096, 0, {sa_family=AF_INET6, sin6_port=htons(53), inet_pton(AF_INET6, "2001:608:a01::40", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, [28]) = 52
And the backend is up!
And the healthchecks use the correct interface!
Usecase
At the moment it is not possible to define, per service, which VRF it should listen on. This applies to the Authoritative Server and the Recursor.
The main use case for me is the webserver component: I want it running on localhost (e.g. to allow metrics scraping) but not in the VRF in which the service should listen, because that would require using a public IP address.
When I use e.g.
ip vrf exec external /path/to/recursor [...]
I am forced to have the webserver running on a public interface as well. When I try to let it listen on 127.0.0.1 (started with ip vrf exec ...) it fails.
When I do not use the VRF, the service is not publicly accessible.
Description
Have an option to bind to a VRF per service. If such an option already exists, only the documentation for it is missing.