NetworkConfiguration / dhcpcd

DHCP / IPv4LL / IPv6RA / DHCPv6 client.
https://roy.marples.name/projects/dhcpcd
BSD 2-Clause "Simplified" License

dhcpcd installs IPv6 RA routes into wrong routing table when device is enslaved to a VRF #238

Status: Open. DanielG opened this issue 10 months ago.

DanielG commented 10 months ago

Hi,

I'm trying to use Linux VRFs (kernel docs link) on a system that runs dhcpcd for address configuration. I bring up the device with ifupdown and enslave it to an existing VRF in a pre-up hook: ip link set dev $IFACE master my-vrf.

When using Linux's native addrconf IPv6 RA implementation, this causes, e.g., the default route to be installed into the VRF's associated routing table. dhcpcd, however, seems to bypass addrconf by setting accept_ra=0 and handles RAs itself.

I looked for a config option that disables only dhcpcd's IPv6 autoconfiguration and leaves it to the kernel (I still want dhcpcd to handle IPv4 DHCP), but couldn't find one.

So this issue is essentially a request for (one of) two features:

Thanks, --Daniel

DanielG commented 10 months ago

After a quick read of the source code I found that the noipv6rs option does what I want: it lets the kernel handle IPv6 addrconf. It's rather poorly documented, though. The man page says:

ipv6rs  Enables IPv6 Router Advertisement solicitation.  This is on by
        default, but is documented here in the case where it is disabled
        globally but needs to be enabled for one interface. 

I mean, the kernel also sends router solicitations, so my initial reading was that disabling this would stop IPv6 addrconf entirely. It should probably say something like: "Enables dhcpcd's internal handling of IPv6 Router Advertisements; when disabled, the kernel's RA functionality is used."

--Daniel

rsmarples commented 10 months ago

I have a patch that should assign routes to the correct table, but it fails in my testing. The kernel returns "$" for IFLA_INFO_KIND where I expect "vrf", and RTA_OK fails on the payload of IFLA_INFO_DATA, so I'm doing something differently from what iproute2 sends, but I don't see where just yet.

The patch is really ugly, though, as it adds a netlink call for every route added, but it's low risk with the codebase as it stands. Ideally we'd move away from getifaddrs and use netlink directly (route(4) on BSD) so we can store the table ID whenever it changes.
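The patch itself isn't shown in this thread, so the following is not a diagnosis of it. One common way to read garbage such as "$" for IFLA_INFO_KIND is to start the attribute walk at the wrong offset (for example, forgetting the struct ifinfomsg that precedes the top-level attributes of an RTM_NEWLINK message) or to bound a nested walk by the outer message length. A minimal C sketch of a correct walk, assuming the standard Linux UAPI headers; link_linkinfo() and linkinfo_kind() are hypothetical names, not dhcpcd functions:

```c
#include <stddef.h>
#include <string.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/* Return the IFLA_LINKINFO attribute of an RTM_NEWLINK message, or NULL.
 * Top-level attributes start after the struct ifinfomsg header, so the walk
 * uses IFLA_RTA()/IFLA_PAYLOAD() rather than NLMSG_DATA()/nlmsg_len. */
struct rtattr *link_linkinfo(struct nlmsghdr *nlh)
{
	struct ifinfomsg *ifi = NLMSG_DATA(nlh);
	int len = IFLA_PAYLOAD(nlh);
	struct rtattr *rta;

	/* Mask off NLA_F_NESTED in case the sender set it. */
	for (rta = IFLA_RTA(ifi); RTA_OK(rta, len); rta = RTA_NEXT(rta, len))
		if ((rta->rta_type & NLA_TYPE_MASK) == IFLA_LINKINFO)
			return rta;
	return NULL;
}

/* Copy the IFLA_INFO_KIND string (e.g. "vrf") nested inside IFLA_LINKINFO
 * into buf. The nested walk starts at RTA_DATA(linkinfo) and is bounded by
 * RTA_PAYLOAD(linkinfo), not by the length of the outer message. */
int linkinfo_kind(struct rtattr *linkinfo, char *buf, size_t buflen)
{
	int len = RTA_PAYLOAD(linkinfo);
	struct rtattr *rta;

	for (rta = RTA_DATA(linkinfo); RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
		size_t plen;

		if ((rta->rta_type & NLA_TYPE_MASK) != IFLA_INFO_KIND)
			continue;
		plen = RTA_PAYLOAD(rta);
		if (plen == 0 || plen > buflen)
			return -1;
		memcpy(buf, RTA_DATA(rta), plen);
		/* Force termination; the kernel normally includes the NUL. */
		buf[plen - 1] = '\0';
		return 0;
	}
	return -1;
}
```

In a real client the message would come from an RTM_GETLINK request or an RTM_NEWLINK notification; the sketch only covers parsing.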

rsmarples commented 10 months ago

I've added issue #242 which will allow us to improve the work done here in the future.

rsmarples commented 10 months ago

Actually this is a touch more complicated because dhcpcd only does calculations on the main routing table. This will require more thought.

DanielG commented 10 months ago

To start with, a global option to change which routing table dhcpcd operates on would already help.

I can always run one dhcpcd per VRF, but we'd have to figure out how binding to a VRF device behaves when sending DHCP messages over (I assume) RAW sockets.

rsmarples commented 10 months ago

dhcpcd uses BPF for the initial DHCP setup, which works below the network layer, so it doesn't really care about VRFs, routing or tables. DHCPv6, on the other hand, does use the network layer, but by default it only operates on link-local multicast.

It does raise an interesting point, though: in master mode we open only one DHCPv6 socket to handle all interfaces, and I suspect that won't work with VRFs. When not in master mode, DHCPv6 opens a socket per interface address and sends from the first link-local one, which should work.

I could add a patch where, when not in master mode, dhcpcd detects the VRF table ID at startup and uses it as the default table for its lifetime. Changing the table ID while running would be unsupported. Would that help to start with?

rsmarples commented 10 months ago

@DanielG can you test the above branch, please? It might do the right thing!

DanielG commented 10 months ago

Thanks for the quick patch turnaround :)

> It does raise an interesting point though because in master mode we only open the one DHCPv6 socket to handle all interfaces and I suspect that won't work for VRF? When not in master mode, DHCPv6 will open a socket per each interface address and pick the first LINKLOCAL one to send from which should work.

Whether an unbound UDP/RAW socket in the "default" VRF receives packets from other VRFs is controlled by the udp_l3mdev_accept/raw_l3mdev_accept sysctls (enabled when set to 1). Since these sysctls are a matter of local (security) policy, dhcpcd would ideally handle both settings.

Binding the socket to the physical device will of course still allow packets to be received regardless of which VRF the device is in; the sysctls only apply to unbound sockets.

> I could add a patch where dhcpcd detects the vrf table id at startup when not in master mode and sets the default table used for it's lifetime to that. Changing the tableid while running would be unsupported. Would that help to start with?

That should be fine. If it ends up bothering me I'll just send patches ;)

As for your patch, a couple of comments: the correct type for rt_table is uint32_t, not uchar. The way you're doing the VRF rt_table lookup now only works for the VRF device itself, not for any devices enslaved to it. However, running dhcpcd on the VRF device itself fails with "ps_bpf_recvmsg: No buffer space available".
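A sketch of a lookup that also covers enslaved devices, assuming the Linux UAPI headers. For a VRF device the table is in IFLA_INFO_DATA/IFLA_VRF_TABLE. For a port enslaved to a VRF, the kernel's vrf driver (as far as I can tell) reports the master's kind in IFLA_INFO_SLAVE_KIND and the table in IFLA_INFO_SLAVE_DATA/IFLA_VRF_PORT_TABLE. The helper names are hypothetical, and the table is returned as a uint32_t. An alternative would be to follow IFLA_MASTER and query the VRF device itself, at the cost of an extra request.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/* Find the first attribute of the given type nested inside parent. */
static struct rtattr *nested_find(struct rtattr *parent, int type)
{
	int len = RTA_PAYLOAD(parent);
	struct rtattr *rta;

	for (rta = RTA_DATA(parent); RTA_OK(rta, len); rta = RTA_NEXT(rta, len))
		if ((rta->rta_type & NLA_TYPE_MASK) == type)
			return rta;
	return NULL;
}

/* True if a kind attribute holds the NUL-terminated string kind. */
static int kind_is(struct rtattr *rta, const char *kind)
{
	size_t n = strlen(kind) + 1;

	return rta != NULL && RTA_PAYLOAD(rta) >= n &&
	    memcmp(RTA_DATA(rta), kind, n) == 0;
}

/* Hypothetical lookup: store the VRF table for a link in *table and return 0,
 * or return -1 if the link is neither a VRF device nor enslaved to one.
 * linkinfo is the IFLA_LINKINFO attribute from an RTM_NEWLINK message. */
int link_vrf_table(struct rtattr *linkinfo, uint32_t *table)
{
	struct rtattr *kind, *data, *tbl;

	kind = nested_find(linkinfo, IFLA_INFO_KIND);
	data = nested_find(linkinfo, IFLA_INFO_DATA);
	if (kind_is(kind, "vrf") && data != NULL) {
		/* The link is the VRF device itself. */
		tbl = nested_find(data, IFLA_VRF_TABLE);
	} else {
		/* The link may be a port enslaved to a VRF. */
		kind = nested_find(linkinfo, IFLA_INFO_SLAVE_KIND);
		data = nested_find(linkinfo, IFLA_INFO_SLAVE_DATA);
		if (!kind_is(kind, "vrf") || data == NULL)
			return -1;
		tbl = nested_find(data, IFLA_VRF_PORT_TABLE);
	}
	if (tbl == NULL || RTA_PAYLOAD(tbl) < sizeof(uint32_t))
		return -1;
	memcpy(table, RTA_DATA(tbl), sizeof(*table));
	return 0;
}
```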

I think I can take the patch from here unless you feel like playing with VRFs :)

DanielG commented 10 months ago

FYI, for testing:

$ ip link add vrf0 type vrf table 100
$ ip link set dev $IFACE master vrf0
# To revert:
$ ip link set dev $IFACE nomaster

rsmarples commented 10 months ago

> As for your patch, a couple of comments: the correct type for rt_table is uint32_t not uchar. The way you're doing the vrf rt_table lookup now only works for the VRF device itself not any devices enslaved to it. However running dhcpcd on the VRF device itself fails with a ps_bpf_recvmsg: No buffer space available.

The Linux headers disagree with themselves: https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/rtnetlink.h#L237

So the tableid reported is a uint32_t, but the tableid you can set is a uchar.

And yes, I got the "No buffer space available" message as well. I suspect that's because, as I understand it, a VRF needs an input interface, an output interface and its own routing table. The reasons escape me, but that's my lack of knowledge and I didn't investigate much further.

> I think I can take the patch from here unless you feel like playing with VRFs :)

Not really. I have other issues here I still need to work through. I'm happy to fix things that are easy to replicate, though.

If you feel like writing some patches, feel free to send them my way; I'll review them and, if all's good, merge them.

DanielG commented 10 months ago

> So the tableid reported is a uint32_t, but the tableid you can set is a uchar.

Looking at the kernel code, IFLA_VRF_TABLE is definitely a U32. I think rtm_table is the legacy way to get/set the table ID for routes; you can use RTA_TABLE instead, which is an NLA_U32.
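A sketch of both directions in C, assuming the Linux UAPI headers; the helper names are hypothetical. When building a route, it follows what, as far as I know, iproute2's ip route does: a table below 256 goes into rtm_table, while anything larger sets rtm_table to RT_TABLE_UNSPEC and adds a 32-bit RTA_TABLE attribute. When reading a route, RTA_TABLE (if present) carries the full ID, whereas rtm_table can only hold 8 bits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/rtnetlink.h>

/* Append a 32-bit attribute to a netlink message whose buffer holds maxlen
 * bytes. Returns -1 if the attribute does not fit. */
static int add_attr_u32(struct nlmsghdr *n, size_t maxlen,
    unsigned short type, uint32_t val)
{
	size_t len = RTA_LENGTH(sizeof(val));
	struct rtattr *rta;

	if (NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(len) > maxlen)
		return -1;
	rta = (struct rtattr *)((char *)n + NLMSG_ALIGN(n->nlmsg_len));
	rta->rta_type = type;
	rta->rta_len = (unsigned short)len;
	memcpy(RTA_DATA(rta), &val, sizeof(val));
	n->nlmsg_len = NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(len);
	return 0;
}

/* Set the table of an RTM_NEWROUTE/RTM_DELROUTE request. */
int route_set_table(struct nlmsghdr *n, size_t maxlen, uint32_t table)
{
	struct rtmsg *rtm = NLMSG_DATA(n);

	if (table < 256) {
		rtm->rtm_table = (unsigned char)table;
		return 0;
	}
	rtm->rtm_table = RT_TABLE_UNSPEC;
	return add_attr_u32(n, maxlen, RTA_TABLE, table);
}

/* Get the table of a route message, preferring the 32-bit RTA_TABLE. */
uint32_t route_get_table(struct nlmsghdr *n)
{
	struct rtmsg *rtm = NLMSG_DATA(n);
	int len = RTM_PAYLOAD(n);
	struct rtattr *rta;

	for (rta = RTM_RTA(rtm); RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
		if (rta->rta_type == RTA_TABLE &&
		    RTA_PAYLOAD(rta) >= sizeof(uint32_t)) {
			uint32_t t;

			memcpy(&t, RTA_DATA(rta), sizeof(t));
			return t;
		}
	}
	return rtm->rtm_table;
}
```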

> I suspect it's because my understanding of VRF is that it needs an input interface, an output interface and it's own routing table. The merits of why escape me but that's my lack of knowledge and I didn't investigate much further.

A single interface in a VRF can already be useful; that's what I'm doing. I want to move each of my upstream interfaces (Wi-Fi, Ethernet, etc.) into its own VRF and use each VRF as the underlay for a bunch of WireGuard (wg) tunnels that terminate in my main VRF for actual system use.

The idea is that this isolates my system from all sorts of DHCP route hijack attacks without complicated (and fragile) policy routing rules, cf. https://github.com/vanhoefm/vpnleaks. That's just a bonus, though; really I just want to run lots of tunnels over redundant upstream interfaces without too much hassle :)

I'm just getting started with VRFs myself so I'm happy to run any experiments you might need to better understand the socket API behaviour.

> Not really. I have other issues here I still need to work through. I'm happy to fix stuff with easy replicated stuff though.

Feel free; I was just offering to take some load off your CPU :)

I'll likely need a couple of days to circle back to DHCP concerns (I've got some other stuff to hack on first), so I don't think we'll conflict either way.

--Daniel