NetworkConfiguration / dhcpcd

DHCP / IPv4LL / IPv6RA / DHCPv6 client.
https://roy.marples.name/projects/dhcpcd
BSD 2-Clause "Simplified" License

How to establish a bond under dhcpcd? #84

Open Jibun-no-Kage opened 2 years ago

Jibun-no-Kage commented 2 years ago

How to establish a bond under dhcpcd? I love the idea of the fallback profile logic, but when I set up dhcpcd for DHCP with a static fallback profile and then create a bond in the networking interfaces file, I trip over the startup sequence order, such that no interfaces actually exist on reboot. If I let dhcpcd trigger when I run systemctl restart networking, everything works. So clearly something is stepping on something during system startup. Any insight or suggestions? Is there a way to create a bond strictly under dhcpcd control, from a configuration perspective?

rsmarples commented 2 years ago

You can create a hook script to create the bond based on some config, I guess.

/etc/dhcpcd.exit-hook

# $your_condition is a placeholder for whatever test decides the bond is needed
if [ "$your_condition" ]; then
    ip link add bond0 type bond mode 802.3ad
    ip link set eth0 master bond0
    ip link set eth1 master bond0
fi

Does that help?
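
A slightly fuller sketch of the same hook idea, under stated assumptions. dhcpcd sources its hook scripts with $interface and $reason set; everything else here (the BOND_NAME and BOND_SLAVES variables, the build_bond function, and the choice of eth0's PREINIT event as the trigger) is hypothetical and would need adapting. It also assumes the bonding driver refuses to enslave an interface that is still up, which is why each slave is taken down first.

```shell
# Sketch of /etc/dhcpcd.exit-hook (hypothetical names: BOND_NAME, BOND_SLAVES, build_bond).
BOND_NAME="${BOND_NAME:-bond0}"
BOND_SLAVES="${BOND_SLAVES:-eth0 eth1}"

build_bond() {
    # Nothing to do if the bond already exists, so repeated hook runs are harmless.
    if ip link show "$BOND_NAME" >/dev/null 2>&1; then
        return 0
    fi
    ip link add "$BOND_NAME" type bond mode 802.3ad || return 1
    for slave in $BOND_SLAVES; do
        # Assumption: the slave must be down before it can join the bond.
        ip link set "$slave" down
        ip link set "$slave" master "$BOND_NAME"
    done
    ip link set "$BOND_NAME" up
}

# Assumed trigger: build the bond once, when dhcpcd first initialises eth0.
# This only fires if dhcpcd is allowed to see eth0 at that point.
if [ "${reason:-}" = "PREINIT" ] && [ "${interface:-}" = "eth0" ]; then
    build_bond
fi
```

Because the function returns early when the bond exists, it should be safe for the hook to run on every dhcpcd event.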

Jibun-no-Kage commented 2 years ago

Yes and no. I hoped there was a more native or supported, i.e. official, way to do this via dhcpcd. But it does suggest one way to address the issue. Since my goal is to get away from having to use the older /etc/network/interfaces methodology, I was thinking more towards dhcpcd... since I like the idea of a fallback method. I guess I could also go the systemd route, eh, no pun intended. Maybe this is a step towards a feature request to have dhcpcd support bonding in some official manner?

rsmarples commented 2 years ago

dhcpcd's job is just to configure addressing and routing on a given interface. Setting the interface up is the job of something else.

Jibun-no-Kage commented 2 years ago

True, but there are many scenarios outlined via Google where traditional bonding does not work well with dhcpcd. I have experienced this conundrum myself, where the typical load sequence at boot trips up between dhcpcd and the networking service. So my idea was to figure out a way to address this; I happen to love the idea of the static fallback feature of dhcpcd. So, I first tried using dhcpcd with a typical bond configuration in the /etc/network/interfaces configuration file. The method worked if I ran the command sequence manually, but on boot the bond was never established. So the idea occurred to me that maybe the typical bonding configuration could be eliminated, hence my question above. Using the exit-hook idea is the next thing to try. I am surprised this conundrum has not been documented on the internet, or maybe I just have not googled for it right. :)

rsmarples commented 2 years ago

I fail to see how dhcpcd can trip up with bonding - you start it with no interfaces specified or with the -m flag and dhcpcd will react to interfaces arriving and departing. A bond interface by itself is nothing special.

Now there might be some extra delay when dhcpcd spots the bond interface and starts up on it, but if the OS is half decent it will report no carrier on it until an interface joins it with a carrier and thus there is no delay. But this is no different from a bridge interface from dhcpcd's perspective.

Jibun-no-Kage commented 2 years ago

There seems to be some odd interaction. A race condition or sequence issue with how the two services interact. For example, if I run dhcpcd -k then run systemctl restart networking, my bond comes up and the bond0 interface has an IP via dhcp, because I have denied the slave interfaces and only allowed the bond interface. All good.

However, if I reboot the same device, with no changes, at all, the bond interface fails, and thus no dhcp results. If I then run dhcpcd -k and then systemctl restart networking on the just rebooted device, the bond interface works, dhcp assignment happens.

I can reproduce this on buster and bullseye. I happen to be testing on a Raspberry Pi device, but I have created the same scenario on Intel/AMD based hardware, so this is not an ARM based hardware quirk IMHO.

Here is what drives me nuts... if I remove dhcpcd, i.e. disable dhcpcd, and set up a static configuration for IP, everything works via CLI and on reboot without issue. It is only when I use dhcpcd together with the networking service, and only during a reboot, that I get this odd behavior, as I noted above.

Clearly something is stepping on something somewhere during the boot start of dhcpcd, networking, etc. I chose to start asking why this is the case here... because you have to start somewhere, and I had determined that dhcpcd is part of the scenario encountered, as I noted above. Given the expertise here, I am sure someone can figure out what the heck is going on.
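
To pin down what "the bond interface fails" means after a reboot, one option is to record the bond's state from sysfs right after boot and again after the manual restart, then compare. This is only a diagnostic sketch: bond_report and SYSFS_NET are hypothetical names, and SYSFS_NET is overridable only so the function can be tried against a fake directory tree. It assumes the kernel bonding driver's usual /sys/class/net/<bond>/bonding/slaves file.

```shell
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

bond_report() {
    bond="${1:-bond0}"
    dir="$SYSFS_NET/$bond"
    if [ ! -d "$dir" ]; then
        echo "$bond: missing"
        return 1
    fi
    if [ ! -r "$dir/bonding/slaves" ]; then
        echo "$bond: not a bond"
        return 1
    fi
    slaves=$(cat "$dir/bonding/slaves")
    # Reading carrier can fail while the interface is administratively down.
    carrier=$(cat "$dir/carrier" 2>/dev/null || echo "unknown")
    echo "$bond: slaves=[$slaves] carrier=$carrier"
}
```

Running bond_report bond0 once on the freshly rebooted system and once after dhcpcd -k and systemctl restart networking would show whether the bond is absent, present without slaves, or present without carrier in the failing case.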

rsmarples commented 2 years ago

#76 has been merged into master and may help here if DHCPv6 or IPv6 was in play.

Jibun-no-Kage commented 2 years ago

OK, just for reference, I had IPv6 disabled at the kernel level when I discovered this issue.

rsmarples commented 2 years ago

How exactly are you starting dhcpcd?

Jibun-no-Kage commented 2 years ago

When I am working on a bond configuration, typically I leave it enabled as default so it starts with system start, i.e. the unit file under systemd. But when I am explicitly testing a bond configuration I will at times stop it via -k and then let it restart when I do systemctl restart networking. This manual restart of networking is not 'that' clean, so usually I just do a graceful restart of the system to be sure to mimic a typical system start/boot.

The original goal was to let dhcpcd work with a DHCP server, configured with a static fallback, so the bond0 interface always has a known IP assignment, using a reservation in the DHCP server of course. But some distros of Linux don't always handle this well; they sometimes expect the entire bond0 setup to be in the interfaces file, regardless of a static or DHCP IP assignment elsewhere.

There does not seem to be any explicit statement that dhcpcd can't be used with a bond configuration, so that is why I was testing and trying to figure out if it is possible... it should be, because a bond is just another interface from dhcpcd's perspective. If I set the 'slaved' interfaces to denyinterfaces in dhcpcd.conf, and only allow dhcpcd to handle bond0 (i.e. interface bond0), this works in some cases, on some distros.
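
For reference, the dhcpcd.conf arrangement described above might look roughly like the fragment below, printed here from a small shell function so it can be appended to a config file. The function name bond_dhcpcd_conf, the profile name static_bond0, the slave names eth0 and eth1, and all addresses are placeholders and not taken from an actual setup; the directives themselves (denyinterfaces, profile, static, interface, fallback) are standard dhcpcd.conf options.

```shell
bond_dhcpcd_conf() {
    cat <<'EOF'
# Ignore the enslaved NICs; only the bond should get an address.
denyinterfaces eth0 eth1

# Static profile used when no DHCP lease is obtained (placeholder addresses).
profile static_bond0
static ip_address=192.168.1.23/24
static routers=192.168.1.1
static domain_name_servers=192.168.1.1

# Try DHCP on the bond first, then fall back to the static profile.
interface bond0
fallback static_bond0
EOF
}
```

Something like bond_dhcpcd_conf >> /etc/dhcpcd.conf would append it. This fragment only covers addressing on bond0; the bond itself still has to be created by something else.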

When I opened the question, I was thinking that maybe dhcpcd could own the bond configuration, as a feature enhancement maybe, and this would completely retire the need for using the interfaces file. But I realized that dhcpcd is really just a DHCP client, so it should accept whatever interface it finds available, i.e. bond0. But that does not explain why, when using a bond and the older interfaces file and explicitly disabling dhcpcd, everything seems to work, but when using dhcpcd, the bond fails to come up successfully at times, even if configured to use DHCP configuration. It feels like something is stepping on something, or something is not handing off control (or handing it back) in the network stack.

Most of my testing has been around or on Debian 10, 11 x64 and Pi OS 10 and 11 (i.e. Debian 10, 11 ARM), just for reference.