NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

Undeprecate networking.useDHCP #75515

Closed: bjornfor closed this 2 years ago

bjornfor commented 4 years ago

Describe the bug

Using networking.useDHCP has been deprecated since commit e862dd637350ddd1812a6c1fb5811c6464e74ff5.

But:

  1. Why is networking.useDHCP discouraged/deprecated?
  2. Why does using networkd mean we can no longer have a global default for whether or not to use DHCP?

It seems like a step back to me, having to list machine-specific network interfaces in configuration.nix instead of being able to say "use DHCP (or not) for any interface you see". See also https://github.com/NixOS/nixpkgs/issues/73595.

I think/hope networking.useDHCP = true can be mapped to networkd with something like this (untested):

# /etc/systemd/network/10-nixos-dhcp.network
[Match]
Name=*

[Network]
DHCP=yes
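
In NixOS terms, the unit above might be written roughly as follows. This is a sketch, as untested as the snippet it mirrors: the unit name 10-nixos-dhcp is taken from the comment above, and whether a catch-all match is a good idea is exactly what this issue is asking.

```nix
{
  # Assumption: networkd is enabled directly rather than through
  # the networking.* compatibility layer.
  systemd.network.enable = true;

  systemd.network.networks."10-nixos-dhcp" = {
    # Matches every interface name, which also includes lo.
    matchConfig.Name = "*";
    networkConfig.DHCP = "yes";
  };
}
```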

CC @globin.

bjornfor commented 4 years ago

@florianjacob: Why the thumbs down? I'm curious why using networking.useDHCP is suddenly a bad idea. Because of networkd somehow? Please explain.

florianjacob commented 4 years ago

About a year ago I banged my head against the global useDHCP and related global options, which caused a lot of problems with networkd and are (or were) enforced through a 99-main.network that matches all interfaces, like your code snippet. I can't explain the problems off the top of my head anymore, though, as I have since disabled all of that and now configure systemd-networkd directly, without networking.interfaces, because it works (or worked) so badly with systemd-networkd. The deprecation itself is exactly because the 99-main.network catch-all is removed in 20.03: https://github.com/NixOS/nixpkgs/blob/50295a12011334743defec979aff1d1789600f58/nixos/doc/manual/release-notes/rl-2003.xml#L130

If I remember correctly, one main cause is that systemd-networkd applies only the first network file that matches and ignores all others, which doesn't fit how the networking.interfaces module and the global options are designed.
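
To make the first-match behaviour concrete, here is a minimal sketch with two hypothetical units. The names 10-lan and 99-main, the interface enp2s0, and the documentation address 192.0.2.10/24 are assumptions chosen for illustration.

```nix
{
  systemd.network.enable = true;

  systemd.network.networks = {
    # Sorts first, so networkd uses this unit for enp2s0.
    "10-lan" = {
      matchConfig.Name = "enp2s0";
      address = [ "192.0.2.10/24" ];
    };

    # Also matches enp2s0, but is ignored for it: settings from a
    # later matching unit are not merged into the first one.
    "99-main" = {
      matchConfig.Name = "*";
      networkConfig.DHCP = "yes";
    };
  };
}
```

With this layout enp2s0 would get only the static address and no DHCP, while a module that expects global and per-interface options to combine would expect both.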

(Thumbs down just because I remember it's a good idea not to have that, thumbs up for documenting and explaining why that decision was made and what the problems were.)

bjornfor commented 4 years ago

@florianjacob: Thank you. I read the linked issues and have a better understanding of the problem now.

If some interfaces must be excluded from networkd control, and a whitelist is preferred, how about this:

networking.useDHCP = [ "en*" "wl*" ];

This could be the new default and have low priority so that per interface settings win. This should be machine agnostic AFAICT.
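
On the networkd side, such a whitelist might look roughly like the sketch below. It assumes that Name= accepts a whitespace-separated list of globs, as systemd.network(5) describes, and the unit name 99-dhcp-whitelist is hypothetical. The high number is intended to give it the "low priority" mentioned above: units for specific interfaces sort earlier and therefore win.

```nix
{
  systemd.network.networks."99-dhcp-whitelist" = {
    # Only interfaces with predictable Ethernet/WLAN names.
    matchConfig.Name = "en* wl*";
    networkConfig.DHCP = "yes";
  };
}
```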

bjornfor commented 4 years ago

Well, if the above whitelist works, there is actually no need to change the API: useDHCP must simply change from matching all interfaces to the ones starting with "en" and "wl".

mkg20001 commented 4 years ago

IMO there should be a way to override the whitelist: the boolean value true would use the default whitelist, while a list would override it.

Also, some machines have interfaces renamed to eth0, eth1, ..., so relying on just "en" might not work; it should be "e".

bjornfor commented 4 years ago

I remember now I even have a setup with "wan0" and "lan0" interfaces (via udev rule), so even "e*" is not correct/enough.

Is there a way to match real hardware interfaces?

mkg20001 commented 4 years ago

In NetworkManager at least there seems to be a "hw" flag, and I suppose this flag could be recreated with the right values from /proc/.

The only problem is that it would likely have to happen at runtime, since the config could get built on any machine.


bjornfor commented 4 years ago

How is the live CD going to be configured without a global networking.useDHCP? One cannot know the names of the network interfaces beforehand, so there must be some wildcard/global match in NixOS somewhere. If the live CD can be made to work with networkd (I guess that's the plan), surely we can make the installed NixOS too?

Moredread commented 4 years ago

@bjornfor wouldn't a "*" work for the live iso?

bjornfor commented 4 years ago

I think the point was to not match certain interfaces, like the loopback interface ('lo'). But I didn't pick up all the details of the above linked issues, so I could be wrong.

fpletz commented 4 years ago

So there are two related problems at play here.

Networkd wants to configure all interfaces it's configured to manage

If there is no carrier or no DHCP response, the interfaces will stay in the configuring state and will delay network-online.target until either all interfaces are configured or a timeout is reached. Then network-online.target fails, and with it all services depending on it, even if one link has a configured and working connection.

This is not something we want to happen to new users or on NixOS upgrades that might switch to networkd by default.

Note that most other distributions I'm familiar with don't do DHCP on all interfaces by default but have their installer generate a sensible networking config through some kind of autodetection and by asking the user. We're just using dhcpcd cleverly. I think this behaviour only makes sense on install media, where one cannot assume anything about the target. But on install media, IMHO, we should rather let the user use NetworkManager and nmtui for ad-hoc configuration.

Specifically, I think the networking configuration should rather be a conscious decision by the user. Either statically via configuration or dynamically via tools like NetworkManager. Users can still configure dhcpcd or networkd explicitly to run DHCP on whitelisted/blacklisted interfaces they deem useful for the job. I was also thinking of allowing wildcards/globs for networking.interfaces.<ifname> since both our DHCP clients would support it to simplify this.

There is a way though to exclude interfaces from the network-online.target status checks: Network units can set RequiredForOnline=false. But setting this for the catchall DHCP networks would also break network-online.target for services that really rely on a working internet connection on start.
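
As a sketch of the RequiredForOnline setting just mentioned (the unit name 80-dhcp-optional and the match pattern are assumptions):

```nix
{
  systemd.network.networks."80-dhcp-optional" = {
    matchConfig.Name = "en* wl*";
    networkConfig.DHCP = "yes";
    # Do not hold back network-online.target for these links.
    linkConfig.RequiredForOnline = "no";
  };
}
```

The trade-off is the one described above: if every DHCP link is marked like this, network-online.target no longer says much about whether a connection actually exists.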

After researching for my response here again, I noticed that systemd-networkd-wait-online now has an --any option which would result in the same behaviour we have for dhcpcd in principle. Except that if we couple static configurations with DHCP on all (other) interfaces, the one statically configured interface without a default route would activate network-online.target which is also not the behaviour we want, strictly speaking.
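
In more recent NixOS releases the --any behaviour appears to be exposed as a module option. The sketch below assumes that systemd.network.wait-online.anyInterface is available in the release being used, which was not the case when this comment was written.

```nix
{
  # Consider the network online as soon as any one interface is
  # online, instead of waiting for all managed interfaces.
  systemd.network.wait-online.anyInterface = true;
}
```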

Also note that this way we have a kind of race condition with both dhcpcd and networkd anyway because acquiring an IP via DHCP on one interface does not necessarily mean we also get a default route that might come from another interface.

How to match all "uplink" interfaces?

Matching for en* wl* ww* would potentially be enough, but only if predictable interface names are enabled (see man systemd.net-naming-scheme). If predictable interface names are disabled, we cannot assume anything, since the interface names are then defined by the kernel/drivers and could be something like usb0 for USB network cards.

Furthermore, udev exposes the DEVTYPE property, which can be accessed via networkd units for matching via Type=. This would be ideal because we could match for ethernet and wifi cards individually. Unfortunately, after looking at some hardware, I found that this property is not always set, even when the interface belongs to a physical network card. Not sure if this is a kernel, driver or udev problem.
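
A Type= match could be sketched like this. The unit name is hypothetical, and the type values "ether" and "wlan" are assumed to be the ones networkctl reports for such devices. Given the caveat above, it may not catch every physical card.

```nix
{
  systemd.network.networks."80-physical-dhcp" = {
    # Match by device type instead of by interface name.
    matchConfig.Type = "ether wlan";
    networkConfig.DHCP = "yes";
  };
}
```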

But: Even though we might have a sensible selector that works with predictable interface names enabled, we have not yet solved the first problem.

Conclusion

It was more sensible for us to remove networking.useDHCP because we aren't sure how to implement a correct solution in networkd via either config or code. Moreover, though our current implementation with dhcpcd is working well for most cases, it is also a source of trouble for others, and it has bugs. And it is enabled by default!

@bjornfor Does this explain our rationale in a way that makes sense to you? What do you think?

globin commented 4 years ago

Closing, as there has been no further reaction and I think @fpletz's comment is an adequate answer to the issue. Feel free to reopen if there are further questions!

bjornfor commented 4 years ago

@globin: My lack of response was mostly due to lack of time, not because I think this issue is not relevant anymore. In fact, I don't have a lot of time now either, so sorry for being brief.

@fpletz: Thank you for the detailed post. Here is my response, as an end user who doesn't know all the details:

> It was more sensible for us to remove networking.useDHCP because we aren't sure how to implement a correct solution in networkd via either config or code.

That sounds like a perfectly good reason for why things are like they are with networkd, but IMHO not so much for deprecating networking.useDHCP. It sounds like there are issues with the networkd integration in NixOS (and some upstream projects?), not the idea of networking.useDHCP itself, and that networkd is not ready yet to be the default NixOS networking backend.

The move to networkd feels kind of rushed, ref. this issue and https://github.com/NixOS/nixpkgs/issues/73595.

fpletz commented 4 years ago

@bjornfor Sorry that I didn't make my point clear enough and that I was stressing the move to networkd too much.

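networking.useDHCP should be removed because it's currently

  • buggy (see the edge cases I described)
  • does not do what its documentation states ("Whether to use DHCP to obtain an IP address and other configuration for all network interfaces that are not manually configured.") because the dhcpcd blacklist will still be applied silently.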

If you still disagree about the removal, please come up with a sensible implementation instead. We can then also use that logic with networkd.

bjornfor commented 4 years ago

> networking.useDHCP should be removed because it's currently
>
>   • buggy (see the edge cases I described)

I only saw bugs / edge cases mentioned for the combination of networkd and networking.useDHCP. For networking.useDHCP alone, what's the problem?

>   • does not do what its documentation states ("Whether to use DHCP to obtain an IP address and other configuration for all network interfaces that are not manually configured.") because the dhcpcd blacklist will still be applied silently.

Do you mean the networking.dhcpcd.denyInterfaces option + hardcoded list of ignored interfaces from nixos/modules/services/networking/dhcpcd.nix (lo peth* vif* tap* tun* virbr* vnet* vboxnet* sit*)? I guess I always assumed the option was about hardware interfaces, so I don't feel bad when now seeing that list of blacklisted interfaces. We can add in the word "hardware" before "network interfaces" too, to make the docstring more accurate. Does the current implementation cause any problems?
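
For reference, extending that blacklist from configuration.nix might look like the sketch below. The patterns are examples chosen for illustration, not a recommendation.

```nix
{
  networking.useDHCP = true;

  # Shell globs for interfaces dhcpcd should leave alone, in
  # addition to the hardcoded list in dhcpcd.nix.
  networking.dhcpcd.denyInterfaces = [ "docker*" "veth*" ];

  # Alternatively, a whitelist:
  # networking.dhcpcd.allowInterfaces = [ "en*" "wl*" ];
}
```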

edolstra commented 4 years ago

I don't see a reason to remove networking.useDHCP. It's an "abstract" option not tied to any particular implementation. Whether it enables dhcpcd or systemd's DHCP client is an implementation detail.

bjornfor commented 4 years ago

When https://github.com/NixOS/nixpkgs/issues/73595 gets fixed, I guess the plan is to run nixos-generate-config when adding/removing network interfaces? (Well, not my plan, but it seems we're heading that way.)

I tried nixos-generate-config on my machine and got this:

  networking.useDHCP = false;
  networking.interfaces.docker0.useDHCP = true;   # wrong
  networking.interfaces.enp2s0.useDHCP = true;
  networking.interfaces.tun0.useDHCP = true;      # wrong
  networking.interfaces.vboxnet0.useDHCP = true;  # wrong
  networking.interfaces.wlp3s0.useDHCP = true;

So the thinking is that networking.useDHCP should be removed because it has a (hidden) blacklist, whereas without a blacklist you get behaviour like the above? I don't think that's an improvement.

davidak commented 4 years ago

> If there is no carrier or no DHCP response, the interfaces will stay in the configuring state and will delay network-online.target until either all interfaces are configured or a timeout is reached. Then network-online.target fails, and with it all services depending on it, even if one link has a configured and working connection.

@fpletz Isn't the normal behavior that the system tries to get an IP via DHCP and, when it doesn't get one, assigns itself a link-local address?

> Link local addresses allow machines to automatically have an IP address on a network if they haven't been manually configured or automatically configured by a special server on the network (DHCP). Before an address is chosen from that range, the machine sends out a special message (using ARP which stands for address resolution protocol) to the machines on the network around it (assuming that they also haven't been assigned an address manually or automatically) to find out if 169.254.1.1 is free. If it is, then the machine assigns that address to its network card. If that address is already in use by another machine on the same network, then it tries the next IP 169.254.1.2 and so on, until it finds a free address.

Source: https://serverfault.com/a/118329

So, can we get that behavior with networkd?
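
networkd does have a LinkLocalAddressing= setting in the [Network] section, so the configuration side of this could be sketched as below (unit name and match pattern are assumptions). Whether a link that only has a link-local address then satisfies systemd-networkd-wait-online is a separate question that this sketch does not answer.

```nix
{
  systemd.network.networks."80-dhcp-with-linklocal" = {
    matchConfig.Name = "en*";
    networkConfig = {
      DHCP = "yes";
      # Enable both IPv4 and IPv6 link-local addressing.
      LinkLocalAddressing = "yes";
    };
  };
}
```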

I'm always for sane defaults. Do what the user expects. So we can implement a blacklist logic for interfaces that are configured automatically by other programs, like docker0, tun0, vboxnet0.
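
For networkd, one possible shape of such a blacklist is an inverted match. This sketch relies on systemd.network(5) stating that a Name= list prefixed with "!" is inverted as a whole. The unit name and the listed patterns are assumptions taken from the interfaces mentioned in this thread.

```nix
{
  systemd.network.networks."99-dhcp-except-virtual" = {
    # Matches every interface whose name is NOT in the list.
    matchConfig.Name = "!lo docker* tun* vboxnet*";
    networkConfig.DHCP = "yes";
  };
}
```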

Could someone ask the systemd developers whether the features we need are supported now, or whether they will ever implement them? With that information, we can make an informed decision on how to proceed here and finish the release.

nixos-discourse commented 4 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/networking-usenetworkd-and-usedhcp/4352/2

nixos-discourse commented 4 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/networking-usenetworkd-and-usedhcp/4352/3

Ericson2314 commented 3 years ago

@fpletz When you mention NetworkManager, are you saying that the global useDHCP = true isn't needed when NetworkManager is used?

nh2 commented 3 years ago

Another possible issue to consider:

https://github.com/NixOS/nixpkgs/issues/109389#issuecomment-760381746 (Using Docker on AWS EC2 breaks EC2 metadata route because of DHCP)

nixos-discourse commented 3 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/persistent-network-interfaces-nixos-and-usb-wifi-dongles-how-can-i-get-all-three-to-play-nicely/13587/3

KizzyCode commented 2 years ago

There are two problems for me with deprecating (and subsequently removing) networking.useDHCP:

IMO there is no full replacement for networking.useDHCP yet, therefore I also disagree with its deprecation (even if I understand most of the reasons). Maybe as a compromise: set the default value for networking.useDHCP to false and add a warning about oddities and quirks, especially when used with networkd?

> Specifically, I think the networking configuration should rather be a conscious decision by the user.

Well I see the point but I don't fully agree – the basic idea behind DHCP is zero-config, so it should be possible to fully opt-in to auto-configuration for all physical interfaces. Then, if I add a new ethernet-card/WiFi-dongle, it should have full auto-configuration (even within initrd if networking is enabled there). And if I remove/unplug the interfaces, they should be "deconfigured" automatically.

AFAIK this does not yet work reliably with networkd (see all the complaints about USB Ethernet or WiFi dongles not connecting out of the box, or unplugged dongles blocking network-dependent services on boot).

bjornfor commented 2 years ago

Looks like this issue is about to be solved: https://github.com/NixOS/nixpkgs/pull/167327

bjornfor commented 2 years ago

Thank you, @lheckemann! :tada: