Closed Mic92 closed 5 years ago
FWIW, the updated udev rule files are shipped within the systemd package:
${udev}/lib/udev/rules.d/80-net-setup-link.rules
should do the trick
It looks like that udev rules file does not rename the network interfaces anymore. Neither does any of the other rules shipped with systemd.
It looks like this functionality was moved into networkd through https://github.com/NixOS/systemd/blob/nixos-v239/network/99-default.link. Not sure if we want to enable networkd by default just for predictable interface names. I've just tested it and it would work fine with our scripted networking though.
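For reference, a minimal sketch of opting a single machine into networkd today via the existing networking.useNetworkd option (just for illustration, nothing in this thread depends on this snippet):
{ ... }:
{
  # Hand network configuration on this host over to systemd-networkd.
  networking.useNetworkd = true;
}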
There are two options:
Networkd used to terminate itself when it was finished with static network configuration and nothing was left to do. I don't know if this is still the case. I remember they disabled something like that, but I cannot tell whether that was only temporary.
I haven't seen networkd exit in a long time. It always waits for new interfaces and configures them according to the configuration.
(Related: https://github.com/NixOS/nixpkgs/commit/788c5195f36fe101ecbf016137e017655063bc6b, by the way)
@Mic92 I don't see how the newly shipped udev rules would aid in container detection. Afaik that has nothing to do with udev but instead with the ConditionVirtualization stuff in systemd.
man systemd.network
Virtualization=
Checks whether the system is executed in a virtualized environment and optionally test whether it is a specific implementation. See
"ConditionVirtualization=" in systemd.unit(5) for details.
man systemd.unit
ConditionVirtualization= may be used to check whether the system is executed in a virtualized environment and optionally test whether it
is a specific implementation. Takes either boolean value to check if being executed in any virtualized environment, or one of vm and
container to test against a generic type of virtualization solution, or one of qemu, kvm, zvm, vmware, microsoft, oracle, xen, bochs,
uml, bhyve, qnx, openvz, lxc, lxc-libvirt, systemd-nspawn, docker, rkt to test against a specific implementation, or private-users to
check whether we are running in a user namespace. See systemd-detect-virt(1) for a full list of known virtualization technologies and
their identifiers. If multiple virtualization technologies are nested, only the innermost is considered. The test may be negated by
prepending an exclamation mark.
What exactly is currently broken in matching on containers in networkd, and what makes you think it has to do with udev?
No, it is not about detecting containers, but about container network interfaces:
# lib/systemd/network/80-container-ve.network
[Match]
Name=ve-*
Driver=veth # <-- This driver is not detected with our udev rules
[Network]
# Default to using a /28 prefix, giving up to 13 addresses per container.
Address=0.0.0.0/28
LinkLocalAddressing=yes
DHCPServer=yes
IPMasquerade=yes
LLDP=yes
EmitLLDP=customer-bridge
Given that networkd doesn't touch any interfaces it doesn't explicitly manage, I think it's harmless to enable it by default @fpletz. By "enable" I mean run networkd regardless of whether networking.useNetworkd = true is set.
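A minimal sketch of what that could look like, assuming the existing systemd.network module options (not necessarily how a final change would be wired up):
{ ... }:
{
  # Run systemd-networkd unconditionally; it only configures interfaces
  # that are matched by a .network file, everything else stays unmanaged.
  systemd.network.enable = true;
}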
Having networkd manage interfaces created by nspawn and friends would be nice. When networkd is enabled, it honors the following files:
${pkgs.systemd}/lib/systemd/network/80-container-host0.network
${pkgs.systemd}/lib/systemd/network/80-container-ve.network
${pkgs.systemd}/lib/systemd/network/80-container-vz.network
${pkgs.systemd}/lib/systemd/network/99-default.link
On top of that, there's a /etc/systemd/network/99-main.network created by nixos/modules/tasks/network-interfaces-systemd.nix, and a /etc/systemd/network/40-vboxnet0.network created by nixos/modules/virtualisation/virtualbox-host.nix.
I'm not sure if simply enabling networkd breaks some scenarios.
Things like nixos/modules/virtualisation/containers.nix do some shell-based network interface setup. We might need to change some of the logic in there to make use of the native networkd-provided networking, or provide some more explicit configuration in the module if needed.
On top of that, when switching my configuration, systemd-networkd-wait-online.service waited a long time for some interfaces to become online - in my case, vboxnet0 - until it timed out (the link has no carrier). virbr0, virbr0-nic and docker0 are also in state "configuring". We might need to add some exclusion rules for things like that too - setting RequiredForOnline=no in a specific .network file might do the trick.
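A minimal sketch of such an exclusion, assuming the .network file is generated through the systemd.network options (the unit name here just mirrors the 40-vboxnet0.network mentioned above):
{ ... }:
{
  systemd.network.networks."40-vboxnet0" = {
    matchConfig.Name = "vboxnet0";
    # Keep systemd-networkd-wait-online.service from blocking on this link.
    linkConfig.RequiredForOnline = "no";
  };
}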
Note that nixos/modules/tasks/network-interfaces-systemd.nix only generates its config with useNetworkd = true;, and apparently the 99-main.network it generates wildcard-matches all interfaces so as not to break the semantics of networking.useDHCP, which will break the container stuff (also see https://github.com/NixOS/nixpkgs/issues/18962).
So managing container networks with networkd will work when networking.useNetworkd = false; but will probably break when networking.useNetworkd = true;, to make things more complicated :)
@andir seems to have recently removed the renaming rule in question (https://github.com/NixOS/nixpkgs/commit/1f03f6fc43a6f71b8204adf6cd02fb3685261add#diff-c1c886b16586c62e53e0d38c07f9bb6d) and lets the kernel rename the network interface instead. This means we can just ship ${udev}/lib/udev/rules.d/80-net-setup-link.rules and stuff should work. I'll go create a PR.
Also, shipping ${pkgs.systemd}/lib/systemd/network/99-default.link will not hurt, as the NamePolicy will check whether the kernel did the renaming already:
NamePolicy=keep kernel database onboard slot path
MACAddressPolicy=persistent
Conclusion: network renaming now works both with and without networkd enabled.
We can freely include the ${udev}/lib/udev/rules.d/80-net-setup-link.rules rule to fix @Mic92's issue.
And we can include these network rules in our systemd module to make systemd-nspawn networking work as expected (and later use that as a base to get rid of scripted networking inside nixos-container):
${pkgs.systemd}/lib/systemd/network/80-container-host0.network
${pkgs.systemd}/lib/systemd/network/80-container-ve.network
${pkgs.systemd}/lib/systemd/network/80-container-vz.network
${pkgs.systemd}/lib/systemd/network/99-default.link
and be done with it and everything should work as far as I can see.
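A hedged sketch of one way the systemd module could expose those shipped units to networkd; whether environment.etc or some other mechanism is the right place is an open question:
{ pkgs, ... }:
{
  environment.etc = {
    # networkd (and udev, for the .link file) also read /etc/systemd/network,
    # so symlinking the files shipped with the systemd package makes them
    # picked up without any further configuration.
    "systemd/network/80-container-host0.network".source =
      "${pkgs.systemd}/lib/systemd/network/80-container-host0.network";
    "systemd/network/80-container-ve.network".source =
      "${pkgs.systemd}/lib/systemd/network/80-container-ve.network";
    "systemd/network/80-container-vz.network".source =
      "${pkgs.systemd}/lib/systemd/network/80-container-vz.network";
    "systemd/network/99-default.link".source =
      "${pkgs.systemd}/lib/systemd/network/99-default.link";
  };
}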
@arianvp did you get around to creating that PR yet?
@andir @arianvp, note that commit https://github.com/NixOS/nixpkgs/commit/1f03f6fc43a6f71b8204adf6cd02fb3685261add introduces ARP networking problems with bonded NICs, at least as far as I've tested on packet.net. For instance, spinning up a c2.medium.x86 (AMD) or c1.small.x86 (Intel) server on master nixpkgs with packet.net and the default bonded bond0 NIC using 802.3ad LACP (set up automatically by the provisioning script) will result in no network connectivity for 15 - 30 minutes. During this time tcpdump (observed via sos-console) shows ARP who-has requests for the gateway, to which the gateway responds with is-at MAC addresses, but the ARP table continues to show incomplete for the gateway. Rebooting or stopping and re-raising the bond interface causes the loss of connectivity again for an extended period.
cc: @disassembler @cleverca22
@johnalotoski can you provide the output of the networkctl status command for both the bond interface and the underlying NIC interfaces?
Hi @flokli, networkd is not in use, but here is the output. A bisect led to this particular commit, and it reliably reproduces the ARP problem. I can provide more info if that would be helpful (cat /proc/net/bonding/bond0, ethtool, etc.). IPs/MACs below are masked for privacy.
[root@c2ipxe:~]# networkctl status bond0
WARNING: systemd-networkd is not running, output will be incomplete.
● 4: bond0
Link File: /nix/store/gaz60mpylxry2qskvw045h803lv5lil6-systemd-242/lib/systemd/network/99-default.link
Network File: n/a
Type: bond
State: n/a (unmanaged)
Driver: bonding
HW Address: xx:yy:zz:c0:ef:35
Address: PrivIPv4
PubIPv4
PrivIPv6
PubIPv6
Gateway: GatewayIPv4
GatewayIPv6
[root@c2ipxe:~]# networkctl status enp1s0f0
WARNING: systemd-networkd is not running, output will be incomplete.
● 2: enp1s0f0
Link File: /nix/store/gaz60mpylxry2qskvw045h803lv5lil6-systemd-242/lib/systemd/network/99-default.link
Network File: n/a
Type: ether
State: n/a (unmanaged)
Path: pci-0000:01:00.0
Driver: mlx5_core
Vendor: Mellanox Technologies
Model: MT27710 Family [ConnectX-4 Lx] (Stand-up ConnectX-4 Lx EN, 25GbE dual-port SFP28, PCIe3.0 x8, MCX4121A-ACAT)
HW Address: xx:yy:zz:c0:ef:35
[root@c2ipxe:~]# networkctl status enp1s0f1
WARNING: systemd-networkd is not running, output will be incomplete.
● 3: enp1s0f1
Link File: /nix/store/gaz60mpylxry2qskvw045h803lv5lil6-systemd-242/lib/systemd/network/99-default.link
Network File: n/a
Type: ether
State: n/a (unmanaged)
Path: pci-0000:01:00.1
Driver: mlx5_core
Vendor: Mellanox Technologies
Model: MT27710 Family [ConnectX-4 Lx] (Stand-up ConnectX-4 Lx EN, 25GbE dual-port SFP28, PCIe3.0 x8, MCX4121A-ACAT)
HW Address: xx:yy:zz:c0:ef:35
@johnalotoski while networkd is not in use, the .link files are honored by systemd. Can you point me to this system's configuration, so I can see how the bonds are being set up?
Also cc @grahamc
Hi @flokli, the instances are provisioned by iPXE images from the packet-nixos repo used by packet.net. During provisioning, a nix configuration snippet for networking and bonding is generated which looks like the following, where the parameters in angle brackets are populated from the instance metadata:
{
networking.hostName = "<hostname>";
networking.dhcpcd.enable = false;
networking.defaultGateway = {
address = "<IPv4>";
interface = "bond0";
};
networking.defaultGateway6 = {
address = "<IPv6>";
interface = "bond0";
};
networking.nameservers = [
"<packetDnsIpv4>"
"<packetDnsIpv4>"
];
networking.bonds.bond0 = {
driverOptions = {
mode = "802.3ad";
xmit_hash_policy = "layer3+4";
lacp_rate = "fast";
downdelay = "200";
miimon = "100";
updelay = "200";
};
interfaces = [
"enp1s0f0" "enp1s0f1"
];
};
networking.interfaces.bond0 = {
useDHCP = false;
ipv4 = {
routes = [
{
address = "10.0.0.0";
prefixLength = 8;
via = "<IPv4>";
}
];
addresses = [
{
address = "<pubIPv4>";
prefixLength = <pubCIDR>;
}
{
address = "<privIPv4>";
prefixLength = <privCIDR>;
}
];
};
ipv6 = {
addresses = [
{
address = "<IPv6>";
prefixLength = <CIDR>;
}
];
};
};
}
We are using a nixops packet plugin to pass this nix networking snippet (the same nix snippet as that generated by the packet provisioning script) with appropriate metadata populated for use in nixops deployments. See, for example, the c2.medium.x86 nix network configuration used by the packet nixops plugin.
@johnalotoski Can you tell me which systemd services are running during those 15-30 min? Maybe a process tree could also be helpful? (systemctl status > file might be able to provide both at the same time.)
There is one while …; do sleep 0.1; done snippet in the scripted networking that could be spinning during that period of time.
Hi @andir, here is an attachment that contains a sequence of commands taken during the outage (arp-incomplete) and after normal networking resumed (arp-complete). There is also a diff between them. Each of the arp complete and incomplete files captured the following output:
systemctl list-units --all
systemctl status
ps -ejH
ps axjf
pstree -w -l 100
This was taken from a machine deployed at the commit in question. comparison.zip
A few other diagnostics that might help:
systemd-analyze blame > blame.txt
systemd-analyze critical-chain > critical-chain.txt
systemd-analyze plot > plot.svg
Especially the last one will show you a detailed account of the system starting up and where it might be stuck. (Though dunno if this will be of much help, just thought it might be useful)
Thanks! Could you also verify whether it happens on master and/or release-19.09? We had another systemd bump there that might have already fixed it, and I would like to avoid wasting time chasing old bugs.
Hi @andir, yes, this happens on master and I believe on release-19.09 also. I don't blame you for not wanting to waste time; ditto. @arianvp, thanks for the tip! I've included the output of those commands in the diagnostic zip file below. This seemed like a good opportunity to use asciinema to convey the issue in a more tangible way. I recorded and have included below two console sessions of the problem, taken in parallel, which illustrate the issue: one console session from the nixops deployer side and one console session from the server, where some debugging is done both during and after the network outage. Diagnostic/debug files collected during those videos are attached in the diagnostic zip file below. The reverse patch of the commit in question is applied to the head of master nixpkgs and shown to resolve the issue. The asciinema videos have a maximum console idle time of 2 seconds to keep the videos short.
Asciinema video 1: Nixops bonding nic debugging deploy Asciinema video 2: Packet server bonding nic debugging session Files collected during the video: diagnostic.zip
For the files, the naming is:
I just confirmed that this issue is fixed on 19.09:
Host network:
networkctl status vz-nixos
● 8: vz-nixos
Link File: /nix/store/gg0ppshg45gksxsq2jbjbhvm3mk70vq9-systemd-243/lib/systemd/network/99-default.link
Network File: /nix/store/gg0ppshg45gksxsq2jbjbhvm3mk70vq9-systemd-243/lib/systemd/network/80-container-vz.network
Type: bridge
State: routable (configured)
Driver: bridge
HW Address: fa:4b:0c:87:73:6b
MTU: 1500 (min: 68, max: 65535)
Forward Delay: 15s
Hello Time: 2s
Max Age: 20s
Ageing Time: 5min
Priority: 32768
STP: no
Multicast IGMP Version: 2
Queue Length (Tx/Rx): 1/1
Address: 192.168.210.1
169.254.244.253
fe80::f84b:cff:fe87:736b
Two nspawn containers (created with nixos-install) both get an IP:
[root@arianvp:~]# machinectl list
MACHINE CLASS SERVICE OS VERSION ADDRESSES
test1 container systemd-nspawn nixos 20.03.git.0092f2e 192.168.210.32…
test2 container systemd-nspawn nixos 20.03.git.0092f2e 192.168.210.184…
The container side gets configured correctly too:
[root@test1:~]# networkctl status -a
● 1: lo
Link File: n/a
Network File: n/a
Type: loopback
State: carrier (unmanaged)
MTU: 65536
Queue Length (Tx/Rx): 1/1
Address: 127.0.0.1
::1
● 2: host0
Link File: n/a
Network File: /nix/store/gg0ppshg45gksxsq2jbjbhvm3mk70vq9-systemd-243/lib/systemd/network/80-container-host0.network
Type: ether
State: routable (configured)
HW Address: 42:ef:e5:e2:77:59
MTU: 1500 (min: 68, max: 65535)
Queue Length (Tx/Rx): 1/1
Auto negotiation: no
Speed: 10Gbps
Duplex: full
Port: tp
Address: 192.168.210.32
169.254.22.221
fe80::40ef:e5ff:fee2:7759
Gateway: 192.168.210.1
Time Zone: Europe/Amsterdam
Connected To: test2 on port host0
arianvp.me on port vz-nixos
Play around with this yourself with my systemd-nspawn module (which I eventually want to use as a base for nixos-container in 20.03):
https://github.com/arianvp/nixos-stuff/blob/master/modules/containers-v2.nix https://github.com/arianvp/nixos-stuff/blob/master/configs/arianvp.me/default.nix#L28-L35
Host network config: https://github.com/arianvp/nixos-stuff/blob/master/configs/arianvp.me/network.nix Container network config: https://github.com/arianvp/nixos-stuff/blob/master/modules/containers-v2.nix#L33-L36
@arianvp For completeness I also tested this with plain ve- interfaces on the host with networkd:
● 29: ve-foo
Link File: /nix/store/ag67dibj50z39rw1sr39zjd0dx6zcf2d-systemd-243/lib/systemd/network/99-default.link
Network File: /nix/store/ag67dibj50z39rw1sr39zjd0dx6zcf2d-systemd-243/lib/systemd/network/80-container-ve.network
Type: ether
State: routable (configured)
Driver: veth
HW Address: 02:2d:19:52:30:a4
MTU: 1500 (min: 68, max: 65535)
Queue Length (Tx/Rx): 1/1
Auto negotiation: no
Speed: 10Gbps
Duplex: full
Port: tp
Address: 192.168.7.177
169.254.95.164
fe80::2d:19ff:fe52:30a4
Connected To: foo on port host0
Yet this does not solve this whole mess completely, in particular not @johnalotoski's problem which is clearly related.
@johnalotoski After reviewing your extensive debug logs (thanks a lot!), I'm hoping that we just have to ship the updated 80-net-setup-link.rules to let udev do its magic again without networkd enabled (which is exactly your case, not what @arianvp did above). I'm not yet sure what that magic might be and what exactly changed in udev.
I'll try to reproduce that in a NixOS test and will open a PR with the change for you to test. This is clearly something we have to fix for 19.09.
@arianvp This is BTW also somewhat related to our predictable ifnames in initrd fix, where we had to resort to including all udev rules in the initrd (and 80-net-setup-link.rules in particular). That commit is not on master yet: 7da962d31b9113f16161510909a66a397dad91fc.
The udev rule is included by default even if networkd is disabled. This is the same udev rule that enables our interface rename, which is working! If you enable debug logging on udev you'll see that it is indeed loaded during boot already. There's no need to include it.
Hi @fpletz, @arianvp, happy to test out any potential fixes against the packet.net infra, thanks much!
@fpletz you can verify that it is indeed running by doing this:
[root@arianvp:~]# udevadm -d test-builtin net_setup_link /sys/class/net/ens3
Trying to open "/etc/udev/hwdb.bin"...
=== trie on-disk ===
tool version: 243
file size: 8269771 bytes
header size 80 bytes
strings 2110315 bytes
nodes 6159376 bytes
Load module index
Found container virtualization none.
timestamp of '/etc/systemd/network' changed
Parsed configuration file /nix/store/gg0ppshg45gksxsq2jbjbhvm3mk70vq9-systemd-243/lib/systemd/network/99-default.link
Created link configuration context.
ID_NET_DRIVER=virtio_net
ens3: Config file /nix/store/gg0ppshg45gksxsq2jbjbhvm3mk70vq9-systemd-243/lib/systemd/network/99-default.link is applied
ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
ens3: Device has name_assign_type=4
Using default interface naming scheme 'v243'.
ens3: Policy *keep*: keeping existing userspace name
ens3: Device has addr_assign_type=0
ens3: MAC on the device already matches policy *persistent*
ID_NET_LINK_FILE=/nix/store/gg0ppshg45gksxsq2jbjbhvm3mk70vq9-systemd-243/lib/systemd/network/99-default.link
Unload module index
Unloaded link configuration context.
Can we close this issue and move the Packet networking discussion to the new issue I opened for that specific problem?
Can we open a new issue for the packet-specific problem? Already moved to https://github.com/NixOS/nixpkgs/issues/69360. I think this is mostly a documentation issue (plus some follow-up fixes for Packet), and the documentation issue was fixed in https://github.com/NixOS/nixpkgs/pull/71456.
Issue description
We still have custom rules (https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/services/hardware/udev.nix#L120) that were copied from systemd at some point. However, our rules are outdated and break the network units from systemd-networkd that detect container interfaces (the new rules provide more information about the interface type). The fix would be to use the new rules when networkd is enabled and the old rules otherwise (a rough sketch follows below); the new rules depend on networkd to do the actual rename. An alternative would be to always rename interfaces with networkd instead. There is also a test, nixos/tests/predictable-interface-names.nix, that ensures we make all users happy.
cc @arianvp @fpletz @flokli
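A hedged sketch of that conditional, using a hypothetical derivation that copies only the upstream rule file out of the systemd package (the real module change may be structured quite differently):
{ config, lib, pkgs, ... }:
let
  # Hypothetical package holding just systemd's upstream rule file.
  upstreamNetSetupLink = pkgs.runCommand "net-setup-link-rules" { } ''
    mkdir -p $out/lib/udev/rules.d
    cp ${pkgs.systemd}/lib/udev/rules.d/80-net-setup-link.rules \
      $out/lib/udev/rules.d/
  '';
in
{
  # Ship the new upstream rule only when networkd does the actual rename;
  # otherwise udev.nix keeps installing the old copied rules.
  services.udev.packages =
    lib.optional config.networking.useNetworkd upstreamNetSetupLink;
}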