Closed by janse180.
Thanks for the report. Can you please try the following steps in order:
1. systemctl restart systemd-networkd
2. Wait a bit for DHCP to settle.
3. networkctl list and networkctl status
4. journalctl -u systemd-networkd
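Instead of waiting an arbitrary amount of time in step 2, the wait can be scripted. This is a minimal sketch that assumes the `networkctl list` column layout shown later in this thread (IDX LINK TYPE OPERATIONAL SETUP). The helper names `link_setup_state` and `wait_for_link` are hypothetical, and the 300-second default simply mirrors the 5 minutes waited below.

```shell
#!/bin/sh
# Hypothetical helpers, not part of systemd. They assume `networkctl list`
# prints the columns IDX LINK TYPE OPERATIONAL SETUP.

# Print the SETUP column for link $1, reading `networkctl list` output on stdin.
# Header and footer lines never match a real link name, so they are ignored.
link_setup_state() {
    awk -v link="$1" '$2 == link { print $5 }'
}

# Poll networkd every 5 seconds until link $1 reports "configured" or
# $2 seconds (default 300) have elapsed. Returns 0 on success, 1 on timeout.
wait_for_link() {
    link="$1"
    timeout="${2:-300}"
    elapsed=0
    state=""
    while [ "$elapsed" -lt "$timeout" ]; do
        state=$(networkctl list | link_setup_state "$link")
        if [ "$state" = "configured" ]; then
            return 0
        fi
        sleep 5
        elapsed=$((elapsed + 5))
    done
    echo "$link still in state '${state:-unknown}' after ${timeout}s" >&2
    return 1
}
```

For example, `systemctl restart systemd-networkd && wait_for_link ib0` followed by `networkctl status ib0` replaces the manual wait; on the host in this thread it would time out with ib0 still "configuring".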
For hints on how to configure DHCP via systemd-networkd, you can look at https://coreos.com/os/docs/latest/network-config-with-networkd.html and the further docs linked at the bottom of that page.
I started from a fresh copy of the stable ISO on the same hardware as above and waited 5 minutes after restarting networkd before running the networkctl and journalctl commands. The interface seems to be stuck and is not requesting DHCP, and I don't see anything in journalctl that indicates an error. When I watch tcpdump on the network ib0 is connected to, I don't see any DHCP requests being sent.
$ modprobe mlx4_ib
$ modprobe ib_ipoib
$ ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 00:26:6c:f0:fa:fc brd ff:ff:ff:ff:ff:ff
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
link/ether 00:26:6c:f0:fa:fd brd ff:ff:ff:ff:ff:ff
4: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc fq_codel state UP mode DEFAULT group default qlen 256
link/infiniband 80:00:02:08:fe:80:00:00:00:00:00:00:00:24:e8:90:97:ff:5b:b9 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
5: ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc fq_codel state DOWN mode DEFAULT group default qlen 256
link/infiniband 80:00:02:09:fe:80:00:00:00:00:00:00:00:24:e8:90:97:ff:5b:ba brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
$ cat /etc/systemd/network/20-dhcp.network
[Match]
Name=ib0
[Network]
DHCP=yes
$ systemctl restart systemd-networkd
$ networkctl list
IDX LINK TYPE OPERATIONAL SETUP
1 lo loopback carrier configured
2 eno1 ether routable configured
3 eno2 ether no-carrier configuring
4 ib0 infiniband degraded configuring
5 ib1 infiniband no-carrier configuring
5 links listed.
$ networkctl status
● State: routable
Address: 128.101.54.14 on eno1
fe80::226:6cff:fef0:fafc on eno1
fe80::224:e890:97ff:5bb9 on ib0
Gateway: 128.101.54.254 (CISCO SYSTEMS, INC.) on eno1
$ networkctl status ib0
● 4: ib0
Link File: /usr/lib64/systemd/network/99-default.link
Network File: /etc/systemd/network/20-dhcp.network
Type: infiniband
State: degraded (configuring)
Path: pci-0000:02:00.0
Driver: ib_ipoib
Vendor: Mellanox Technologies
Model: MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE]
HW Address: 80:00:02:08:fe:80
Address: fe80::224:e890:97ff:5bb9
$ journalctl -u systemd-networkd
Nov 30 20:22:01 localhost systemd[1]: Stopping Network Service...
Nov 30 20:22:01 localhost systemd[1]: Stopped Network Service.
Nov 30 20:22:01 localhost systemd[1]: Starting Network Service...
Nov 30 20:22:01 localhost systemd-networkd[1219]: ib0: Gained IPv6LL
Nov 30 20:22:01 localhost systemd-networkd[1219]: eno1: Gained IPv6LL
Nov 30 20:22:01 localhost systemd-networkd[1219]: Enumeration completed
Nov 30 20:22:01 localhost systemd[1]: Started Network Service.
Nov 30 20:22:01 localhost systemd-networkd[1219]: lo: Configured
Nov 30 20:22:13 localhost systemd-networkd[1219]: eno1: Configured
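At the default log level, the journal above says nothing about ib0 beyond gaining an IPv6 link-local address. One way to get more detail is to raise networkd's log level with a systemd drop-in that sets `SYSTEMD_LOG_LEVEL=debug`. This is a sketch assuming the standard unit layout; the `enable_networkd_debug` helper and the `10-debug.conf` file name are hypothetical, and the optional root-prefix argument exists only so the function can be tried outside a live system.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that runs systemd-networkd with
# SYSTEMD_LOG_LEVEL=debug. $1 is an optional root prefix (empty = live system).
enable_networkd_debug() {
    root="${1:-}"
    dir="$root/etc/systemd/system/systemd-networkd.service.d"
    mkdir -p "$dir" || return 1
    printf '[Service]\nEnvironment=SYSTEMD_LOG_LEVEL=debug\n' > "$dir/10-debug.conf"
}
```

On the affected host, run `enable_networkd_debug` as root, then `systemctl daemon-reload`, `systemctl restart systemd-networkd`, and `journalctl -u systemd-networkd` to see whether networkd logs any DHCP activity for ib0.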
@janse180 note that restarting networkd after writing a /etc/systemd/network/ file isn't necessarily sufficient for it to apply all types of changes, similar to the issue discussed here.
If you're able to either create that network file as part of provisioning (so it will be there early enough) or have a persistent rootfs so you can do a full reboot, that's a more reliable way to configure an interface.
@euank I am applying that systemd network file using a Container Linux Config from my matchbox server. The commands above were only run to clarify the debugging. I believe this issue is related to dhcpcd and its handling of InfiniBand cards.
Ah, thanks for clarifying, and apologies for my misunderstanding.
Thank you for reporting this issue. Unfortunately, we don't think we'll end up addressing it in Container Linux.
We're now working on Fedora CoreOS, the successor to Container Linux, and we expect most major development to occur there instead. Meanwhile, Container Linux will be fully maintained into 2020 but won't see many new features. We appreciate your taking the time to report this issue and we're sorry that we won't be able to address it.
Issue Report
Bug
An InfiniBand IP-over-IB (IPoIB) interface is unable to obtain its configuration using DHCP. Manually running 'dhcpcd ib0' shows the error 'ib0: if_sendrawpacket: Invalid argument', and the interface never sends a DHCP request packet. The lack of working DHCP makes booting and managing a large cluster over InfiniBand difficult.
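One difference between ib0 and the Ethernet ports is visible in the `ip l` output earlier in the thread: an IPoIB link-layer address is 20 bytes long, an Ethernet address is 6, and the DHCP `chaddr` field holds at most 16. RFC 4390 specifies DHCP over IPoIB for this reason (clients send `hlen` 0 and identify themselves with a client-identifier option). Whether this is what trips up dhcpcd 6.10.1 is not confirmed here; the sketch below only counts the octets in an address string, using a hypothetical `hw_addr_len` helper.

```shell
#!/bin/sh
# Hypothetical helper: print the number of colon-separated octets in a
# link-layer address as printed by `ip link`.
hw_addr_len() {
    printf '%s\n' "$1" | awk -F: '{ print NF }'
}
```

Applied to the ib0 and eno1 addresses from the `ip l` output, it reports 20 and 6 respectively.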
Container Linux Version
Environment
Bare metal: Dell C6100 with a Mellanox ConnectX-2 MT26428 InfiniBand card
03:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
Expected Behavior
The InfiniBand interface receives an IP address and configuration from the DHCP server.
Actual Behavior
dhcpcd fails with the error 'ib0: if_sendrawpacket: Invalid argument', and no DHCP packet is sent over the interface.
Reproduction Steps
Other Information
This issue is present on both the alpha and beta channels. The dhcpcd version is 6.10.1 on the alpha, beta, and stable channels. The InfiniBand interface works without issue if a static IP is assigned to it using systemd-networkd.
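Since a static address works, one interim approach for a cluster is to generate a per-node static .network file at provisioning time instead of relying on DHCP. This is a minimal sketch assuming the addresses come from your own inventory; the `static_network_file` helper, the file name `20-ib0-static.network`, and the example address are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a systemd-networkd .network file that assigns a
# static address to one interface.
#   $1 = interface name, e.g. ib0
#   $2 = address with prefix length, e.g. 10.1.0.15/16 (example value only)
static_network_file() {
    if [ $# -ne 2 ]; then
        echo "usage: static_network_file IFACE ADDRESS/PREFIX" >&2
        return 1
    fi
    printf '[Match]\nName=%s\n\n[Network]\nAddress=%s\n' "$1" "$2"
}
```

For example, `static_network_file ib0 10.1.0.15/16 > /etc/systemd/network/20-ib0-static.network`. Writing the file during provisioning, rather than restarting networkd afterwards, avoids the ordering problem raised earlier in the thread.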