canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/

IPoIB interface is not coming up during boot with NetworkManager using ifcfg/sysconfig files #3959

Closed ubuntu-server-builder closed 1 year ago

ubuntu-server-builder commented 1 year ago

This bug was originally filed in Launchpad as LP: #1965660

Launchpad details
affected_projects = []
assignee = None
assignee_name = None
date_closed = 2022-03-28T15:47:00.232277+00:00
date_created = 2022-03-20T08:37:12.040899+00:00
date_fix_committed = None
date_fix_released = None
id = 1965660
importance = undecided
is_complete = True
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1965660
milestone = None
owner = etlvnvda
owner_name = Itai Levy
private = False
status = invalid
submitter = etlvnvda
submitter_name = Itai Levy
tags = []
duplicates = []

Launchpad user Itai Levy(etlvnvda) wrote on 2022-03-20T08:37:12.040899+00:00

I am trying to create a CentOS 8 Stream VM with IPoIB interfaces in an OpenStack setup. The image was built with cloud-init and dhcp-all-interfaces. During boot, NetworkManager does not automatically configure the interface as a "connected" interface, so it never requests a DHCP lease. Only when I run the command "nmcli conn add type infiniband con-name ib0 ifname ib0" does the interface become active and get a DHCP-assigned IP. When I use a CentOS 7 based image (where NetworkManager was not the default networking service) I don't have this issue.

cloud-init collect-logs output is attached.

Some more interesting outputs:

cat /etc/os-release

NAME="CentOS Stream"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Stream 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_SUPPORT_PRODUCT_VERSION="CentOS Stream"

ip link show

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP mode DEFAULT group default qlen 256
    link/infiniband 00:00:01:4a:fe:80:00:00:00:00:00:00:fa:16:3e:00:00:46:de:18 brd 00:ff:ff:ff:ff:12:40:1b:80:65:00:00:00:00:00:00:ff:ff:ff:ff

ip addr show

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
    link/infiniband 00:00:01:4a:fe:80:00:00:00:00:00:00:fa:16:3e:00:00:46:de:18 brd 00:ff:ff:ff:ff:12:40:1b:80:65:00:00:00:00:00:00:ff:ff:ff:ff

cat /etc/sysconfig/network-scripts/ifcfg-ib0

# Created by cloud-init on instance boot automatically, do not edit.
#
BOOTPROTO=dhcp
DEVICE=ib0
HWADDR=00:00:01:4a:fe:80:00:00:00:00:00:00:fa:16:3e:00:00:46:de:18
ONBOOT=yes
TYPE=Ethernet
USERCTL=no

ncmli dev show

bash: ncmli: command not found
[root@localhost stack]# nmcli dev show
GENERAL.DEVICE: ib0
GENERAL.TYPE: infiniband
GENERAL.HWADDR: 00:00:01:4A:FE:80:00:00:00:00:00:00:FA:>
GENERAL.MTU: 2044
GENERAL.STATE: 30 (disconnected)
GENERAL.CONNECTION: --
GENERAL.CON-PATH: --
IP4.GATEWAY: --
IP6.GATEWAY: --

GENERAL.DEVICE: lo
GENERAL.TYPE: loopback
GENERAL.HWADDR: 00:00:00:00:00:00
GENERAL.MTU: 65536
GENERAL.STATE: 10 (unmanaged)
GENERAL.CONNECTION: --
GENERAL.CON-PATH: --
IP4.ADDRESS[1]: 127.0.0.1/8
IP4.GATEWAY: --
IP6.ADDRESS[1]: ::1/128
IP6.GATEWAY: --
IP6.ROUTE[1]: dst = ::1/128, nh = ::, mt = 256

nmcli conn show

#

cat /var/log/messages | grep -i networkman

Mar 20 08:11:16 localhost NetworkManager[1239]: [1647763876.1869] NetworkManager (version 1.36.0-0.3.el8) is starting... (for the first time)
Mar 20 08:11:16 localhost NetworkManager[1239]: [1647763876.1872] Read config: /etc/NetworkManager/NetworkManager.conf (etc: 00-main.conf)
Mar 20 08:11:16 localhost NetworkManager[1239]: [1647763876.1872] config: unknown key 'autoconnect-retries' in section [connection] of file '/etc/NetworkManager/conf.d/00-main.conf'
Mar 20 08:11:16 localhost NetworkManager[1239]: [1647763876.1887] bus-manager: acquired D-Bus service "org.freedesktop.NetworkManager"
Mar 20 08:11:16 localhost NetworkManager[1239]: [1647763876.1973] manager[0x561ccff17000]: monitoring kernel firmware directory '/lib/firmware'.
Mar 20 08:11:16 localhost dbus-daemon[1106]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service' requested by ':1.7' (uid=0 pid=1239 comm="/usr/sbin/NetworkManager --no-daemon " label="system_u:system_r:NetworkManager_t:s0")
Mar 20 08:11:16 localhost NetworkManager[1239]: [1647763876.2811] hostname: hostname: using hostnamed
Mar 20 08:11:16 localhost NetworkManager[1239]: [1647763876.2811] hostname: hostname changed from (none) to "localhost.localdomain"
Mar 20 08:11:16 localhost NetworkManager[1239]: [1647763876.2814] dns-mgr[0x561ccfefa250]: init: dns=default,systemd-resolved rc-manager=symlink
Mar 20 08:11:16 localhost NetworkManager[1239]: [1647763876.2851] Loaded device plugin: NMTeamFactory (/usr/lib64/NetworkManager/1.36.0-0.3.el8/libnm-device-plugin-team.so)
Mar 20 08:11:16 localhost NetworkManager[1239]: [1647763876.2851] manager: rfkill: Wi-Fi enabled by radio killswitch; enabled by state file
Mar 20 08:11:16 localhost NetworkManager[1239]: [1647763876.2852] manager: rfkill: WWAN enabled by radio killswitch; enabled by state file
Mar 20 08:11:16 localhost NetworkManager[1239]: [1647763876.2852] manager: Networking is enabled by state file
Mar 20 08:11:16 localhost dbus-daemon[1106]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.7' (uid=0 pid=1239 comm="/usr/sbin/NetworkManager --no-daemon " label="system_u:system_r:NetworkManager_t:s0")
Mar 20 08:11:16 localhost NetworkManager[1239]: [1647763876.2863] settings: Loaded settings plugin: ifcfg-rh ("/usr/lib64/NetworkManager/1.36.0-0.3.el8/libnm-settings-plugin-ifcfg-rh.so")
Mar 20 08:11:16 localhost NetworkManager[1239]: [1647763876.2863] settings: Loaded settings plugin: keyfile (internal)
Mar 20 08:11:16 localhost NetworkManager[1239]: [1647763876.2888] ifcfg-rh: load[/etc/sysconfig/network-scripts/ifcfg-ib0]: failure to read file: 802-3-ethernet.mac-address: '00:00:01:4a:fe:80:00:00:00:00:00:00:fa:16:3e:00:00:46:de:18' is not a valid MAC address
Mar 20 08:11:16 localhost NetworkManager[1239]: [1647763876.2939] dhcp-init: Using DHCP client 'dhclient'
Mar 20 08:11:16 localhost NetworkManager[1239]: [1647763876.2939] device (lo): carrier: link connected
Mar 20 08:11:16 localhost NetworkManager[1239]: [1647763876.2941] manager: (lo): new Generic device (/org/freedesktop/NetworkManager/Devices/1)
Mar 20 08:11:16 localhost NetworkManager[1239]: [1647763876.2948] manager: (ib0): new InfiniBand device (/org/freedesktop/NetworkManager/Devices/2)
Mar 20 08:11:16 localhost NetworkManager[1239]: [1647763876.2950] device (ib0): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
Mar 20 08:11:16 localhost NetworkManager[1239]: [1647763876.3759] device (ib0): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed')
Mar 20 08:11:16 localhost NetworkManager[1239]: [1647763876.3796] device (ib0): carrier: link connected
Mar 20 08:11:16 localhost NetworkManager[1239]: [1647763876.3797] manager: startup complete
Mar 20 08:11:16 localhost NetworkManager[1239]: [1647763876.8378] hostname: hostname changed from "localhost.localdomain" to "localhost"
Mar 20 08:11:18 localhost NetworkManager[1239]: [1647763878.3225] audit: op="reload" arg="0" pid=2166 uid=0 result="success"
Mar 20 08:11:18 localhost NetworkManager[1239]: [1647763878.3229] config: unknown key 'autoconnect-retries' in section [connection] of file '/etc/NetworkManager/conf.d/00-main.conf'
Mar 20 08:11:18 localhost NetworkManager[1239]: [1647763878.3229] config: signal: SIGHUP (no changes from disk)
Mar 20 08:11:18 localhost dbus-daemon[1106]: [system] Activating via systemd: service name='org.freedesktop.resolve1' unit='dbus-org.freedesktop.resolve1.service' requested by ':1.7' (uid=0 pid=1239 comm="/usr/sbin/NetworkManager --no-daemon " label="system_u:system_r:NetworkManager_t:s0")
Mar 20 08:11:18 localhost NetworkManager[1239]: [1647763878.3248] policy: set-hostname: set hostname to 'localhost.localdomain' (no hostname found)
Mar 20 08:11:28 localhost systemd[1]: NetworkManager-dispatcher.service: Succeeded.
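
The ifcfg-rh failure in the log above looks like an address-length mismatch: the file declares TYPE=Ethernet, and an Ethernet MAC has 6 octets, while the IPoIB hardware address on ib0 has 20. A minimal POSIX shell sketch for classifying an address by octet count follows. The function name `hwaddr_kind` is hypothetical and is not part of cloud-init or NetworkManager.

```shell
# hwaddr_kind ADDR: guess the link type from a colon-separated hardware address.
# Hypothetical helper for illustration only; it looks at the octet count and
# does not validate that each octet is hexadecimal.
hwaddr_kind() {
    # Number of octets = number of ':' separators + 1.
    seps=$(printf '%s' "$1" | tr -cd ':' | wc -c)
    case $((seps + 1)) in
        6)  echo "ethernet" ;;
        20) echo "infiniband" ;;
        *)  echo "unknown" ;;
    esac
}

# Address taken from the ifcfg-ib0 file in this report.
hwaddr_kind "00:00:01:4a:fe:80:00:00:00:00:00:00:fa:16:3e:00:00:46:de:18"
```

A check like this could be run against the HWADDR value of a generated ifcfg file to spot interfaces that were written with an Ethernet type but carry an InfiniBand-length address.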

+++++++++++++++++++++++++++++

nmcli conn add type infiniband con-name ib0 ifname ib0

[ 703.180806] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
[ 703.182712] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
Connection 'ib0' (be895007-5a28-446b-a866-09c02d953ae4) successfully added.
[root@localhost stack]# [ 703.206044] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready

[root@localhost stack]# nmcli conn show
NAME UUID TYPE DEVICE
ib0 be895007-5a28-446b-a866-09c02d953ae4 infiniband ib0
[root@localhost stack]#
[root@localhost stack]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 256
    link/infiniband 00:00:01:4a:fe:80:00:00:00:00:00:00:fa:16:3e:00:00:46:de:18 brd 00:ff:ff:ff:ff:12:40:1b:80:65:00:00:00:00:00:00:ff:ff:ff:ff
    inet 11.11.11.47/24 brd 11.11.11.255 scope global dynamic noprefixroute ib0
       valid_lft 86385sec preferred_lft 86385sec
    inet6 fe80::4dac:e4bb:72bf:55b7/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

[root@localhost stack]# cat /etc/sysconfig/network-scripts/ifcfg-ib0-1
CONNECTED_MODE=no
TYPE=InfiniBand
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ib0
UUID=be895007-5a28-446b-a866-09c02d953ae4
DEVICE=ib0
ONBOOT=yes
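
Compared with the cloud-init generated ifcfg-ib0 shown earlier, the profile written by nmcli uses TYPE=InfiniBand and has no HWADDR line. A small sketch for pulling a single key out of an ifcfg file, which can help confirm this on a running instance, is below. `ifcfg_get` is a hypothetical helper; it assumes plain KEY=value lines and does not strip quotes or evaluate shell expansions.

```shell
# ifcfg_get FILE KEY: print the value of KEY from a KEY=value style file.
# Hypothetical helper for illustration; prints the last match if KEY repeats
# and prints nothing if KEY is absent.
ifcfg_get() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Example invocations (paths taken from the report above):
#   ifcfg_get /etc/sysconfig/network-scripts/ifcfg-ib0   TYPE
#   ifcfg_get /etc/sysconfig/network-scripts/ifcfg-ib0-1 TYPE
```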

ubuntu-server-builder commented 1 year ago

Launchpad user Itai Levy(etlvnvda) wrote on 2022-03-20T08:37:12.040899+00:00

Launchpad attachments: cloud-init collect-logs

ubuntu-server-builder commented 1 year ago

Launchpad user Neal Gompa(ngompa13) wrote on 2022-03-22T11:23:20.816813+00:00

This is not using the NetworkManager codepath, but the legacy ifcfg codepath (called sysconfig in cloud-init).

ubuntu-server-builder commented 1 year ago

Launchpad user Itai Levy(etlvnvda) wrote on 2022-03-22T12:51:34.686175+00:00

I see the same behaviour when I build an image using DIB_DHCP_NETWORK_MANAGER_AUTO=true

https://docs.openstack.org/diskimage-builder/latest/elements/dhcp-all-interfaces/

Description When NetworkManager is detected, and this is set to true the dhcp-all-interfaces service will not be installed. Only the NetworkManager configuration will be added. NetworkManager is quite capable to do automatic interface configuration. NetworkManager will by default try to auto-configure any interface with no configuration, it will use DHCP for IPv4 and Router Advertisements to decide how to initialize IPv6.

Why is cloud-init still using the legacy ifcfg codepath? How do you suggest proceeding? How can I set up cloud.cfg to use the NetworkManager codepath? A config example would be helpful; please keep in mind that the interface name is not known in advance, so the setting needs to be global.

Thanks, Itai

ubuntu-server-builder commented 1 year ago

Launchpad user Itai Levy(etlvnvda) wrote on 2022-03-22T13:10:32.755107+00:00

The following cloud.cfg seems to solve the issue (together with disk-image-builder dhcp-all-interfaces DIB_DHCP_NETWORK_MANAGER_AUTO):

network:
  config: disabled
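
One way to bake this setting into an image is a drop-in file under /etc/cloud/cloud.cfg.d/, which cloud-init reads in addition to /etc/cloud/cloud.cfg. A minimal sketch, assuming a POSIX shell; the function name and the file name 99-disable-network-config.cfg are choices made for this example.

```shell
# disable_cloud_init_network [DIR]: write a drop-in that tells cloud-init
# not to render any network configuration. DIR defaults to cloud-init's
# drop-in directory; the function name is hypothetical.
disable_cloud_init_network() {
    dir="${1:-/etc/cloud/cloud.cfg.d}"
    printf 'network:\n  config: disabled\n' > "$dir/99-disable-network-config.cfg"
}
```

Calling `disable_cloud_init_network` with no argument (as root, at image build time) writes to the default directory. With cloud-init's network rendering disabled, interface setup is left to NetworkManager's automatic configuration, as described in the dhcp-all-interfaces documentation quoted above.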

ubuntu-server-builder commented 1 year ago

Launchpad user James Falcon(falcojr) wrote on 2022-03-28T15:46:49.873390+00:00

I'm going to close this because disabling the network configuration solves your issue. If that is incorrect, please set status back to New. The most recent release of cloud-init has proper NetworkManager support.