canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/

rename attempt of already renamed device #5075

Open rjschwei opened 7 months ago

rjschwei commented 7 months ago

Bug report

When cloud-init is instructed to rename an interface, it should check whether the target name is already in use before attempting to rename the network device.

We also have to decide what the behavior should be when the name is detected to exist.
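For illustration, a minimal sketch of the kind of guard the report asks for (standalone Python, not cloud-init's actual code; the helper name and the skip-on-conflict behavior are assumptions, and the latter is exactly the open question above):

import os
import subprocess

def rename_if_free(current_name, target_name):
    # Hypothetical helper: only attempt the rename when no interface already
    # carries the target name; otherwise `ip` fails with "File exists".
    if os.path.isdir(os.path.join("/sys/class/net", target_name)):
        return False
    subprocess.run(
        ["ip", "link", "set", current_name, "name", target_name], check=True
    )
    return True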

Steps to reproduce the problem

There is an inherent race condition between cloud-init and udev that occurs when the system is rebooted before cloud-init has completely finished. When a network interface is assigned a name by the user through the network configuration, cloud-init writes this information into the udev rules file (persistent-net-cloud-init.rules) generated during the init-local phase. If the system is rebooted after cloud-init local completes and the same network config is provided by the user, cloud-init starts again as usual, but on this boot the udev rules file already exists, so udev runs and processes the rules in the previously generated persistent-net-cloud-init.rules file. While the file itself gets overwritten during the new run, udev has already read the old file and processed it. At this point it is just a matter of time until udev creates the renamed device, and by the time cloud-init comes around the rename will have already happened and cloud-init generates an error:

""" Command: ['ip', 'link', 'set', 'enp55s1f0', 'name', 'ccd_ban'] Exit code: 2 Reason: - Stdout: Stderr: RTNETLINK answers: File exists """

As a result the interface remains down.
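For what it's worth, the failing command can be replayed in isolation to confirm the symptom (a debugging aid only, assuming the device names from this report; requires root):

import subprocess

# Replaying the command from the error above: when 'ccd_ban' already exists,
# `ip` exits with status 2 and prints "RTNETLINK answers: File exists".
result = subprocess.run(
    ["ip", "link", "set", "enp55s1f0", "name", "ccd_ban"],
    capture_output=True,
    text=True,
)
print(result.returncode, result.stderr.strip())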

Environment details

cloud-init logs

cloud-init starts at 2024-02-08 05:54:17,618; then the network devices are enumerated, starting at 2024-02-08 05:54:17,790 and ending at 2024-02-08 05:54:17,829.

Then the network devices are enumerated again, starting at 2024-02-08 05:54:17,838 and ending at 2024-02-08 05:54:17,843. This is pretty much at the end of the cloud-init local phase. After all this we write out /etc/udev/rules.d/85-persistent-net-cloud-init.rules.

The udev rules are written at 2024-02-08 05:54:17,846.

Then at 2024-02-08 05:55:00,230 cloud-init starts its "init" phase, with net device configuration starting again at 2024-02-08 05:55:00,276 and ending at 2024-02-08 05:55:00,292.

Then at 2024-02-08 05:55:03,848 the final phase of cloud-init starts, and at 2024-02-08 05:56:08,100 cloud-init gets killed -> a reboot is triggered.

After the system reboot cloud-init starts again at 2024-02-08 05:58:36,710; the network device enumeration starts at 2024-02-08 05:58:36,950 and ends at 2024-02-08 05:58:37,009, close to the end of the second local phase. The second "init" phase starts at 2024-02-08 05:58:45,541, with the network device enumeration in init starting at 2024-02-08 05:58:45,589. Only now udev has caught up, and during this enumeration cycle we see:

2024-02-08 05:58:45,590 - util.py[DEBUG]: Reading from /sys/class/net/ccd_ban/address

TheRealFalcon commented 7 months ago

@rjschwei , can you attach the results of cloud-init collect-logs to this bug?

rjschwei commented 7 months ago

cloud-init.tar.gz

blackboxsw commented 7 months ago

It may be irrelevant, since the timestamp at which the script is called is after cloud-init's failure to rename a device that already exists on the most recent boot, but what's also a bit confusing to me here is the journalctl entry Feb 08 06:59:06.520604 pcsstag01-mn002 cloud-init[5205]: + python3 /node-files/bond_workaround.py /etc/udev/rules.d/85-persistent-net-cloud-init.rules.

Given that our captured journalctl logs only represent the most recent boot, I can't see whether that script was run on the previous boot.

Do we know what bond_workaround.py does, and can we also confirm that the contents of /etc/udev/rules.d/85-persistent-net-cloud-init.rules remain unchanged by this bond_workaround.py script? I want to make sure we aren't also interacting with another script that manipulates cloud-init's udev rules. Given the network config, I'd expect the contents of 85-persistent-net-cloud-init.rules to be something like the following, even after this bond_workaround.py script is called:

SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="72:58:28:f0:03:55", NAME="enp55s1f0"
blackboxsw commented 7 months ago

The second variable at play here seems to be the network-manager-fix-ovs-interfaces.py script, triggered by the systemd unit with the Description "Service that will assign mac address for the sriov vf", as it also looks to be triggering interface up/down events that could exacerbate whatever race we think we are hitting:

Feb 08 06:58:45.144226 pcsstag01-mn002 network-manager-fix-ovs-interfaces.py[3744]: Checking for ovs-interfaces which are in error state
Feb 08 06:58:45.144226 pcsstag01-mn002 network-manager-fix-ovs-interfaces.py[3744]: Chekcking if  the NM startup complete
Feb 08 06:58:45.144226 pcsstag01-mn002 network-manager-fix-ovs-interfaces.py[3744]: cmd = nmcli connection | grep 'ovs-interface\|ethernet' | grep -e "--" | awk '{print $1}'
Feb 08 06:58:45.144226 pcsstag01-mn002 network-manager-fix-ovs-interfaces.py[3744]: faulty device =
Feb 08 06:58:45.144226 pcsstag01-mn002 network-manager-fix-ovs-interfaces.py[3744]: All interfaces are in connected state, nothing to fix
Feb 08 06:58:45.145008 pcsstag01-mn002 bash[4187]: No VFs configured for device ens3f1
Feb 08 06:58:45.146377 pcsstag01-mn002 bash[4187]: No VFs configured for device ens3f4
Feb 08 06:58:45.147677 pcsstag01-mn002 bash[4187]: No VFs configured for device ens3f5
Feb 08 06:58:45.148980 pcsstag01-mn002 bash[4187]: No VFs configured for device ens3f6
Feb 08 06:58:45.150311 pcsstag01-mn002 bash[4187]: No VFs configured for device ens3f7
Feb 08 06:58:45.152964 pcsstag01-mn002 systemd[1]: Starting Initial cloud-init job (metadata service crawler)...
Feb 08 06:58:45.172659 pcsstag01-mn002 systemd[1]: network_manager_fix_ovs_interfaces.service: Deactivated successfully.
Feb 08 06:58:45.172848 pcsstag01-mn002 systemd[1]: Finished Check the ovs-interfaces that are in connecting state and trigger connection up to bring up the interfaces.
Feb 08 06:58:45.188536 pcsstag01-mn002 systemd[1]: assign_vf_mac_address.service: Deactivated successfully.
Feb 08 06:58:45.188713 pcsstag01-mn002 systemd[1]: Finished Service that will assign mac address for the sriov vf.
Feb 08 06:58:45.708085 pcsstag01-mn002 kernel: [qede_unload:3901(enp55s1f0)]Link is down
rjschwei commented 7 months ago

Do we know what bond_workaround.py does and can we also confirm the contents of /etc/udev/rules.d/85-persistent-net-cloud-init.rules remain unchanged by this bond_workaround.py script?

I am still waiting for the content of the udev rules and now have also requested the content of the workaround script.

SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="72:58:28:f0:03:55", NAME="enp55s1f0"

Yes that is what I expect in the udev rules as well.

Thanks for picking up on the workaround script.

rjschwei commented 7 months ago

Have the workaround script:

#!/usr/bin/python3

import re, sys, shutil

def main(udev_rule_file):
  mac_list = []
  with open(udev_rule_file, "r") as readstream, open(udev_rule_file+'.cleaned', "w") as writestream:
    for line in readstream:
      mac = get_mac_address(line)
      if mac not in mac_list:
        mac_list.append(mac)
        writestream.write(line)

  shutil.move(udev_rule_file+'.cleaned', udev_rule_file)

def get_mac_address(line):
  elements = line.split(",")
  for x in elements:
    if "address" in x :
      x = re.sub('[\"\{\}\*]', '', x)
      return x.split("==")[1]

if __name__ == '__main__':
    main(sys.argv[1])

I don't recall seeing a bug report that we echoed devices with duplicate MAC addresses to the generated udev files. But in any event this would not do any harm in this case.
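For reference, a self-contained illustration of what the script extracts per line (the parsing is adapted from the script above; the rule is the one quoted earlier):

import re

def get_mac_address(line):
    # Find the ATTR{address} element and strip quotes/braces to get the bare MAC.
    for x in line.split(","):
        if "address" in x:
            x = re.sub(r'["{}*]', '', x)
            return x.split("==")[1]

rule = ('SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", '
        'ATTR{address}=="72:58:28:f0:03:55", NAME="enp55s1f0"')
print(get_mac_address(rule))  # -> 72:58:28:f0:03:55

Lines whose MAC has already been seen are dropped, so duplicate entries for the same device collapse to the first one.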

rjschwei commented 7 months ago

Confirmed the generated udev rule file contains:

SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="72:58:28:f0:03:55", NAME="enp55s1f0"

rjschwei commented 7 months ago

@TheRealFalcon @blackboxsw after creating #5086 and retesting I have a little better orientation around the code flow, but I am still missing some of the bits to put the puzzle together. From the test with the patch I have extracted the following from the log:

2024-03-25 16:51:39,910 - stages.py[INFO]: Applying network configuration from fallback bringup=False:

2024-03-25 16:51:39,919 - sysconfig.py[INFO]: Generated udev rules /etc/udev/rules.d/85-persistent-net-cloud-init.rules already exist, skippingfurther network setup processing.

2024-03-25 16:51:48,443 - main.py[DEBUG]: [net] init will now be targeting instance id: b13c1cd7-8bec-4e65-9fa8-6cfe71987c76. new=False

2024-03-25 16:51:48,480 - stages.py[DEBUG]: No network config applied. Neither a new instance nor datasource network update allowed

2024-03-25 16:51:48,502 - __init__.py[DEBUG]: achieving renaming of [['72:58:28:f0:03:55', 'ccd_ban', None, None]] with ops

The

No network config applied. Neither a new instance nor datasource network update allowed

was also in the previous log and I missed it, meaning #5086 is in fact useless. That said, the mystery is around https://github.com/canonical/cloud-init/blob/main/cloudinit/stages.py#L1040. We have decided that no network configuration should be applied (the message at 2024-03-25 16:51:48,480), and yet we then rename devices and return from apply_network_config(), which, to the best I can reconstruct, was called at this point in the process from https://github.com/canonical/cloud-init/blob/main/cloudinit/cmd/main.py#L435, and this is where I lose track of the code flow. After line 435 in main.py I do not understand how we end up in cloudinit/net/__init__.py where another rename attempt is made, i.e. the message at 2024-03-25 16:51:48,502.

One would think that after https://github.com/canonical/cloud-init/blob/main/cloudinit/stages.py#L1040 we would not try another rename attempt. But since I cannot put the pieces together I cannot quite figure out how to stop it.

Help is appreciated.

TheRealFalcon commented 7 months ago

We have decided that no network configuration should be applied, the message at 2024-03-25 16:51:48,480, and then we rename devices and return from apply_network_config()

That final log is part of the rename devices you mention. As you say, update_event_enabled() returns False, which leads us to the no network config log. Calling into the next line, we should see another log that I'm not seeing in your paste. Did you strip it out?

After that we call this, which should lead to this, which should lead to this, which calls _rename_interfaces(), which leads to your line in question.

Does that help?

rjschwei commented 7 months ago

The "applying net config names" message exists, and yes, I missed that in the snippet in the previous comment:

2024-03-25 16:51:48,480 - stages.py[DEBUG]: applying net config names for {'ethernets': {'ccd_ban': {'dhcp4': True, 'set-name': 'ccd_ban', 'match': {'macaddress': '72:58:28:f0:03:55'}}}, 'version': 2}

After this the enumeration of network devices follows, which starts with:

2024-03-25 16:51:48,480 - util.py[DEBUG]: Reading from /sys/class/net/ccd_ban/device/device (quiet=False)

So we are reading from a path that we are later trying to create, and that's where the problem occurs, but I still do not understand why.
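A small standalone check makes the situation visible (not cloud-init code; it only reads the standard sysfs attribute /sys/class/net/<iface>/address, the same path that shows up in the logs above):

import os

def current_name_for_mac(mac):
    # Return the interface that currently carries the given MAC, if any.
    for iface in os.listdir("/sys/class/net"):
        try:
            with open(os.path.join("/sys/class/net", iface, "address")) as f:
                if f.read().strip().lower() == mac.lower():
                    return iface
        except OSError:
            continue
    return None

# With the rename entry from the log, ['72:58:28:f0:03:55', 'ccd_ban', None, None],
# this returns 'ccd_ban' on the affected boot, i.e. the target name is already taken.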

I have created a brute-force avoidance approach:

--- cloudinit/net/__init__.py.orig
+++ cloudinit/net/__init__.py
@@ -7,6 +7,7 @@

 import errno
 import functools
+import glob
 import ipaddress
 import logging
 import os
@@ -698,6 +699,12 @@ def _rename_interfaces(
     renames, strict_present=True, strict_busy=True, current_info=None
 ):

+    net_devs = glob.glob('/sys/class/net/*')
+    for dev in net_devs:
+        name = dev.split('/')[-1]
+        if name in renames:
+            del renames[renames.index(name)]
+
     if not len(renames):
         LOG.debug("no interfaces to rename")
         return

But this looks like a big hammer to paper over a problem, and it should not be needed.

I also noticed

2024-03-25 16:51:48,493 - __init__.py[DEBUG]: Detected ccd_int as an OVS interface

This means that when get_interfaces() is called the filter_openvswitch_internal argument is True, and maybe it shouldn't be?

But I do not know what the criteria should be for when this should be True or False.

rjschwei commented 5 months ago

And the fixed brute-force approach ;) (the first version looked the device name up directly in renames, but each entry of renames is a list like ['72:58:28:f0:03:55', 'ccd_ban', None, None], so the lookup never matched):

--- cloudinit/net/__init__.py.orig
+++ cloudinit/net/__init__.py
@@ -7,6 +7,7 @@

 import errno
 import functools
+import glob
 import ipaddress
 import logging
 import os
@@ -698,6 +699,12 @@ def _rename_interfaces(
     renames, strict_present=True, strict_busy=True, current_info=None
 ):

+    net_devs = glob.glob('/sys/class/net/*')
+    for dev in net_devs:
+        dev_name = dev.split('/')[-1]
+        for dev_data in renames:
+            if dev_name in dev_data:
+                del renames[renames.index(dev_data)]
+
     if not len(renames):
         LOG.debug("no interfaces to rename")
         return
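For comparison, the same filtering can be written without deleting from renames while iterating over it; a sketch only, assuming the [mac, new_name, driver, device_id] entry shape seen in the log above, not a proposed cloud-init patch:

import glob
import os

def drop_existing_targets(renames):
    # Drop rename entries whose target name already exists in /sys/class/net,
    # e.g. [['72:58:28:f0:03:55', 'ccd_ban', None, None]] becomes [] once
    # udev has already created ccd_ban.
    existing = {os.path.basename(dev) for dev in glob.glob("/sys/class/net/*")}
    return [entry for entry in renames if entry[1] not in existing]

Building a new list also sidesteps the subtlety of removing elements from a list that the inner loop is still iterating over.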