Open ubuntu-server-builder opened 1 year ago
Launchpad user Brett Holman(holmanb) wrote on 2022-11-16T17:15:59.577005+00:00
Launchpad attachments: failure on detection
Launchpad user Trent Lloyd(lathiat) wrote on 2023-03-22T03:54:10.747950+00:00
I ran into this issue when doing SR-IOV Bonding on OpenStack. We can assign two VFs with the same MAC. An example of doing that is here: https://www.redpill-linpro.com/techblog/2021/01/30/bonding-sriov-nics-with-openstack.html
You can use unique MACs together with fail-over-mac-policy=active, but then metadata and DHCP break when the slave interface is in use. So it is ideal to have duplicate MACs as an option.
We keep running into this in various scenarios and already have multiple workarounds:
OVS bridge duplicates: https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1912844
Azure advanced networking: https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1844191
Oracle net_failover: https://github.com/canonical/cloud-init/commit/fa47d527a03a00319936323f0a857fbecafceaf7
In most cases the real use case for this is some kind of VF-to-virtio failover for live migration or bonding (as is the case for both Oracle net_failover and Azure). Sometimes it is because a bridge, bond or OVS duplicates or steals a MAC; we also have special-case code for handling that.
Currently when you hit this, cloud-init errors out and attempts no network configuration.
It would be ideal for cloud-init to attempt to configure the network with one of the interfaces: perhaps the one that already has the correct name, or one chosen by a priority scheme with specifics for each driver type we already have exceptions for (ignore OVS/bridge/bond, prioritise the correct net_failover device, etc.).
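To make the suggestion above concrete, here is a rough sketch of what "pick one interface per MAC by priority" could look like. This is not cloud-init code: `pick_interface_per_mac`, `DRIVER_PRIORITY`, `IGNORED_KINDS`, and the dict keys (`name`, `mac`, `driver`, `kind`) are all hypothetical, and the driver names and priorities are only illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical priorities (lower wins). Illustrative values only; this is
# not cloud-init's actual list of driver exceptions.
DRIVER_PRIORITY = {"virtio_net": 0, "mlx5_core": 1}

# Hypothetical device kinds that are skipped because they tend to
# duplicate or steal a MAC from a member interface.
IGNORED_KINDS = {"bridge", "bond", "openvswitch"}


def _rank(iface, expected_name):
    """Sort key: expected name first, then driver priority, then name."""
    return (
        iface["name"] != expected_name,
        DRIVER_PRIORITY.get(iface["driver"], 99),
        iface["name"],
    )


def pick_interface_per_mac(interfaces, expected_names=None):
    """Return {mac: name}, choosing one interface for every MAC.

    interfaces: iterable of dicts with keys name, mac, driver, kind.
    expected_names: optional {mac: name} taken from the network config.
    """
    expected_names = expected_names or {}
    by_mac = defaultdict(list)
    for iface in interfaces:
        if iface.get("kind") in IGNORED_KINDS:
            continue
        by_mac[iface["mac"].lower()].append(iface)

    return {
        mac: min(
            candidates, key=lambda i: _rank(i, expected_names.get(mac))
        )["name"]
        for mac, candidates in by_mac.items()
    }
```

With this ordering, an interface that already carries the name the config expects always wins, and the driver priority only breaks ties between the remaining candidates.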
Ran into this just now after deploying Kubernetes with the Calico CNI: the Calico interfaces had duplicate MACs after deploying the Helm chart and rebooting.
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/cloudinit/cmd/main.py", line 761, in status_wrapper
    ret = functor(name, args)
  File "/usr/lib/python3.9/site-packages/cloudinit/cmd/main.py", line 433, in main_init
    init.apply_network_config(bring_up=bring_up_interfaces)
  File "/usr/lib/python3.9/site-packages/cloudinit/stages.py", line 909, in apply_network_config
    self.distro.networking.wait_for_physdevs(netcfg)
  File "/usr/lib/python3.9/site-packages/cloudinit/distros/networking.py", line 147, in wait_for_physdevs
    present_macs = self.get_interfaces_by_mac().keys()
  File "/usr/lib/python3.9/site-packages/cloudinit/distros/networking.py", line 74, in get_interfaces_by_mac
    return net.get_interfaces_by_mac(
  File "/usr/lib/python3.9/site-packages/cloudinit/net/__init__.py", line 870, in get_interfaces_by_mac
    return get_interfaces_by_mac_on_linux(
  File "/usr/lib/python3.9/site-packages/cloudinit/net/__init__.py", line 944, in get_interfaces_by_mac_on_linux
    raise RuntimeError(
RuntimeError: duplicate mac found! both 'cali9a68072be50' and 'calib855784d906' have mac 'ee:ee:ee:ee:ee:ee'
Version: /bin/cloud-init 22.1-10.el9_2.alma
This is from the latest available AlmaLinux cloud image.
My suggestion would be a configuration property that takes a regex whitelist or blacklist and limits which interfaces or addresses cloud-init operates on. We often match on cali* to control certain behaviours, for example with iptables. This problem may come up in a few situations with containerised workloads.
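A minimal sketch of how such a filter might behave, assuming it is applied to interface names before the duplicate-MAC check runs. No such option exists in cloud-init as far as this thread shows; `filter_interfaces` and its `allow` / `deny` parameters are hypothetical names.

```python
import re


def filter_interfaces(names, allow=None, deny=None):
    """Return the interface names that pass the optional regex filters.

    allow: regex a name must fully match to be kept (None keeps all).
    deny:  regex that excludes a name when it fully matches.
    The deny pattern is checked first, so it overrides allow.
    """
    allow_re = re.compile(allow) if allow else None
    deny_re = re.compile(deny) if deny else None

    kept = []
    for name in names:
        if deny_re and deny_re.fullmatch(name):
            continue
        if allow_re and not allow_re.fullmatch(name):
            continue
        kept.append(name)
    return kept


# Example: hide the Calico interfaces from MAC detection.
names = ["eth0", "cali9a68072be50", "calib855784d906"]
physical = filter_interfaces(names, deny=r"cali.*")
```

Using `fullmatch` rather than `search` keeps a pattern such as `cali.*` from accidentally excluding an interface that merely contains that substring.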
This bug was originally filed in Launchpad as LP: #1996789
Launchpad details
Launchpad user Brett Holman(holmanb) wrote on 2022-11-16T17:15:59.577005+00:00
Currently, when duplicate MAC addresses are detected, cloud-init dies.
While duplicate MACs are typically corner cases, there are cases when they can be valid[1].
Consider this example[2]. After bonding two interfaces, the interfaces were left with duplicate MAC addresses. Using cloud-init on this system fails as soon as these devices are detected.
If no network config is given, or if a config is given configuring a single address, we have the opportunity to do something intelligent to allow cloud-init to boot by using the "fallback interface" (in cloud-init this is the first interface), rather than throwing an exception and dying.
Netplan's MAC matching assumes a 1:1 mapping between MAC addresses and interfaces, so when multiple interfaces are configured with matches, we still can't do anything intelligent.
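The behaviour described above could be sketched roughly as follows. This is a simplified stand-in, not the actual `get_interfaces_by_mac_on_linux` implementation: `interfaces_by_mac`, `can_fall_back`, the `allow_fallback` flag, and the assumption that the config is a dict with an `ethernets` mapping are all hypothetical.

```python
def can_fall_back(network_config):
    """Fallback is only safe with no config or a single configured interface.

    Assumes a v2-style dict with an "ethernets" mapping (an assumption made
    for this sketch). With several MAC-matched interfaces the 1:1 mapping
    is required, so no fallback is attempted.
    """
    if not network_config:
        return True
    return len(network_config.get("ethernets", {})) <= 1


def interfaces_by_mac(pairs, allow_fallback=False):
    """Build {mac: name} from ordered (name, mac) pairs.

    On a duplicate MAC, either raise (the current behaviour) or keep the
    first interface seen, which mirrors the "fallback interface" idea.
    """
    result = {}
    for name, mac in pairs:
        mac = mac.lower()
        if mac in result:
            if not allow_fallback:
                raise RuntimeError(
                    "duplicate mac found! both '%s' and '%s' have mac '%s'"
                    % (name, result[mac], mac)
                )
            continue  # keep the first interface seen for this MAC
        result[mac] = name
    return result
```

A caller would then do something like `interfaces_by_mac(pairs, allow_fallback=can_fall_back(netcfg))`, so the exception is still raised whenever more than one interface is configured.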
[1] Until these have unique addresses, these interfaces will not be usable on the same broadcast domain, but they should still be able to work individually on different networks.
[2] https://stackoverflow.com/questions/74459180/deleted-bond-interface-left-me-with-duplicate-mac-on-two-interfaces