Azure / WALinuxAgent

Microsoft Azure Linux Guest Agent
http://azure.microsoft.com/
Apache License 2.0

Add support to NIC / ethX name customization in agent - consistent interface device naming #1877

Open alv000h opened 4 years ago

alv000h commented 4 years ago

Description: After "recent" changes in the Linux kernel (more or less since udev became mainstream), the order of Ethernet NICs is no longer guaranteed. Because of that, network interfaces can swap interface numbers randomly between reboots.

There were some quick fixes for this in early versions of RHEL 5 and Debian (mostly based on MAC), but those fixes have since been deprecated, and now almost everyone uses consistent interface device naming, aka biosdevname (from Dell).

That is why it is necessary to add an additional configuration parameter to WALinuxAgent, for example nicname, nicprefix, or something like that (so you can set nicprefix="nic" if you use nic0, nic1 as interface names, or nicprefix="en" if you use en0, en1, and so on).

for further information, you can refer to this document: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_networking/consistent-network-interface-device-naming_configuring-and-managing-networking#disabling-consistent-interface-device-naming-during-the-installation_consistent-network-interface-device-naming

And detailed explanation, here: https://lwn.net/Articles/356900/

Distro and WALinuxAgent

alv000h commented 4 years ago

For additional info please refer above issue and optionally this document

Another one from redhat

Additional info on different schemas

alv000h commented 4 years ago

#1750 is related to this issue too.

jpbuecken commented 4 years ago

Hello, I'm not a waagent developer, but I'm a long-time Linux user. First of all, the systemd consistent naming scheme is not the same as biosdevname from Dell. They are different implementations that aim at the same goal. This is why there are two kernel command line parameters for grub (net.ifnames and biosdevname). biosdevname is based on the name provided by the BIOS (see https://github.com/dell/biosdevname/blob/master/README : "therefore it's likely that this will only work well on architectures that provide such information in their BIOS.").

As your first source from Red Hat describes, systemd has fallback options; for example, they are based on PCI slot position. My favorite explanation is this: https://major.io/2015/08/21/understanding-systemds-predictable-network-device-names/ systemd/udev chooses from the following properties, skipping any that are empty:

E: ID_NET_NAME_MAC=enxa0369f2cec90
E: ID_NET_NAME_PATH=enp8s0f0
E: ID_NET_NAME_SLOT=ens9f0

That said, just adding a prefix option will not enable the consistent naming scheme. And if I understood you correctly, your goal is to have interfaces that do not swap between reboots. This is done by removing net.ifnames=0 and biosdevname=0 from the grub command line (or setting net.ifnames=1), rebuilding the grub config, and rebooting. The use of custom prefixes (e.g. via net.ifnames.prefix) is not recommended by Red Hat (see your first source from Red Hat: "Red Hat does not support the use of prefixdevname on already deployed systems." and "However, Red Hat recommends to use the default naming scheme, which is the same as in Red Hat Enterprise Linux 7.").
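Before touching the grub config, it can help to see which of the two parameters the running kernel was actually booted with. The following is a minimal sketch, not part of waagent; the function names are made up for illustration, and the "last occurrence wins" behaviour is an assumption of this sketch rather than documented kernel behaviour.

```python
def naming_overrides(cmdline: str) -> dict:
    """Collect net.ifnames / biosdevname settings from a kernel command line.

    `cmdline` is the text of /proc/cmdline (or GRUB_CMDLINE_LINUX).
    If a parameter appears twice, this sketch keeps the last value.
    """
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in ("net.ifnames", "biosdevname"):
            found[key] = value
    return found


def predictable_names_disabled(cmdline: str) -> bool:
    """True if net.ifnames=0 is present on the command line."""
    return naming_overrides(cmdline).get("net.ifnames") == "0"


# Example with a literal string; on a real system you would pass
# the contents of /proc/cmdline instead.
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 net.ifnames=0 biosdevname=0 quiet"
print(naming_overrides(example))
print(predictable_names_disabled(example))
```

If both parameters show up with the value 0, the image was built to keep the classic ethX names, and the steps described above apply.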

IMHO the provider of the image should decide which naming scheme is used. Maybe Azure or the OS vendor provides guidelines for Hyper-V, and that is the reason why net.ifnames=0 has been set in those images. You may want to clarify activation of the systemd consistent naming scheme with your OS vendor.

jpbuecken commented 4 years ago

PS: You may try to add a udev rule as well:

KERNEL=="eth*", ACTION=="add", PROGRAM="/sbin/biosdevname -i %k", NAME="%c"

(install biosdevname binary first)

This will use biosdevname from Dell (https://github.com/dell/biosdevname/blob/master/README), not the systemd consistent naming scheme. But I am not sure whether the Azure/Hyper-V BIOS provides useful information here.

alv000h commented 4 years ago

@jpbuecken your solution is OK, but waagent apparently disables udev rules (#1750) and removes network rules from /etc/udev/rules.d.

That is why this issue is open: we need a standard from Microsoft, something similar to RHEL (ensXXX/enoXXX).

jpbuecken commented 4 years ago

The usage of the systemd consistent naming scheme is independent of #1750 (as you may have seen, I opened #1750). That issue is about waagent using an unsupported way to disable the net generator. But it is OK to disable it in general:

The udev net generator creates udev rules to hardcode interface names via MAC or kernel PCI slot. It is a predecessor of the systemd consistent naming scheme that achieves the same goal (same name after reboots). It is important not to hardcode the interface via MAC. Otherwise you cannot use snapshots to restore your instance (this process creates new interfaces with new MACs).

The net generator is not needed to use the systemd consistent naming scheme! Actually, it will conflict with the systemd consistent naming scheme, because they are again two different concepts.

Just set net.ifnames=1, rebuild grub, adapt your ifcfg file (SLES, RHEL <=7), and reboot.

Then systemd will name your interface depending on information provided by the hypervisor:

My example is SLES15, gen1:

udevadm info -e | grep -A 9 ^P.*eth0
P: /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/000d3a24-24df-000d-3a24-24df000d3a24/net/eth0
E: DEVPATH=/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/000d3a24-24df-000d-3a24-24df000d3a24/net/eth0
E: ID_NET_DRIVER=hv_netvsc
E: ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
E: ID_NET_NAME_MAC=enx000d3a2424df
E: ID_OUI_FROM_DATABASE=Microsoft Corp.
E: ID_PATH=acpi-VMBUS:01
E: ID_PATH_TAG=acpi-VMBUS_01
E: IFINDEX=2
E: INTERFACE=eth0

Since there is no ID_NET_NAME_SLOT or ID_NET_NAME_PATH, my interface is called enx000d3a2424df after the steps above.
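To make the fallback concrete, here is a small sketch that parses the "E:" lines of udevadm output and picks the first non-empty candidate. The preference order in PREFERENCE is an assumption for illustration (the real order comes from the NamePolicy setting in the .link file that applies, which can differ between distributions), and the function names are hypothetical.

```python
# Assumed preference order for this sketch only; the effective order
# is defined by NamePolicy= in the matching systemd .link file.
PREFERENCE = (
    "ID_NET_NAME_ONBOARD",
    "ID_NET_NAME_SLOT",
    "ID_NET_NAME_PATH",
    "ID_NET_NAME_MAC",
)


def parse_udev_properties(text: str) -> dict:
    """Turn the 'E: KEY=VALUE' lines of `udevadm info` output into a dict."""
    props = {}
    for line in text.splitlines():
        if line.startswith("E: "):
            key, sep, value = line[3:].partition("=")
            if sep:
                props[key] = value
    return props


def pick_name(props: dict, preference=PREFERENCE):
    """Return the first non-empty candidate, else the kernel name (INTERFACE)."""
    for key in preference:
        value = props.get(key)
        if value:
            return value
    return props.get("INTERFACE")


sample = """\
E: ID_NET_DRIVER=hv_netvsc
E: ID_NET_NAME_MAC=enx000d3a2424df
E: INTERFACE=eth0
"""
print(pick_name(parse_udev_properties(sample)))
```

With only ID_NET_NAME_MAC present, as in the output above, the MAC-based name is the only candidate left.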

The problem is that Azure / Hyper-V Gen1, or the network driver/kernel module, does not provide ID_NET_NAME_SLOT. There is nothing waagent can do to activate or configure this. I did not test Gen2 yet. Anyway, IMHO this is more a case for Microsoft/Azure to provide the information via hypervisor/driver/kernel module.

Another idea: You can create your own udev rules to name your interfaces

alv000h commented 4 years ago

@jpbuecken I have created my own udev rules to avoid interface swapping (the names are nic0, nic1, and so on), but this doesn't work with waagent; that is why I opened the issue.

You are correct, Azure may not report data very well, and there is probably a problem with a module or the hypervisor (I have VMware, OpenStack, and so on, and for all of these products Ansible reports the hypervisor as VMware or OpenStack, while Azure shows "virtual machine", which is very odd and funny).

I think those things should be fixed some day (you know: write a C patch, compile, test...), but for now I am missing essential functionality that blocks my migration, and I think it can be worked around with a quick and dirty workaround.

The point is that with the current configuration waagent simply doesn't work, because: waagent tries to disable udev net rules or, worse, tries to delete them; and waagent has the ethX NIC format hardcoded in the source (causing excessive logging and strange behavior in AzureRM, e.g. the IP of the VM cannot be determined, and more).

For me and people who are migrating from KVM or VMware (with prefix-style NIC names: eth0, ens0, etc.), adding a NIC prefix option to the waagent configuration would make our day and simplify the workaround, because eth0 is hardcoded.

If they prefer to implement a robust solution that makes the interface name dynamic by using "ip link list", that would make us very happy, but we understand that a complete solution will take a lot of time, and for me a NIC prefix is enough at the moment.

jpbuecken commented 4 years ago

> @jpbuecken I have created my own udev rules to avoid interface swapping (the names are nic0, nic1, and so on), but this doesn't work with waagent; that is why I opened the issue.

> waagent tries to disable udev net rules or, worse, tries to delete them

Did you name your own udev rules /etc/udev/rules.d/70-persistent-net.rules? Avoid that. You may call it /etc/udev/rules.d/71-my-own-persistent-net.rules

The agent does not disable udev net rules in general, only the predefined net generator from the OS vendor.
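As a sketch of what such a custom rules file could contain, the helper below generates one rule per NIC in the classic persistent-net style. The helper names and the "nic" prefix are illustrative assumptions taken from this thread, not anything waagent provides. Matching on ATTR{address} is the simplest key, but it carries the MAC caveat discussed in this thread: a restored snapshot with new MACs will no longer match.

```python
def make_rule(mac: str, name: str) -> str:
    """Build one udev rule that renames the NIC with the given MAC."""
    mac = mac.lower()  # sysfs reports MAC addresses in lowercase
    return (
        'SUBSYSTEM=="net", ACTION=="add", '
        f'ATTR{{address}}=="{mac}", NAME="{name}"'
    )


def make_rules(macs, prefix: str = "nic") -> str:
    """Number the given MACs in order: <prefix>0, <prefix>1, ..."""
    lines = [make_rule(mac, f"{prefix}{i}") for i, mac in enumerate(macs)]
    return "\n".join(lines) + "\n"


# The output would go into a file such as
# /etc/udev/rules.d/71-my-own-persistent-net.rules
print(make_rules(["00:0D:3A:24:24:DF"]), end="")
```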

> @jpbuecken your solution is OK, but waagent apparently disables udev rules (#1750) and removes network rules from /etc/udev/rules.d. That is why this issue is open: we need a standard from Microsoft, something similar to RHEL (ensXXX/enoXXX).

As I tried to explain above, waagent is independent of this. If you follow my steps: 1) Set net.ifnames=1 in /etc/default/grub 2) rebuild grub 3) adapt your ifcfg file (SLES, RHEL <=7) 4) reboot

Then the system will use consistent naming scheme as explained by RHEL!

But on Azure, udev falls back to ID_NET_NAME_MAC. Important: I cannot recommend it, because restoring a snapshot may create interfaces with new MACs. Then you cannot connect to an instance restored from a snapshot, since your ifcfg files do not match the new interface name, which is derived from the new MAC.
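The dependency is easy to see once the name is written out: the MAC-based name is just "enx" followed by the twelve hex digits of the address, as in the enx000d3a2424df example earlier in this thread. A minimal sketch (the function name is made up for illustration):

```python
def mac_based_name(mac: str) -> str:
    """Derive the enx<mac> style name from a MAC address string."""
    hex_digits = mac.replace(":", "").replace("-", "").lower()
    if len(hex_digits) != 12 or any(c not in "0123456789abcdef" for c in hex_digits):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return "enx" + hex_digits


print(mac_based_name("00:0d:3a:24:24:df"))  # enx000d3a2424df
```

Any change of the MAC therefore changes the interface name, and an ifcfg file written for the old name no longer applies.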

You wrote that you are familiar with OpenStack: AFAIK there are images with cloud-init, or you can enable cloud-init yourself: 1) Install cloud-init 2) Configure cloud-init 3) Enable cloud-init in waagent.conf. Maybe you can reach your goal through cloud-init configuration in those images.

alv000h commented 4 years ago

As stated before, simply changing the NIC name has implications...

Have you tried this in Azure (/etc/udev/rules.d/71-my-own-persistent-net.rules) without cloud-init?

waagent has hardcoded ethX NIC naming, as you can see here in the getInterfaceNameByMac function: https://github.com/Azure/WALinuxAgent/blob/11d0881cd01e1bc5ff4f918c33701b60274c6e40/bin/waagent2.0#L669

In our experience, depending on the agent version, the network can be restarted and the agent starts to fill the logs with "interface not found" errors when the NIC name is changed.

jpbuecken commented 4 years ago

> As stated before, simply changing the NIC name has implications...
>
> Have you tried this in Azure (/etc/udev/rules.d/71-my-own-persistent-net.rules) without cloud-init?
>
> waagent has hardcoded ethX NIC naming, as you can see here in the getInterfaceNameByMac function: https://github.com/Azure/WALinuxAgent/blob/11d0881cd01e1bc5ff4f918c33701b60274c6e40/bin/waagent2.0#L669
>
> In our experience, depending on the agent version, the network can be restarted and the agent starts to fill the logs with "interface not found" errors when the NIC name is changed.

I have to admit, I did not check the logs. I only rebooted and logged in with a working network. But I am more worried about the usage of ifconfig. It is deprecated, and newer Linux distributions no longer ship it by default. Maybe, if the developers need to touch the function anyway, they can get rid of the hardcoded eth requirement to support customized NIC names or the consistent naming scheme.
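One way such a lookup could work without ifconfig and without assuming an eth prefix is to read the MAC of every interface from sysfs. This is only a sketch of the idea, not the agent's actual code; interface_by_mac and its sys_class_net parameter are hypothetical names, and the parameter exists so the function can be pointed at a test directory.

```python
import os


def interface_by_mac(mac: str, sys_class_net: str = "/sys/class/net"):
    """Return the name of the first interface whose MAC matches, else None.

    Every entry under /sys/class/net is considered, so names such as
    nic0, ens9f0 or enx000d3a2424df work the same way as eth0.
    """
    wanted = mac.strip().lower()
    for name in sorted(os.listdir(sys_class_net)):
        path = os.path.join(sys_class_net, name, "address")
        try:
            with open(path) as f:
                if f.read().strip().lower() == wanted:
                    return name
        except OSError:
            # Entry without a readable address file (e.g. not a device).
            continue
    return None
```

Note that several interfaces can report the same MAC (bonding members, for example); this sketch simply returns the first match in sorted order, so a real implementation would need a rule for choosing between them.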