canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/

Cloud_init honoring lower IP as a primary IP instead of secondary(multi IP) #4157

Open bagadesvb opened 1 year ago

bagadesvb commented 1 year ago

I have been running into an issue while trying to use the new RHEL 8.4 image. On boot, when the system assigns IPs to the NIC, it uses the lowest IP as the main IP for the NIC and configures all the others as aliases. This is incorrect: it should assign the EC2 instance's primary IP as the main IP and the secondary IPs as aliases.

In the cloud-init.log file we can see that it retrieves all the EC2 data, and /run/cloud-init/instance-data.json, which contains the data retrieved for the EC2 instance, identifies 10.204.22.36 as the main IP:

As per cloud-init:
Primary private IP: 10.204.22.36
Secondary private IPs: 10.204.22.32 10.204.22.33 10.204.22.30 10.204.22.40

However, the AWS Console (Networking) shows:
Primary private IP: 10.204.22.30
Secondary private IPs: 10.204.22.32 10.204.22.33 10.204.22.36 10.204.22.40


TheRealFalcon commented 1 year ago

Hi @bagadesvb, thanks for the bug report. Can you run cloud-init collect-logs on an affected instance and upload the logs here?

You mentioned it not working on RHEL 8.4. Does it work for you on a different image? If so, what version(s) does it work for? Is it possible to also get logs from an instance that works as you expect?

xiachen-rh commented 1 month ago

I can reproduce the issue with RHEL 8.6+ images on AWS. It is caused by a NetworkManager behavior change from NetworkManager 1.36 onwards (https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/8.6_release_notes/index#BZ-2096256). The change is intended: if both a static IP and DHCP are configured, the static IP is now the primary. However, this behavior conflicts with the way cloud-init configures multiple IPs.

Here are the details of the conflict. Launch an EC2 instance on AWS with multiple IPs: for example, in the AWS web console under Network settings > Advanced network configuration, set the primary IP to 10.116.2.73 and set the secondary IPs to 'Manually assign' with 10.116.2.70 and 10.116.2.71. Once the instance is running, log in and run 'ip a s'. It shows 10.116.2.70 as the primary IP, but it should be 10.116.2.73, which is what we set.

eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 0a:0a:95:7c:e4:ff brd ff:ff:ff:ff:ff:ff
    altname enp0s5
    altname ens5
    inet 10.116.2.70/24 brd 10.116.2.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet 10.116.2.71/24 brd 10.116.2.255 scope global secondary noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet 10.116.2.73/24 brd 10.116.2.255 scope global secondary noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::80a:95ff:fe7c:e4ff/64 scope link
       valid_lft forever preferred_lft forever
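To check this on an instance without reading the output by eye, a minimal Python sketch can split the IPv4 addresses into primary and secondary by looking for the `secondary` flag. The function name `split_ipv4` is hypothetical, and the sample text is trimmed from the output above.

```python
import re

# Sample trimmed from the `ip a s` output quoted in this comment.
IP_OUTPUT = """\
eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 0a:0a:95:7c:e4:ff brd ff:ff:ff:ff:ff:ff
    inet 10.116.2.70/24 brd 10.116.2.255 scope global noprefixroute eth0
    inet 10.116.2.71/24 brd 10.116.2.255 scope global secondary noprefixroute eth0
    inet 10.116.2.73/24 brd 10.116.2.255 scope global secondary noprefixroute eth0
    inet6 fe80::80a:95ff:fe7c:e4ff/64 scope link
"""


def split_ipv4(ip_output):
    """Return (primary, secondaries) IPv4 address lists from `ip addr` text."""
    primary, secondaries = [], []
    for line in ip_output.splitlines():
        # "inet " with a trailing space skips the inet6 lines.
        match = re.match(r"\s*inet (\d+\.\d+\.\d+\.\d+)/\d+ (.*)", line)
        if not match:
            continue
        addr, rest = match.groups()
        if "secondary" in rest.split():
            secondaries.append(addr)
        else:
            primary.append(addr)
    return primary, secondaries


primary, secondaries = split_ipv4(IP_OUTPUT)
print("primary:", primary)
print("secondary:", secondaries)
```

For the output above this reports 10.116.2.70 as the only address without the `secondary` flag, which is the mismatch being described.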

The metadata is:
$ cat /run/cloud-init/instance-data.json
...
"local-hostname": "ip-10-116-2-73.us-west-2.compute.internal",
"local-ipv4": "10.116.2.73",
"mac": "0a:0a:95:7c:e4:ff",
"metrics": {
  "vhostmd": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
},
"network": {
  "interfaces": {
    "macs": {
      "0a:0a:95:7c:e4:ff": {
        "device-number": "0",
        "interface-id": "eni-04ed39e0596481537",
        "ipv4-associations": {
          "34.208.183.64": "10.116.2.73"
        },
        "local-hostname": "ip-10-116-2-73.us-west-2.compute.internal",
        "local-ipv4s": [
          "10.116.2.73",
          "10.116.2.70",
          "10.116.2.71"
        ],
        "mac": "0a:0a:95:7c:e4:ff",
        "owner-id": "567014786890",
        "public-hostname": "ec2-34-208-183-64.us-west-2.compute.amazonaws.com",
        "public-ipv4s": "34.208.183.64",
        "security-group-ids": "sg-08097c862629b8dfb",
        "security-groups": "launch-wizard-687",
        "subnet-id": "subnet-01b238ab569faa3b2",
        "subnet-ipv4-cidr-block": "10.116.2.0/24",
        "vpc-id": "vpc-0661860a8ccc3f0be",
        "vpc-ipv4-cidr-block": "10.116.0.0/16",
        "vpc-ipv4-cidr-blocks": "10.116.0.0/16",
        "vpc-ipv6-cidr-blocks": [
          "2600:1f14:5b3:ec00:0:0:0:0/56",
          "2600:1f14:2a83:4900:0:0:0:0/56"
        ]
      }
    }
  }
},
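For reference, the expected primary and secondary addresses can be read out of this metadata with a few lines of Python. This is only a sketch over a fragment shaped like the excerpt: the enclosing key path in the real instance-data.json is elided above, so the fragment is built inline, and the helper name `expected_addresses` is hypothetical. It takes `local-ipv4` as the primary, which is only meaningful for a single-NIC instance like this one.

```python
import json

# Fragment shaped like the excerpt above; the real file has many more keys
# and the parent keys around this fragment are not shown in the excerpt.
FRAGMENT = json.loads("""
{
  "local-ipv4": "10.116.2.73",
  "network": {"interfaces": {"macs": {"0a:0a:95:7c:e4:ff": {
      "device-number": "0",
      "local-ipv4s": ["10.116.2.73", "10.116.2.70", "10.116.2.71"]
  }}}}
}
""")


def expected_addresses(meta, mac):
    """Return (primary, secondaries) for one NIC from the metadata fragment."""
    nic = meta["network"]["interfaces"]["macs"][mac]
    ips = nic["local-ipv4s"]
    # Single-valued keys appear as plain strings in the excerpt
    # (see "public-ipv4s"), so tolerate that shape here as well.
    if isinstance(ips, str):
        ips = [ips]
    primary = meta["local-ipv4"]
    secondaries = [ip for ip in ips if ip != primary]
    return primary, secondaries


print(expected_addresses(FRAGMENT, "0a:0a:95:7c:e4:ff"))
```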

The NetworkManager connection file is:
$ sudo cat /etc/NetworkManager/system-connections/cloud-init-eth0.nmconnection

# Generated by cloud-init. Changes will be lost.

[connection]
id=cloud-init eth0
uuid=******
autoconnect-priority=120
type=ethernet

[user]
org.freedesktop.NetworkManager.origin=cloud-init

[ethernet]
mac-address=******

[ipv4]
method=auto
may-fail=false
address1=10.116.2.70/24
address2=10.116.2.71/24
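The important detail in this profile is that only the two secondary addresses are listed as static, while the primary 10.116.2.73 is left to DHCP (`method=auto`). A small Python sketch with the standard-library `configparser` makes that visible. The keyfile text is trimmed from the file above and `static_addresses` is a hypothetical helper, not part of cloud-init or NetworkManager.

```python
import configparser

# Trimmed copy of the keyfile above (masked uuid and mac-address lines omitted).
KEYFILE = """\
# Generated by cloud-init. Changes will be lost.

[connection]
id=cloud-init eth0
autoconnect-priority=120
type=ethernet

[ipv4]
method=auto
may-fail=false
address1=10.116.2.70/24
address2=10.116.2.71/24
"""


def static_addresses(keyfile_text):
    """Return (method, [addressN values in numeric order]) from the [ipv4] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(keyfile_text)
    ipv4 = parser["ipv4"]
    keys = [k for k in ipv4 if k.startswith("address") and k[len("address"):].isdigit()]
    keys.sort(key=lambda k: int(k[len("address"):]))
    # An addressN value may carry ",gateway" after the address; keep only the address.
    return ipv4.get("method"), [ipv4[k].split(",")[0] for k in keys]


method, addresses = static_addresses(KEYFILE)
print(method, addresses)
print("primary is static:", "10.116.2.73/24" in addresses)
```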

From the cloud-init log we can see that cloud-init's logic is:
1. init-local stage: set up DHCP and received a DHCP lease on eth0 for 10.116.2.73/255.255.255.0
2. init-local stage: got metadata via 169.254.169.254
3. init-local stage: deleted the IP and route of eth0
4. configured eth0 with dhcp4 and 'addresses': ['10.116.2.70/24', '10.116.2.71/24']
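Put together with the NetworkManager change, the resulting order can be illustrated with a toy model. This is not NetworkManager or kernel code. It only assumes that the first address added in a subnet becomes primary and later ones become secondary, and that `static_first=True` stands for the 1.36+ behavior described in this thread while `False` stands for the earlier behavior the thread implies.

```python
def address_order(static, dhcp, static_first=True):
    """Toy model: return (primary, secondaries) for one subnet.

    static: addresses listed in the profile, in profile order.
    dhcp: the address received from the DHCP lease.
    static_first: True models static addresses being applied before DHCP.
    """
    ordered = static + [dhcp] if static_first else [dhcp] + static
    return ordered[0], ordered[1:]


static = ["10.116.2.70", "10.116.2.71"]
dhcp = "10.116.2.73"

print(address_order(static, dhcp, static_first=True))
print(address_order(static, dhcp, static_first=False))
```

With `static_first=True` the model yields 10.116.2.70 as primary, matching the `ip a s` output in this comment.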

Now the problem is that the primary and secondary IPs do not match what is expected. According to the user, the impact on the application side is:
- Secondary IPs are used for virtual host or cluster IPs
- When an application moves or fails over, it cannot connect using the primary IP, and the firewall does not allow connecting using a secondary IP

We have given the user a workaround: NetworkManager provides an automatic configuration service for cloud environments (package NetworkManager-cloud-setup), which configures all the addresses received from the metadata server as static, preserving their order.

However, the current user experience is quite bad, so we wonder whether cloud-init could be changed. @TheRealFalcon Why does cloud-init configure the first IP with dhcp4? Could it be configured as a static IP too? One suggestion: if cloud-init configured all addresses as static in the NetworkManager profile, the order would be preserved. Are there any other suggestions for solving the problem?
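To make the suggestion concrete, here is a rough Python sketch of what an all-static `[ipv4]` section could look like when the addresses are written in metadata order. It is not cloud-init's renderer and `render_static_ipv4` is a hypothetical name. It also leaves out the gateway, DNS and routes that DHCP currently supplies, which a real change would have to provide some other way.

```python
def render_static_ipv4(addresses, prefix_len):
    """Render an [ipv4] keyfile section with every address static, in the given order."""
    lines = ["[ipv4]", "method=manual"]
    for index, addr in enumerate(addresses, start=1):
        lines.append(f"address{index}={addr}/{prefix_len}")
    return "\n".join(lines) + "\n"


# Order taken from "local-ipv4s" in the metadata quoted in the previous comment.
print(render_static_ipv4(["10.116.2.73", "10.116.2.70", "10.116.2.71"], 24))
```

With the primary written as address1, the static-before-DHCP ordering would no longer matter for this interface.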

xiachen-rh commented 1 month ago

Uploaded my test logs: cloud-init.log, cloud-init_nm_multipleip.log