canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/

cloud-init fails to configure network interfaces running with OpenStack cloud #3523

Open ubuntu-server-builder opened 1 year ago

ubuntu-server-builder commented 1 year ago

This bug was originally filed in Launchpad as LP: #1857031

Launchpad details
affected_projects = ['nova']
assignee = madhuri-rai07
assignee_name = Madhuri Kumari
date_closed = None
date_created = 2019-12-19T16:47:02.075713+00:00
date_fix_committed = 2019-12-19T17:42:35.350378+00:00
date_fix_released = 2019-12-19T17:42:35.350378+00:00
id = 1857031
importance = undecided
is_complete = False
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1857031
milestone = None
owner = madhuri-rai07
owner_name = Madhuri Kumari
private = False
status = incomplete
submitter = madhuri-rai07
submitter_name = Madhuri Kumari
tags = []
duplicates = []

Launchpad user Madhuri Kumari(madhuri-rai07) wrote on 2019-12-19T16:47:02.075713+00:00

cloud-init 18.5:

Node has 3 interfaces:
- enp5s0f0 - not connected
- enp5s0f1 - connected
- ib0 - an HFI port

CentOS 7.6 is running on the node.

OpenStack boots the server with two interfaces, enp5s0f1 and ib0. The boot succeeds, but the node is not reachable. On the node, cloud-init configures the wrong interface, enp5s0f0. This happens because cloud-init fails to configure network interfaces on an OpenStack cloud if any of the requested network interfaces does not exist on the node. In this case, ib0 was missing.

Please note that when I try to boot the server with only 1 interface enp5s0f1, everything works fine and the node is reachable too.

Logs: http://paste.openstack.org/show/787707/
network-data and nics: http://paste.openstack.org/show/787797/ (note that enp5s0f1 is manually configured)

ubuntu-server-builder commented 1 year ago

Launchpad user Madhuri Kumari(madhuri-rai07) wrote on 2019-12-19T16:47:02.075713+00:00

Launchpad attachments: cloud-init.tar.gz

ubuntu-server-builder commented 1 year ago

Launchpad user Scott Moser(smoser) wrote on 2019-12-19T16:59:32.984247+00:00

Hi,

The problem you're seeing here is a result of a failure to persist data between cloud-init's local stage and network stage.

https://bugs.launchpad.net/cloud-init/+bug/1801364

ubuntu-server-builder commented 1 year ago

Launchpad user Chad Smith(chad.smith) wrote on 2019-12-19T17:42:27.505059+00:00

Thank you for filing this bug and helping to improve cloud-init.

From your linked logs it looks like you have been running cloud-init 18.2 and 18.5 on CentOS. Since then there have been a number of fixes in this area, including a fix for the instance-data.json persistence bug we see in your logs, as well as changes to some of the network rendering logic.

If possible please try installing our latest upstream release cloud-init v. 19.4 from a copr-repo that we update at

https://copr.fedorainfracloud.org/coprs/g/cloud-init/el-testing

Once you have installed cloud-init 19.4, please run "sudo cloud-init clean --logs --reboot" to clean the system and allow cloud-init to "run fresh" on the system to see if we are still exposed to this error.

ubuntu-server-builder commented 1 year ago

Launchpad user Chad Smith(chad.smith) wrote on 2019-12-19T17:43:15.443819+00:00

Marking the cloud-init task incomplete, please mark it back to if you are able to confirm that this bug still exists on latest cloud-init 19.4

ubuntu-server-builder commented 1 year ago

Launchpad user Madhuri Kumari(madhuri-rai07) wrote on 2019-12-20T08:35:51.862035+00:00

@smoser, Hi, I don't think that is the issue. I booted the node with only 1 interface and the network was configured just fine, even though I see a similar error in the cloud-init logs.

Logs for successful cloud-init with 1 interface only: http://paste.openstack.org/show/787763/

ubuntu-server-builder commented 1 year ago

Launchpad user Madhuri Kumari(madhuri-rai07) wrote on 2019-12-20T09:03:46.043733+00:00

@chad.smith, I updated cloud-init to 19.4 and performed a clean reboot, but the issue still exists. The log is attached.

Launchpad attachments: cloud-init.tar.gz

ubuntu-server-builder commented 1 year ago

Launchpad user Madhuri Kumari(madhuri-rai07) wrote on 2019-12-20T11:28:10.986452+00:00

Hi, I found the issue. cloud-init fails to configure the network interfaces if any of the requested interfaces is missing on the node. The failure happens at https://github.com/canonical/cloud-init/blob/master/cloudinit/sources/helpers/openstack.py#L677

IMO cloud-init should skip the non-existing interfaces and continue to configure the other, existing interfaces on the node. I have pushed a patch to fix this issue: https://github.com/canonical/cloud-init/pull/122 Thanks!
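The behaviour proposed above can be illustrated with a small sketch. This is not the code from the linked pull request or from cloud-init itself; the function name `name_links_skipping_missing`, the `mac_address` and `id` keys, and the shape of `known_macs` are all assumptions made for illustration. It only shows the idea of logging and skipping a link whose MAC address has no matching NIC on the node, instead of aborting the whole network configuration.

```python
import logging

LOG = logging.getLogger(__name__)


def name_links_skipping_missing(links, known_macs):
    """Attach an interface name to each link whose MAC exists on the node.

    links:      list of dicts, each with a 'mac_address' key (hypothetical shape)
    known_macs: dict mapping lowercase MAC address -> interface name
    Returns (configured, skipped) so the caller can still report what was dropped.
    """
    configured, skipped = [], []
    for link in links:
        mac = (link.get("mac_address") or "").lower()
        name = known_macs.get(mac)
        if name is None:
            # Skip the missing NIC and keep going instead of raising.
            LOG.warning("No system NIC with MAC %r; skipping link %s",
                        mac, link.get("id"))
            skipped.append(link)
            continue
        # Copy the link so the caller's input is not mutated.
        configured.append({**link, "name": name})
    return configured, skipped
```

With two requested links and only one matching NIC present, `configured` holds the one link that can be named and `skipped` holds the other, so the existing interface can still be configured.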

ubuntu-server-builder commented 1 year ago

Launchpad user Matt Riedemann(mriedem) wrote on 2019-12-20T14:11:35.973108+00:00

Marking invalid for nova since it sounds like this is a cloud-init issue.

ubuntu-server-builder commented 1 year ago

Launchpad user Madhuri Kumari(madhuri-rai07) wrote on 2019-12-24T08:29:53.230561+00:00

Hi Chad, Scott,

Can you confirm whether this issue is valid or not?

ubuntu-server-builder commented 1 year ago

Launchpad user Ryan Harper(raharper) wrote on 2020-01-06T15:13:11.144377+00:00

@Madhuri,

Do you have cloud-init logs from the multi-nic + infiniband device boot? The logs posted are somewhat confusing.

In the journal, we can see errors with "eth0" and "ib0":

Dec 20 08:55:15.447754 opa-new-4.novalocal network[3506]: Bringing up interface eth0: ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Device eth0 does not seem to be present, delaying initialization.
Dec 20 08:55:15.449801 opa-new-4.novalocal /etc/sysconfig/network-scripts/ifup-eth[3691]: Device eth0 does not seem to be present, delaying initialization.
Dec 20 08:55:15.451435 opa-new-4.novalocal network[3506]: [FAILED]
Dec 20 08:55:15.810635 opa-new-4.novalocal network[3506]: Bringing up interface ib0: ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Device ib0 does not seem to be present, delaying initialization.
Dec 20 08:55:15.811909 opa-new-4.novalocal /etc/sysconfig/network-scripts/ifup-eth[3720]: Device ib0 does not seem to be present, delaying initialization.
Dec 20 08:55:15.813392 opa-new-4.novalocal network[3506]: [FAILED]

However, the cloud-init log shows cloud-init only writing config for enp5s0f0:

Applying network configuration from fallback bringup=False: {'ethernets': {'enp5s0f0': {'set-name': 'enp5s0f0', 'match': {'macaddress': '00:1e:67:fe:d2:59'}, 'dhcp4': True}}, 'version': 2}

That makes me wonder if there are some existing configuration files already present in this image?

Would you be able to attach the network_data.json that was supplied? You can fetch this from the running instance via:

curl -s http://169.254.169.254/openstack/latest/network_data.json

Also, if you could capture /etc/sysconfig/network-scripts/ifcfg-*, we could see if additional files are causing conflicts with what cloud-init is generating.
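For anyone collecting the same two pieces of information, a minimal Python sketch is below. It assumes the metadata URL quoted above is reachable from the instance and that the ifcfg files are simple KEY=VALUE lines. The helper names `fetch_network_data`, `list_ifcfg_files` and `read_ifcfg` are hypothetical and are not part of cloud-init.

```python
import glob
import json
import os
import urllib.request

# URL taken from the curl command in this comment.
METADATA_URL = "http://169.254.169.254/openstack/latest/network_data.json"


def fetch_network_data(url=METADATA_URL, timeout=5.0):
    """Download and parse network_data.json from the metadata service."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def list_ifcfg_files(directory="/etc/sysconfig/network-scripts"):
    """Return the sorted paths of all ifcfg-* files in the directory."""
    return sorted(glob.glob(os.path.join(directory, "ifcfg-*")))


def read_ifcfg(path):
    """Parse simple KEY=VALUE lines, ignoring blank lines and comments."""
    settings = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding quotes, as in DEVICE="eth0".
            settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings
```

Calling `fetch_network_data()` on the instance and `read_ifcfg(p)` for each path from `list_ifcfg_files()` gives the supplied network data and the on-disk configuration side by side, which makes leftover files such as a stale ifcfg-eth0 easier to spot.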

ubuntu-server-builder commented 1 year ago

Launchpad user Madhuri Kumari(madhuri-rai07) wrote on 2020-01-22T12:23:08.411012+00:00

Hi Ryan,

Thank you for your response. Please find the details below:

network_data.json:

{
  "services": [],
  "networks": [
    {"network_id": "9f3f91e1-5926-4345-953b-14049d48f17e", "link": "tapb2e6093b-d6", "type": "ipv4_dhcp", "id": "network0"},
    {"network_id": "843dc3a5-2ff6-4bb6-8594-cf0459ca344b", "link": "tapacc60427-fa", "type": "ipv4_dhcp", "id": "network1"}
  ],
  "links": [
    {"vif_id": "b2e6093b-d6a9-4fb4-aa6d-f5a598b216c8", "type": "phy", "ethernet_mac_address": "00:11:75:67:1e:bf", "id": "tapb2e6093b-d6", "mtu": 1500},
    {"vif_id": "acc60427-facd-4db3-bd2b-5bce4fdbd57c", "type": "phy", "ethernet_mac_address": "00:1e:67:ed:f2:64", "id": "tapacc60427-fa", "mtu": 1500}
  ]
}

Node has 3 ifcfg files:

[centos@opa-node latest]$ ls /etc/sysconfig/network-scripts/ifcfg-*
/etc/sysconfig/network-scripts/ifcfg-enp3s0f0
/etc/sysconfig/network-scripts/ifcfg-eth0
/etc/sysconfig/network-scripts/ifcfg-lo
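To check which of the links in network_data.json actually have a matching NIC on the node, a rough sketch like the following may help. It assumes the `links` / `ethernet_mac_address` / `id` layout shown in this comment and reads local MAC addresses from `/sys/class/net/<name>/address`. The function names are hypothetical. InfiniBand ports typically report a longer hardware address in sysfs, so a plain string comparison like this one may not match them.

```python
import os


def link_macs(network_data):
    """Return {link id: lowercase MAC} for every link that carries a MAC."""
    return {
        link["id"]: link["ethernet_mac_address"].lower()
        for link in network_data.get("links", [])
        if link.get("ethernet_mac_address")
    }


def local_macs(sys_net="/sys/class/net"):
    """Return {lowercase MAC: interface name} read from sysfs."""
    found = {}
    for name in sorted(os.listdir(sys_net)):
        try:
            with open(os.path.join(sys_net, name, "address")) as fh:
                mac = fh.read().strip().lower()
        except OSError:
            # Entry without a readable address file; ignore it.
            continue
        if mac:
            found[mac] = name
    return found


def missing_links(network_data, present):
    """Return the sorted ids of links whose MAC is not in 'present'."""
    return sorted(
        link_id
        for link_id, mac in link_macs(network_data).items()
        if mac not in present
    )
```

On the node, `missing_links(data, local_macs())` with `data` loaded from the JSON above lists the link ids that have no matching interface by MAC address.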