Open ubuntu-server-builder opened 1 year ago
Launchpad user Madhuri Kumari(madhuri-rai07) wrote on 2019-12-19T16:47:02.075713+00:00
Launchpad attachments: cloud-init.tar.gz
Launchpad user Scott Moser(smoser) wrote on 2019-12-19T16:59:32.984247+00:00
Hi,
The problem you're seeing here is a result of a failure to persist data between cloud-init's local stage and network stage.
Launchpad user Chad Smith(chad.smith) wrote on 2019-12-19T17:42:27.505059+00:00
Thank you for filing this bug and helping to improve cloud-init.
From your linked logs it looks like you have been running cloud-init 18.2 and 18.5 on CentOS. Since those releases there have been a number of fixes in this area, including one for the instance-data.json persistence bug we see in your logs, as well as changes to some of the network rendering logic.
If possible please try installing our latest upstream release cloud-init v. 19.4 from a copr-repo that we update at
https://copr.fedorainfracloud.org/coprs/g/cloud-init/el-testing
Once you have installed cloud-init 19.4, please run "sudo cloud-init clean --logs --reboot" to clean the system and allow cloud-init to "run fresh" on the system to see if we are still exposed to this error.
Launchpad user Chad Smith(chad.smith) wrote on 2019-12-19T17:43:15.443819+00:00
Marking the cloud-init task incomplete, please mark it back to if you are able to confirm that this bug still exists on latest cloud-init 19.4
Launchpad user Madhuri Kumari(madhuri-rai07) wrote on 2019-12-20T08:35:51.862035+00:00
@smoser, Hi, I don't think that is the issue. I booted the node with only 1 interface and the network was configured just fine, even though I see a similar error in the cloud-init logs.
Logs for successful cloud-init with 1 interface only: http://paste.openstack.org/show/787763/
Launchpad user Madhuri Kumari(madhuri-rai07) wrote on 2019-12-20T09:03:46.043733+00:00
@chad.smith, I updated cloud-init to 19.4 and performed a clean reboot, but the issue still exists. The log is attached.
Launchpad attachments: cloud-init.tar.gz
Launchpad user Madhuri Kumari(madhuri-rai07) wrote on 2019-12-20T11:28:10.986452+00:00
Hi, I found the issue. cloud-init fails to configure the network interfaces if any one of them is missing on the node. The failure occurs at https://github.com/canonical/cloud-init/blob/master/cloudinit/sources/helpers/openstack.py#L677
IMO cloud-init should continue to configure the interfaces that do exist on the node and skip the non-existent ones. I have pushed a patch to fix this issue: https://github.com/canonical/cloud-init/pull/122 Thanks!
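For readers following along, the proposed "skip and continue" behaviour can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in cloud-init or in the linked pull request: the function name `map_links_to_devices`, the `known_macs` dictionary, and the MAC addresses and device name in the example are all hypothetical.

```python
import logging

LOG = logging.getLogger(__name__)


def map_links_to_devices(links, known_macs):
    """Map link ids to local device names, skipping links with no device.

    links: a list shaped like the "links" list of network_data.json.
    known_macs: {mac_address: device_name} for NICs present on the node
                (hypothetical input, gathered by the caller).
    Returns (resolved, missing): a dict of link id to device name, and a
    list of link ids for which no local device was found.
    """
    resolved = {}
    missing = []
    for link in links:
        mac = (link.get("ethernet_mac_address") or "").lower()
        name = known_macs.get(mac)
        if name is None:
            # Assumed behaviour: warn and carry on instead of raising.
            LOG.warning("No device with MAC %s for link %s; skipping",
                        mac, link.get("id"))
            missing.append(link.get("id"))
            continue
        resolved[link["id"]] = name
    return resolved, missing


# Placeholder data: two requested links, only one NIC present on the node.
example_links = [
    {"id": "link-a", "ethernet_mac_address": "aa:bb:cc:00:00:01"},
    {"id": "link-b", "ethernet_mac_address": "aa:bb:cc:00:00:02"},
]
example_known = {"aa:bb:cc:00:00:02": "nic1"}
resolved, missing = map_links_to_devices(example_links, example_known)
print(resolved, missing)
```

With this shape the caller can still render configuration for every link in `resolved` and report the entries in `missing` separately, rather than aborting the whole network configuration.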
Launchpad user Matt Riedemann(mriedem) wrote on 2019-12-20T14:11:35.973108+00:00
Marking invalid for nova since it sounds like this is a cloud-init issue.
Launchpad user Madhuri Kumari(madhuri-rai07) wrote on 2019-12-24T08:29:53.230561+00:00
Hi Chad, Scott,
Can you confirm whether this issue is valid or not?
Launchpad user Ryan Harper(raharper) wrote on 2020-01-06T15:13:11.144377+00:00
@Madhuri,
Do you have cloud-init logs from the multi-nic + infiniband device boot? The logs posted are somewhat confusing.
In the journal, we can see errors with "eth0" and "ib0":
Dec 20 08:55:15.447754 opa-new-4.novalocal network[3506]: Bringing up interface eth0: ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Device eth0 does not seem to be present, delaying initialization.
Dec 20 08:55:15.449801 opa-new-4.novalocal /etc/sysconfig/network-scripts/ifup-eth[3691]: Device eth0 does not seem to be present, delaying initialization.
Dec 20 08:55:15.451435 opa-new-4.novalocal network[3506]: [FAILED]
Dec 20 08:55:15.810635 opa-new-4.novalocal network[3506]: Bringing up interface ib0: ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Device ib0 does not seem to be present, delaying initialization.
Dec 20 08:55:15.811909 opa-new-4.novalocal /etc/sysconfig/network-scripts/ifup-eth[3720]: Device ib0 does not seem to be present, delaying initialization.
Dec 20 08:55:15.813392 opa-new-4.novalocal network[3506]: [FAILED]
However, the cloud-init log shows cloud-init only writing config for enp5s0f0,
Applying network configuration from fallback bringup=False: {'ethernets': {'enp5s0f0': {'set-name': 'enp5s0f0', 'match': {'macaddress': '00:1e:67:fe:d2:59'}, 'dhcp4': True}}, 'version': 2}
That makes me wonder if there are some existing configuration files already present in this image?
Would you be able to attach the network_data.json that was supplied? You can fetch it from the running instance via:
curl -s http://169.254.169.254/openstack/latest/network_data.json
Also, if you could capture /etc/sysconfig/network-scripts/ifcfg-*, we could see if additional files are causing conflicts with what cloud-init is generating.
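As a rough aid for the check suggested above, the following sketch lists ifcfg-* files that have no matching network device. It assumes that the suffix of each file name is the device name (the `DEVICE=` line inside the file is not parsed) and that present devices appear as entries under a directory such as /sys/class/net. The function name `stale_ifcfg_files` is hypothetical, and the demo runs against temporary directories with made-up entries rather than a real system.

```python
import os
import tempfile


def stale_ifcfg_files(scripts_dir, sys_net_dir):
    """Return ifcfg-* file names whose device is absent from sys_net_dir.

    Assumption: the part after "ifcfg-" is the device name.
    """
    present = set(os.listdir(sys_net_dir))
    stale = []
    for entry in sorted(os.listdir(scripts_dir)):
        if not entry.startswith("ifcfg-"):
            continue
        device = entry[len("ifcfg-"):]
        if device not in present:
            stale.append(entry)
    return stale


# Demo on throwaway directories; the names below are fixtures only.
with tempfile.TemporaryDirectory() as scripts, \
        tempfile.TemporaryDirectory() as sysnet:
    for name in ("ifcfg-enp5s0f0", "ifcfg-eth0", "ifcfg-lo"):
        open(os.path.join(scripts, name), "w").close()
    for dev in ("enp5s0f0", "lo"):
        os.mkdir(os.path.join(sysnet, dev))
    demo_stale = stale_ifcfg_files(scripts, sysnet)

print(demo_stale)
```

On a real node the call would presumably be `stale_ifcfg_files("/etc/sysconfig/network-scripts", "/sys/class/net")`; any file it reports is a candidate for the kind of leftover configuration that could conflict with what cloud-init generates.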
Launchpad user Madhuri Kumari(madhuri-rai07) wrote on 2020-01-22T12:23:08.411012+00:00
Hi Ryan,
Thank you for your response. Please find the details below:
network_data.json:
{"services": [],
 "networks": [
  {"network_id": "9f3f91e1-5926-4345-953b-14049d48f17e", "link": "tapb2e6093b-d6", "type": "ipv4_dhcp", "id": "network0"},
  {"network_id": "843dc3a5-2ff6-4bb6-8594-cf0459ca344b", "link": "tapacc60427-fa", "type": "ipv4_dhcp", "id": "network1"}],
 "links": [
  {"vif_id": "b2e6093b-d6a9-4fb4-aa6d-f5a598b216c8", "type": "phy", "ethernet_mac_address": "00:11:75:67:1e:bf", "id": "tapb2e6093b-d6", "mtu": 1500},
  {"vif_id": "acc60427-facd-4db3-bd2b-5bce4fdbd57c", "type": "phy", "ethernet_mac_address": "00:1e:67:ed:f2:64", "id": "tapacc60427-fa", "mtu": 1500}]}
The node has 3 ifcfg files:
[centos@opa-node latest]$ ls /etc/sysconfig/network-scripts/ifcfg-*
/etc/sysconfig/network-scripts/ifcfg-enp3s0f0
/etc/sysconfig/network-scripts/ifcfg-eth0
/etc/sysconfig/network-scripts/ifcfg-lo
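To make the network_data.json in this comment easier to read, a small sketch like the one below pairs each entry in "networks" with the link it refers to. The JSON string is a trimmed copy of the data posted in this comment (only the fields used here are kept); the function name `summarize_network_data` is hypothetical and this is not how cloud-init itself parses the file.

```python
import json

# Trimmed copy of the network_data.json posted in this comment.
NETWORK_DATA = json.loads("""
{"services": [],
 "networks": [
  {"link": "tapb2e6093b-d6", "type": "ipv4_dhcp", "id": "network0"},
  {"link": "tapacc60427-fa", "type": "ipv4_dhcp", "id": "network1"}],
 "links": [
  {"type": "phy", "ethernet_mac_address": "00:11:75:67:1e:bf",
   "id": "tapb2e6093b-d6", "mtu": 1500},
  {"type": "phy", "ethernet_mac_address": "00:1e:67:ed:f2:64",
   "id": "tapacc60427-fa", "mtu": 1500}]}
""")


def summarize_network_data(network_data):
    """Return (network id, type, MAC, MTU) tuples, one per network."""
    links = {link["id"]: link for link in network_data.get("links", [])}
    rows = []
    for net in network_data.get("networks", []):
        # A network whose link id is unknown yields None for MAC and MTU.
        link = links.get(net["link"], {})
        rows.append((net["id"], net["type"],
                     link.get("ethernet_mac_address"), link.get("mtu")))
    return rows


rows = summarize_network_data(NETWORK_DATA)
for row in rows:
    print(row)
```

Each MAC address in the output can then be compared against the NICs actually present on the node to see which requested link has no matching device.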
This bug was originally filed in Launchpad as LP: #1857031
Launchpad details
Launchpad user Madhuri Kumari(madhuri-rai07) wrote on 2019-12-19T16:47:02.075713+00:00
cloud-init 18.5:
The node has 3 interfaces:
- enp5s0f0 - not connected
- enp5s0f1 - connected
- ib0 - an HFI port
CentOS 7.6 is running on the node.
OpenStack boots the server with two interfaces, enp5s0f1 and ib0. The boot succeeds, but the node is not reachable. On the node, cloud-init configures the wrong interface, enp5s0f0. This happens because cloud-init, when running on an OpenStack cloud, fails to configure the network interfaces if any one of them does not exist on the node. In this case, ib0 was missing.
Please note that when I try to boot the server with only 1 interface enp5s0f1, everything works fine and the node is reachable too.
Logs: http://paste.openstack.org/show/787707/
network-data and nics: http://paste.openstack.org/show/787797/ (note that enp5s0f1 is manually configured)