canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/
Other
2.99k stars 882 forks source link

since ubuntu 18 bionic release and latest, the ubuntu18 cloud image is unable to boot up on openstack instance #3491

Closed ubuntu-server-builder closed 1 year ago

ubuntu-server-builder commented 1 year ago

This bug was originally filed in Launchpad as LP: #1851552

Launchpad details
affected_projects = ['networking-calico', 'nova', 'openstack-community', 'qemu-kvm', 'cloud-init (Ubuntu)', 'qemu (Ubuntu)']
assignee = None
assignee_name = None
date_closed = 2020-01-14T18:28:55.424641+00:00
date_created = 2019-11-06T19:13:36.976049+00:00
date_fix_committed = None
date_fix_released = None
id = 1851552
importance = undecided
is_complete = True
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1851552
milestone = None
owner = vasili.namatov
owner_name = Vasili
private = False
status = invalid
submitter = vasili.namatov
submitter_name = Vasili
tags = ['metadata']
duplicates = []

Launchpad user Vasili(vasili.namatov) wrote on 2019-11-06T19:13:36.976049+00:00

Openstack Queens release which is running on ubuntu 18 LTS Controller and Compute. Tried to boot up the instance via horizon dashboard without success. Nova flow works perfect. When access to console I discovered that the boot process stuck in the middle. [[0;1;31m TIME [0m] Timed out waiting for device dev-vdb.device. [[0;1;33mDEPEND[0m] Dependency failed for /mnt. [[0;1;33mDEPEND[0m] Dependency failed for File System Check on /dev/vdb. It receives IP but looks like not get configured at time. since ubuntu 18 there is netplan feature managing the network interfaces please advise.

more details as follow: https://bugs.launchpad.net/networking-calico/+bug/1851548

ubuntu-server-builder commented 1 year ago

Launchpad user Paride Legovini(paride) wrote on 2019-11-08T17:02:34.276004+00:00

Hello,

Could you please elaborate a bit more on how you came to the conclusion that the problem is caused specifically by cloud-init? Without some more context information it's difficult for us to tell if this is actually a bug and to begin working on it.

If you think this is actually a problem with cloud-init, could you please run cloud-init collect-logs and attach the generated tarball to this bug report? The collected logs will help us understand what's going on.

I'm marking this report as Incomplete for the moment, please change its status back to New after providing additional information. Thanks!

ubuntu-server-builder commented 1 year ago

Launchpad user Vasili(vasili.namatov) wrote on 2019-11-09T05:32:01.717042+00:00

The instance creation was working until calico configured on controller and compute. Ubuntu 16 and Centos releases are booting up successfully. As known ubuntu began to work with netplan since 18 and all latest releases. Not sure the issue with cloud init or the order or timing of booting process which relates to getting IP in time and properly.

ubuntu-server-builder commented 1 year ago

Launchpad user Vasili(vasili.namatov) wrote on 2019-11-10T23:23:15.854944+00:00

the problem is in:

[[0;32m OK [0m] Started Wait for Network to be Configured.

ubuntu-server-builder commented 1 year ago

Launchpad user Vasili(vasili.namatov) wrote on 2019-11-12T09:42:06.333909+00:00

I used the following official latest ubuntu bionic image: http://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img

And the regular openstack command: https://docs.openstack.org/mitaka/install-guide-ubuntu/launch-instance-provider.html

openstack server create --flavor ubuntu-flavor --image ubuntu-bionic-latest \ --nic net-id=c9d82a5d-e075-4d66-8ecd-1092fa218ad7 --security-group allow_all \ --key-name cloud-keypair.private ubuntu-bionic-instance

more details as follow: https://bugs.launchpad.net/networking-calico/+bug/1851548

ubuntu-server-builder commented 1 year ago

Launchpad user Vasili(vasili.namatov) wrote on 2019-11-12T11:06:11.155505+00:00

see full log https://etherpad.openstack.org/p/ubuntu-xenial-log for successful creation of instance with latest ubuntu-xenial cloud image http://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img

see full log https://etherpad.openstack.org/p/ubuntu-bionic-log for successful creation of instance with latest ubuntu-bionic cloud image http://cloud-images.ubuntu.com/xenial/current/xenial-server-cloudimg-amd64-disk1.img

ubuntu-server-builder commented 1 year ago

Launchpad user Vasili(vasili.namatov) wrote on 2019-11-12T11:09:41.838972+00:00

correction:

see full log https://etherpad.openstack.org/p/ubuntu-xenial-log for successful creation of instance with latest ubuntu-xenial cloud image http://cloud-images.ubuntu.com/xenial/current/xenial-server-cloudimg-amd64-disk1.img

see full log https://etherpad.openstack.org/p/ubuntu-bionic-log for NOT SUCCESSFUL creation of instance with latest ubuntu-bionic cloud image http://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img

ubuntu-server-builder commented 1 year ago

Launchpad user Scott Moser(smoser) wrote on 2019-11-13T14:17:44.447925+00:00

Hi. Please attach the output of 'cloud-init collect-logs'. Ideally from the 18.04 instance, but the 16.04 instance would be fine if you're not able to get it from 18.04.

Then, set the status of this bug back to New.

thanks.

ubuntu-server-builder commented 1 year ago

Launchpad user Vasili(vasili.namatov) wrote on 2019-11-24T16:44:05.624526+00:00

see attached collected cloud init logs from ubuntu xenial..

Launchpad attachments: cloud-init collect-logs

ubuntu-server-builder commented 1 year ago

Launchpad user Dan Watkins(oddbloke) wrote on 2019-12-04T15:16:20.252749+00:00

Hi Vasili, unfortunately there isn't enough info in the 16.04 logs to help us work out what's going on with 18.04. Do you have any way of accessing an 18.04 instance (serial console, perhaps?) that would allow you to gather more data?

Moving this back to Incomplete for now, apologies for the round trips!

ubuntu-server-builder commented 1 year ago

Launchpad user Ryan Harper(raharper) wrote on 2019-12-04T22:18:26.639623+00:00

[[0;1;33mDEPEND[0m] Dependency failed for File System Check on /dev/vdb.

Looking at the bionic log you posted, it never gets a /dev/vdb device. Can you confirm that the VM configuration on the compute node correctly was configured with an ephemeral block device?

Here we can see not all of the block devices expected are present...

[[0;32m OK [0m] Started udev Coldplug all Devices. [[0m[0;31m [0m] (1 of 3) A start job is running for���label-UEFI.device (19s / 1min 30s)[K[[0;1;31m[0m[0;31m*

Also, looking at the 16.04 boot, it looks like this is nested virtualization, I can see in the journal that the xenial kernel is hitting this one:

Nov 24 16:18:43.138019 ubuntu kernel: ------------[ cut here ]------------ Nov 24 16:18:43.138390 ubuntu kernel: WARNING: CPU: 0 PID: 0 at /build/linux-mU1Buo/linux-4.4.0/arch/x86/kernel/fpu/xstate.c:517 fpuinit_system_xstate+0x37e/0x764() Nov 24 16:18:43.138624 ubuntu kernel: XSAVE consistency problem, dumping leaves Nov 24 16:18:43.147521 ubuntu kernel: Modules linked in: Nov 24 16:18:43.147832 ubuntu kernel: Nov 24 16:18:43.148048 ubuntu kernel: CPU: 0 PID: 0 Comm: swapper Not tainted 4.4.0-169-generic #198-Ubuntu Nov 24 16:18:43.148268 ubuntu kernel: 0000000000000086 a2c4204db3cb6ecb ffffffff81e03d80 ffffffff8140c8e1 Nov 24 16:18:43.148582 ubuntu kernel: ffffffff81e03dc8 ffffffff81cb3c68 ffffffff81e03db8 ffffffff81086492 Nov 24 16:18:43.148788 ubuntu kernel: 0000000000000008 0000000000000440 0000000000000040 ffffffff81e03e4c Nov 24 16:18:43.149274 ubuntu kernel: Call Trace: Nov 24 16:18:43.149480 ubuntu kernel: [] dump_stack+0x63/0x82 Nov 24 16:18:43.149687 ubuntu kernel: [] warn_slowpath_common+0x82/0xc0 Nov 24 16:18:43.149892 ubuntu kernel: [] warn_slowpath_fmt+0x5c/0x80 Nov 24 16:18:43.150100 ubuntu kernel: [] ? xfeature_size+0x59/0x77 Nov 24 16:18:43.150343 ubuntu kernel: [] fpuinit_system_xstate+0x37e/0x764 Nov 24 16:18:43.150549 ubuntu kernel: [] ? early_idt_handler_array+0x120/0x120 Nov 24 16:18:43.150756 ubuntu kernel: [] fpu__init_system+0x1e7/0x28e Nov 24 16:18:43.159459 ubuntu kernel: [] early_cpu_init+0x2b6/0x2bb Nov 24 16:18:43.159715 ubuntu kernel: [] ? early_cpu_init+0x2b6/0x2bb Nov 24 16:18:43.160030 ubuntu kernel: [] setup_arch+0xc0/0xd1f Nov 24 16:18:43.160240 ubuntu kernel: [] ? early_idt_handler_array+0x120/0x120 Nov 24 16:18:43.160528 ubuntu kernel: [] start_kernel+0xe2/0x4a4 Nov 24 16:18:43.160737 ubuntu kernel: [] ? early_idt_handler_array+0x120/0x120 Nov 24 16:18:43.160926 ubuntu kernel: [] x86_64_start_reservations+0x2a/0x2c Nov 24 16:18:43.161133 ubuntu kernel: [] x86_64_start_kernel+0x14a/0x16d Nov 24 16:18:43.161624 ubuntu kernel: ---[ end trace 4d5ff9f2f68c4233 ]---

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1829555

ubuntu-server-builder commented 1 year ago

Launchpad user Vasili(vasili.namatov) wrote on 2019-12-17T12:21:10.727976+00:00

Hi, Thanks for shared investigation details. According to the setup I use, could you assist to understand where and what I'm missing here that can lead to the issues you mentioned?

Working setup details

[1] The Openstack service installed on Controller and Compute with Ubuntu 18.04.2 With Minimal deployment for Queens:

Keystone Glance Nova Neutron Horizon

Reference: https://docs.openstack.org/install-guide/openstack-services.html#minimal-deployment-for-queens

All the configurations done according to the guide (reference I mentioned).

[2] The add-ons features included to Openstack are:

Calico driver plugin for Neutron Calico Bird for BGP Peering Calico Felix for Security Calico DHCP Agent instead Neutron DHCP

Reference: https://docs.projectcalico.org/v3.10/getting-started/openstack/

Thanks, Vasili

ubuntu-server-builder commented 1 year ago

Launchpad user Launchpad Janitor(janitor) wrote on 2019-12-17T12:37:57.404497+00:00

Status changed to 'Confirmed' because the bug affects multiple users.

ubuntu-server-builder commented 1 year ago

Launchpad user Launchpad Janitor(janitor) wrote on 2019-12-17T16:23:18.448754+00:00

Status changed to 'Confirmed' because the bug affects multiple users.

ubuntu-server-builder commented 1 year ago

Launchpad user Ryan Harper(raharper) wrote on 2019-12-18T17:29:27.433313+00:00

For your Openstack deployment, are you running on baremetal? Are you deploying something like devstack or triple-o which enable nested virtualization?

https://docs.openstack.org/devstack/latest/guides/devstack-with-nested-kvm.html https://docs.openstack.org/tripleo-quickstart/latest/unprivileged.html https://tripleo-docs.readthedocs.io/en/latest/environments/virtual.html

ubuntu-server-builder commented 1 year ago

Launchpad user Vasili(vasili.namatov) wrote on 2019-12-22T15:28:06.576940+00:00

nope, no devstack nor tripleo.. everything straight forward as I mentioned previously.. installed all those services manually on controller and compute and connected to l3 switch with bgp of the bird..

ubuntu-server-builder commented 1 year ago

Launchpad user Vasili(vasili.namatov) wrote on 2019-12-22T15:38:29.551075+00:00

mariadb, rabbitmq, memcached, etcd, keystone, glance, neutron, nova, calico-driver, calico-felix, calico-dhcp, nova-api-metadata, bird

any clue on the issue?

Thanks, Vasili

ubuntu-server-builder commented 1 year ago

Launchpad user Ryan Harper(raharper) wrote on 2020-01-09T00:19:47.707753+00:00

Unfortunately no; the kernel messages are very much related to nested virtualization, but I don't know where in your software stack it gets configured/enabled.

ubuntu-server-builder commented 1 year ago

Launchpad user Ryan Harper(raharper) wrote on 2020-01-09T00:22:06.203969+00:00

I'm marking the cloud-init task invalid as at this time the logs point to a nested virtualization/openstack issue with devices not being present; not related to cloud-init. If further investigation points to an issue with cloud-init you can move the cloud-init task back to New.

ubuntu-server-builder commented 1 year ago

Launchpad user Vasili(vasili.namatov) wrote on 2020-01-09T06:34:33+00:00

There is no nested virtualization, all the openstack on bare metal with regular installation with regular services, the only thing is running is calico which is eliminate neutron ml2, metadata and dhcp and its running with calico plugin, calico-dhcp and calico felix. As well as on each compute nova-api-metadata is available.

How the devices can be presented, could you advise with further steps of investigation?

Best, Vasili

Sent from iPhone

On 9 Jan 2020, at 2:31, Ryan Harper 1851552@bugs.launchpad.net wrote:

I'm marking the cloud-init task invalid as at this time the logs point to a nested virtualization/openstack issue with devices not being present; not related to cloud-init. If further investigation points to an issue with cloud-init you can move the cloud-init task back to New.

** Changed in: cloud-init (Ubuntu) Status: Incomplete => Invalid

-- You received this bug notification because you are subscribed to the bug report. https://bugs.launchpad.net/bugs/1851552

Title: since ubuntu 18 bionic release and latest, the ubuntu18 cloud image is unable to boot up on openstack instance

Status in cloud-init: New Status in networking-calico: New Status in OpenStack Compute (nova): New Status in OpenStack Community Project: New Status in qemu-kvm: New Status in cloud-init package in Ubuntu: Invalid Status in qemu package in Ubuntu: New

Bug description: Openstack Queens release which is running on ubuntu 18 LTS Controller and Compute. Tried to boot up the instance via horizon dashboard without success. Nova flow works perfect. When access to console I discovered that the boot process stuck in the middle. [[0;1;31m TIME [0m] Timed out waiting for device dev-vdb.device. [[0;1;33mDEPEND[0m] Dependency failed for /mnt. [[0;1;33mDEPEND[0m] Dependency failed for File System Check on /dev/vdb. It receives IP but looks like not get configured at time. since ubuntu 18 there is netplan feature managing the network interfaces please advise.

more details as follow: https://bugs.launchpad.net/networking-calico/+bug/1851548

To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1851552/+subscriptions

ubuntu-server-builder commented 1 year ago

Launchpad user Dan Watkins(oddbloke) wrote on 2020-01-14T18:28:49.166489+00:00

Hi Vasili,

From a cloud-init perspective, there isn't anything we can do so I'm going to move the upstream task to Invalid too. I'm afraid I don't really have any advice on how to proceed, as this appears to be a hypervisor or cloud issue.

Dan

ubuntu-server-builder commented 1 year ago

Launchpad user Vasili(vasili.namatov) wrote on 2020-01-14T18:41:11+00:00

In Rocky release I’m not experiencing kind of issues. And make sure you use kvm and not qemu, cause qemu is limited on its performance and kvm just born to work with latest hardware :)

Best, Vasili

Sent from iPhone

On 14 Jan 2020, at 20:35, Dan Watkins daniel.watkins@canonical.com wrote:

Hi Vasili,

From a cloud-init perspective, there isn't anything we can do so I'm going to move the upstream task to Invalid too. I'm afraid I don't really have any advice on how to proceed, as this appears to be a hypervisor or cloud issue.

Dan

** Changed in: cloud-init Status: New => Invalid

-- You received this bug notification because you are subscribed to the bug report. https://bugs.launchpad.net/bugs/1851552

Title: since ubuntu 18 bionic release and latest, the ubuntu18 cloud image is unable to boot up on openstack instance

Status in cloud-init: Invalid Status in networking-calico: New Status in OpenStack Compute (nova): New Status in OpenStack Community Project: New Status in qemu-kvm: New Status in cloud-init package in Ubuntu: Invalid Status in qemu package in Ubuntu: New

Bug description: Openstack Queens release which is running on ubuntu 18 LTS Controller and Compute. Tried to boot up the instance via horizon dashboard without success. Nova flow works perfect. When access to console I discovered that the boot process stuck in the middle. [[0;1;31m TIME [0m] Timed out waiting for device dev-vdb.device. [[0;1;33mDEPEND[0m] Dependency failed for /mnt. [[0;1;33mDEPEND[0m] Dependency failed for File System Check on /dev/vdb. It receives IP but looks like not get configured at time. since ubuntu 18 there is netplan feature managing the network interfaces please advise.

more details as follow: https://bugs.launchpad.net/networking-calico/+bug/1851548

To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1851552/+subscriptions

ubuntu-server-builder commented 1 year ago

Launchpad user Sylvain Bauza(sylvain-bauza) wrote on 2020-04-22T13:53:29.784751+00:00

I honestly don't see any evidence of some broken behaviour in Nova if, particularly, other instances with other guest image using cloud-init can boot correctly.

Please provide us some logs or better trace of a potential Nova problem in order for us to classify the potential root cause and a possible solution, but in the meantime I'll have to close this bug from the Nova point of view. You can reopen this bug by changing its status to New.

ubuntu-server-builder commented 1 year ago

Launchpad user Nell Jerram(neil-jerram) wrote on 2020-07-06T09:53:27.834649+00:00

I don't believe this is to do with networking-calico, so will mark as Invalid for networking-calico.