canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/

Groovy kernel (5.8.0-1004-aws) creates broken /dev/console on i3.metal instances #3789

Closed ubuntu-server-builder closed 1 year ago

ubuntu-server-builder commented 1 year ago

This bug was originally filed in Launchpad as LP: #1896604

Launchpad details
affected_projects = ['cloud-images', 'linux-aws (Ubuntu)']
assignee = paride
assignee_name = Paride Legovini
date_closed = 2020-10-09T08:25:34.929897+00:00
date_created = 2020-09-22T11:01:35.708051+00:00
date_fix_committed = 2020-10-09T08:25:34.929897+00:00
date_fix_released = 2020-10-09T08:25:34.929897+00:00
id = 1896604
importance = undecided
is_complete = True
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1896604
milestone = None
owner = paride
owner_name = Paride Legovini
private = False
status = fix_released
submitter = paride
submitter_name = Paride Legovini
tags = []
duplicates = []

Launchpad user Paride Legovini(paride) wrote on 2020-09-22T11:01:35.708051+00:00

[Impact]

Starting with kernel 5.8, the default nr_uarts has been changed from 4 to 32 for amd64. This affects i3.metal instances in AWS: ttyS0 is now remapped to ttyS4, which breaks tools like cloud-init (and probably others).

[Test case]

echo > /dev/console

bash: echo: write error: Input/output error
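The test case above can be wrapped in a small helper for scripted checks. This is only a sketch: the function name check_console_write is made up for illustration, and it assumes a POSIX shell running as a user allowed to open the device.

```shell
# Hypothetical helper (not part of cloud-init or the kernel tooling).
# Tries to write a single newline to a device and reports the result.
# Defaults to /dev/console, as in the test case.
check_console_write() {
    dev="${1:-/dev/console}"
    # The subshell keeps the shell's own error message out of the output.
    if ( echo > "$dev" ) 2>/dev/null; then
        echo "OK: write to $dev succeeded"
        return 0
    else
        echo "FAIL: write to $dev returned an error"
        return 1
    fi
}
```

Running `check_console_write` as root on an affected instance should take the FAIL branch. On Linux, `check_console_write /dev/full` exercises the same branch on any machine, because writes to that device always fail.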

[Fix]

Setting nr_uarts=4 by default (via CONFIG_SERIAL_8250_RUNTIME_UARTS) restores the previous behavior and writing to /dev/console works without returning any error.
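To see which default a given kernel build carries, the option can be read from the kernel config file. A minimal sketch: the helper name runtime_uarts_from_config is hypothetical, and the path in the usage note assumes the distribution installs the config as /boot/config-<release>.

```shell
# Hypothetical helper: print the value of CONFIG_SERIAL_8250_RUNTIME_UARTS
# from kernel config text read on stdin. Prints nothing if the option is unset.
runtime_uarts_from_config() {
    sed -n 's/^CONFIG_SERIAL_8250_RUNTIME_UARTS=//p'
}
```

Possible usage, under the path assumption above: `runtime_uarts_from_config < "/boot/config-$(uname -r)"`.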

[Regression potential]

Minimal. Restores the old behavior used in 5.4 (that shouldn't have changed in the first place).

[Original bug report]

Hi,

When running Groovy daily images on i3.metal instances a broken /dev/console is created. The char device appears to be writable but writing to it causes an Input/output error. This is breaking cloud-init, as it tries to log to /dev/console, and is likely to break other programs.

On Focal:

root@ip-172-31-24-163:~# ls -l /dev/console
crw------- 1 root root 5, 1 Sep 21 16:07 /dev/console
root@ip-172-31-24-163:~# echo x > /dev/console
root@ip-172-31-24-163:~#

On Groovy:

root@ip-172-31-20-184:~# ls -l /dev/console
crw--w---- 1 root tty 5, 1 Sep 21 16:03 /dev/console
root@ip-172-31-20-184:~# echo x > /dev/console
bash: echo: write error: Input/output error

The Groovy kernel log has a

[ 3.561696] fbcon: Taking over console

line in it, which is not present in the Focal kernel log (5.4.0-1024-aws). Perhaps fbcon should be prevented from taking over console?

ubuntu-server-builder commented 1 year ago

Launchpad user Andrea Righi(arighi) wrote on 2020-09-22T17:14:14.179126+00:00

Adding some details about this issue. It looks like the real problem is in the serial driver. With a 5.4 kernel we can see the following in dmesg:

[ 4.991325] 0000:16:00.0: ttyS0 at MMIO 0xc5a00000 (irq = 85, base_baud = 115200) is a 16550A

With the 5.8 kernel we don't see any message at all about ttyS0, meaning that the serial port isn't properly recognized.
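A quick way to compare the two kernels is to list the ttyS names that appear in registration messages like the one quoted above. This is a sketch only: list_serial_ports is a hypothetical name, and the pattern assumes the "ttySN at ..." wording shown in the 5.4 log line.

```shell
# Hypothetical helper: read kernel log text on stdin and print each ttyS*
# name that appears in a "ttySN at ..." registration message, one per line.
list_serial_ports() {
    sed -nE 's/.*(ttyS[0-9]+) at .*/\1/p' | sort -u
}
```

Possible usage: `dmesg | list_serial_ports`. Going by the description above, an affected 5.8 kernel would not list ttyS0.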

A temporary workaround could be to remove console=ttyS0 from the kernel boot parameters. This would probably make cloud-init happy, but it is obviously not the right solution.
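For illustration, the workaround amounts to dropping the console=ttyS0 token from the kernel command line. The sketch below only transforms a string; strip_serial_console is a hypothetical name, and where the parameters actually live (for example a GRUB defaults file, followed by regenerating the GRUB config) depends on the image and is not covered here.

```shell
# Hypothetical helper: print the given kernel command line with every
# console=ttyS0 token (with or without options such as ,115200n8) removed.
strip_serial_console() {
    printf '%s\n' "$1" | awk '{
        out = ""
        for (i = 1; i <= NF; i++)
            if ($i !~ /^console=ttyS0(,.*)?$/)
                out = out (out == "" ? "" : " ") $i
        print out
    }'
}
```

Possible usage: `strip_serial_console "$(cat /proc/cmdline)"` prints what the running command line would look like without the serial console entry.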

I'll investigate more to find the exact commit that introduced this regression.

Thanks Paride for helping me out to reproduce and test this problem!

ubuntu-server-builder commented 1 year ago

Launchpad user Paride Legovini(paride) wrote on 2020-09-23T12:38:04.593765+00:00

Thanks Andrea for looking into this.

Added a cloud-init task for tracking.

ubuntu-server-builder commented 1 year ago

Launchpad user Andrea Righi(arighi) wrote on 2020-09-24T09:53:45.005092+00:00

The cause of this problem is that in 5.8 the default value of nr_uarts has been changed from 4 to 32. This causes ttyS0 to be remapped to ttyS4, breaking user space.

The solution is to set the number of UARTs back to 4. I tried booting the kernel with 8250.nr_uarts=4 added to the kernel boot parameters in GRUB, and /dev/console is now working correctly.
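To confirm that the parameter actually reached the kernel after a reboot, the command line can be inspected. A minimal sketch, with nr_uarts_from_cmdline as a hypothetical helper name:

```shell
# Hypothetical helper: print the value of 8250.nr_uarts from a kernel
# command line string, or nothing if the parameter is absent.
# If the parameter appears more than once, the last occurrence is printed.
nr_uarts_from_cmdline() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^8250\.nr_uarts=//p' | tail -n 1
}
```

Possible usage: `nr_uarts_from_cmdline "$(cat /proc/cmdline)"`. The running value may also be exposed under /sys/module/8250/parameters/nr_uarts, but that path is an assumption here and is not confirmed in this thread.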

I'll send a fix for this to restore the previous behavior by default in the kernel and avoid breaking the user-space.

ubuntu-server-builder commented 1 year ago

Launchpad user Launchpad Janitor(janitor) wrote on 2020-10-08T16:10:35.020285+00:00

This bug was fixed in the package linux-aws - 5.8.0-1007.7


linux-aws (5.8.0-1007.7) groovy; urgency=medium