AlmaLinux / cloud-images

Packer templates and other tools for building AlmaLinux images for various cloud platforms.
MIT License

Cannot update kernel to latest version on AWS instances #154

Closed (sebanism closed this issue 1 year ago)

sebanism commented 1 year ago

We're in the process of considering migrating a couple of thousand servers to AlmaLinux. We can see in the Release Notes that kernel version 5.14.0-284.11.1 is part of the 9.2 release.

We tested on newly provisioned AWS instances, using both the Community AMI and the official AWS Marketplace AMI, on both T2 and C6i instance types. After running yum update -y, we can see the latest kernel, 5.14.0-284.18.1.el9_2, being downloaded. However, when we try to update to it (yum install kernel -y), the new kernel is installed, but after a reboot the instance still boots the older kernel. The only successful attempt was with the Community AMI (v9.2) on a T2 instance, but our target is to deploy the AlmaLinux 9.2 official AWS Marketplace AMI on C6i instances.

Is this expected? What is the expected behaviour?

In future, if there are security vulnerabilities and kernel patches need to be applied, are we going to have issues with AlmaLinux? With CentOS we typically received new kernel updates every quarter. Can we expect the same from AlmaLinux?

knkinfy commented 1 year ago

We are facing a very similar issue. We are using an AMI from the community marketplace (AWS), and now we are forced to consider Rocky Linux!

sukanyamanian commented 1 year ago

Experiencing the same!

itsravikumars commented 1 year ago

I am facing the same issue too.

LKHN commented 1 year ago

Hi, thanks for reporting and confirming the issue. I came across the same issue 2 days ago and have started looking for the root cause. I will update this issue with my findings very soon.

sebanism commented 1 year ago

Thanks heaps, Elkhan - really appreciate the quick reply, and good to know it's been noticed. If we can help in any way, let me know - happy to test out if necessary. Just so you know, this could be a key blocker for us, as we're making a decision on Monday whether to proceed with Alma if we can't get this resolved by then. It'll be important to understand if this is just a one-off, or if it could be a risk in future. When our security team identifies new kernels are available, we are given a very small window to update within.

LKHN commented 1 year ago

Summary:

Due to a misconfiguration in the /etc/sysconfig/kernel and /boot/grub2/grub.cfg files, the system doesn't automatically set the latest kernel as the default for the next boot.

Affected Amazon Machine Images:

Fixed on:

Community AMI: https://wiki.almalinux.org/cloud/AWS.html#community-amis

AWS Marketplace: You get the notification email once it's available.

Manual Fix:

Please run the steps below on the EC2 instances which were created from the AlmaLinux OS 9.2.20230513 x86_64 AMI:

Shell:

printf 'GRUB_DEFAULT=saved\n' >> /etc/default/grub
printf 'DEFAULTKERNEL=kernel\nUPDATEDEFAULT=yes\n' > /etc/sysconfig/kernel
grub2-mkconfig -o /boot/grub2/grub.cfg
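Note that the first command above appends to /etc/default/grub, so running it a second time leaves a duplicate GRUB_DEFAULT line. If the fix may be applied more than once (for example from a provisioning script), an idempotent append could be sketched roughly as follows. ensure_line is a hypothetical helper name and is not part of the original fix; it only checks for an exact matching line and does not remove a conflicting GRUB_DEFAULT value that may already be in the file.

```shell
#!/bin/sh
# ensure_line is a hypothetical helper, not part of the original fix.
# It appends LINE to FILE only when FILE does not already contain
# that exact line, so repeated runs do not create duplicates.
ensure_line() {
    line=$1
    file=$2
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Intended use on an affected instance (run as root):
#   ensure_line 'GRUB_DEFAULT=saved' /etc/default/grub
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```

The second command in the fix overwrites /etc/sysconfig/kernel with `>`, so it is already safe to repeat and needs no such guard.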

If you haven't installed the latest kernel:

dnf -y upgrade 'kernel*'

If you already installed the latest kernel and it's not the default:

latest_kernel_ver=$(dnf --refresh rq --latest-limit=1 --qf '%{version}-%{release}' kernel)
dnf -y reinstall "kernel*-${latest_kernel_ver}"
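To decide which of the two cases applies, it can help to compare the running kernel with the newest installed one. The following is a rough sketch only: newest_version is a hypothetical helper name, and it assumes GNU sort's -V ordering is close enough to RPM's version comparison for these kernel version strings (the two are not the same algorithm).

```shell
#!/bin/sh
# newest_version is a hypothetical helper: it reads one version-release
# string per line on stdin and prints the highest one according to
# GNU sort's version ordering (an approximation of RPM's comparison).
newest_version() {
    sort -V | tail -n 1
}

# Intended use on an instance:
#   installed=$(rpm -q kernel --qf '%{VERSION}-%{RELEASE}\n' | newest_version)
#   running=$(uname -r)   # also carries the architecture suffix
#   case "$running" in
#       "$installed"*) echo "already running the newest installed kernel" ;;
#       *)             echo "newest installed kernel is $installed" ;;
#   esac
```

A plain lexical sort would place 284.9.1 after 284.11.1, which is why version ordering is used here.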

Ansible (for multiple EC2 Instances):

The playbook file is attached as a tar.gz archive, since GitHub doesn't accept YAML attachments.

$INVENTORY: Ansible inventory file in INI or YAML format

$SSH_PRIV_KEY: Path to the private SSH key

$USER: SSH user to connect as; the default cloud user is ec2-user

ansible-playbook -i "$INVENTORY" --private-key "$SSH_PRIV_KEY" -u "$USER" default_kernel_bugfix.yaml

How to verify if it's fixed:

List all boot entries:

grubby --info=ALL

Make sure the entry for the latest kernel is the default:

grubby --info=DEFAULT

After the reboot, make sure that you booted into the latest kernel, which should now be the default:

uname -a
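The checks above can also be scripted, which may be convenient when many instances need verifying. This is a minimal sketch under a few assumptions: the helper names are hypothetical, and it assumes that grubby --default-kernel prints a path of the form /boot/vmlinuz-<release> and that <release> is what uname -r reports.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original fix.

# Print the kernel release embedded in a vmlinuz path by stripping
# everything up to and including "/vmlinuz-".
kernel_release_from_path() {
    printf '%s\n' "${1##*/vmlinuz-}"
}

# Succeed when the default boot entry's kernel ($1, a vmlinuz path)
# has the same release as the running kernel ($2, from uname -r).
default_matches_running() {
    [ "$(kernel_release_from_path "$1")" = "$2" ]
}

# Intended use on an instance after the reboot:
#   if default_matches_running "$(grubby --default-kernel)" "$(uname -r)"; then
#       echo "running the default kernel"
#   else
#       echo "default kernel differs from the running kernel"
#   fi
```

This only confirms that the running kernel is the default boot entry; whether that entry is also the latest available kernel still has to be checked against the repository, as in the manual fix.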

Detailed explanation:

When you install from the ISO, the Anaconda installer configures the bootloader, including the /etc/default/grub and /etc/sysconfig/kernel files, as part of the installation process. When we started to build the AlmaLinux OS 9.0 Beta 1 AMIs, a limitation of the AWS VM Import service made it impossible to import VM images with a GPT partition table. We therefore decided to build the AMI inside AWS with a chroot-like approach, which means there is no Anaconda installer and the installation and configuration have to be done manually. This configuration worked fine until the release of RHEL 9.2 / AlmaLinux OS 9.2.

So the trigger for this problem is a change upstream. If you upgrade the kernel on the latest RHEL AMI, you can observe that the latest kernel doesn't take index number 0; it's listed as the second option in the GRUB menu (index=1).

We fixed the issue with:

How we tested it:

On the fixed AMI version, we installed a test build of the next kernel version (5.14.0-284.25.1.el9_2):

dnf config-manager --add-repo https://build.almalinux.org/pulp/content/builds/AlmaLinux-9-x86_64-7117-br/config.repo
dnf -y upgrade

default_kernel_bugfix.tar.gz