SUSE-Enceladus / azure-li-services

Azure Large Instance Services
GNU General Public License v3.0

grub2 unable to recognize the correct architecture #252

Closed balram2697 closed 4 years ago

balram2697 commented 4 years ago

Hi Marcus,

The HW-recommended tool is unable to configure the system properly because it looks for a grubx64.efi file. The contents of the boot directory are:

ls  /boot/efi/EFI/BOOT/
MokManager.efi  bootx64.efi  grub.cfg  grub.efi

We are missing the grubx64.efi file, which according to the vendor should be present, given that it's an x64 architecture. When we run 'grub2-install', the system correctly creates the file.

The content of the file "util/grub-install.c" (below) shows that the EFI file name depends on the architecture.
------------------------------
  {
    /* It is convenient for each architecture to have a different
       efi_file, so that different versions can be installed in parallel.
    */
    switch (platform)
      {
      case GRUB_INSTALL_PLATFORM_I386_EFI:
        efi_file = "grubia32.efi";
        break;
      case GRUB_INSTALL_PLATFORM_X86_64_EFI:   <-----
        efi_file = "grubx64.efi";              <-----
        break;
      case GRUB_INSTALL_PLATFORM_IA64_EFI:
        efi_file = "grubia64.efi";
        break;
      case GRUB_INSTALL_PLATFORM_ARM_EFI:
        efi_file = "grubarm.efi";
        break;
      case GRUB_INSTALL_PLATFORM_ARM64_EFI:
        efi_file = "grubaa64.efi";
        break;
      case GRUB_INSTALL_PLATFORM_RISCV32_EFI:
        efi_file = "grubriscv32.efi";
        break;
      case GRUB_INSTALL_PLATFORM_RISCV64_EFI:
        efi_file = "grubriscv64.efi";
        break;
      default:
        efi_file = "grub.efi";                <-----
        break;
      }
  }
------------------------------

So it seems that grub2 is not able to recognize the correct architecture of the system. Did we somehow miss running grub2-install because it's not required? Please note that running the HW tool is essential for us.

balram2697 commented 4 years ago

Hi Marcus,

Do you have any idea on this ?

jaawasth commented 4 years ago

@schaefi this is a problem for us because the HW-recommended tool is needed to reach the required performance. Can you please look into it? We ran into issues on the HW with the SLES12 SP5 image, which exposed this issue.

schaefi commented 4 years ago

Sorry but I cannot confirm this:

grub-2.04/util/grub-install.c

switch (platform)
        {
        case GRUB_INSTALL_PLATFORM_I386_EFI:
          efi_file = "BOOTIA32.EFI";
          break;
        case GRUB_INSTALL_PLATFORM_X86_64_EFI:
          efi_file = "BOOTX64.EFI";
          break;
        case GRUB_INSTALL_PLATFORM_IA64_EFI:
          efi_file = "BOOTIA64.EFI";
          break;
        case GRUB_INSTALL_PLATFORM_ARM_EFI:
          efi_file = "BOOTARM.EFI";
          break;
        case GRUB_INSTALL_PLATFORM_ARM64_EFI:
          efi_file = "BOOTAA64.EFI";
          break;
        case GRUB_INSTALL_PLATFORM_RISCV32_EFI:
          efi_file = "BOOTRISCV32.EFI";
          break;
        case GRUB_INSTALL_PLATFORM_RISCV64_EFI:
          efi_file = "BOOTRISCV64.EFI";
          break;
        default:
          grub_util_error ("%s", _("You've found a bug"));
          break;
        }

As you can see it looks up bootx64.efi as expected. Also a binary string search tells me:

strings /usr/sbin/grub2-install | grep BOOTX64.EFI
BOOTX64.EFI

Also, the image we provide would not boot if the observed issue applied at image build time. Are you sure your system is still using a supported version of grub2?

My assumption is that grub2 was changed in the system

In any case, this is not an issue that should be discussed here. If you see a problem in grub2, please open a bugzilla ticket and discuss it with the people who actually touch the grub2 code. This can't be done here, as we just use the grub2 package as it comes from the distribution, and that one does not expose the issue you mentioned. At least I don't see it.

Thanks

balram2697 commented 4 years ago

@schaefi the lines of code you shared (below) apply only when we are dealing with removable media, while the lines I shared earlier apply to non-removable media.

------------------------------
      efi_distributor = bootloader_id;
      if (removable)    <------------------- !!
        {
          /* The specification makes stricter requirements of removable
             devices, in order that only one image can be automatically loaded
             from them.  The image must always reside under /EFI/BOOT, and it
             must have a specific file name depending on the architecture.
          */
          efi_distributor = "BOOT";
          switch (platform)
            {
            case GRUB_INSTALL_PLATFORM_I386_EFI:
              efi_file = "BOOTIA32.EFI";
              break;
            case GRUB_INSTALL_PLATFORM_X86_64_EFI:
              efi_file = "BOOTX64.EFI";
              break;
(...)
------------------------------
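
Taken together, the two excerpts quoted in this thread mean that the file name depends on both the platform and the removable flag. The following is only a minimal sketch of that mapping, not grub2 code: `grub_efi_name` and the platform spellings (`i386`, `x86_64`, `arm64`) are hypothetical names chosen for illustration, and only three of the platforms from the excerpts are covered.

```shell
#!/bin/sh
# Hypothetical helper (not part of grub2): print the EFI file name that the
# quoted switch statements select for a platform and an install mode.
#   $1: platform, spelled i386, x86_64 or arm64 (spelling is an assumption)
#   $2: "removable" for the removable case; anything else means non-removable
grub_efi_name() {
    platform=$1
    mode=${2:-fixed}
    if [ "$mode" = "removable" ]; then
        # Removable media: fixed names below /EFI/BOOT
        case "$platform" in
            i386)   echo "BOOTIA32.EFI" ;;
            x86_64) echo "BOOTX64.EFI" ;;
            arm64)  echo "BOOTAA64.EFI" ;;
            *)      echo "unsupported platform: $platform" >&2; return 1 ;;
        esac
    else
        # Non-removable: per-architecture grub names, grub.efi as default
        case "$platform" in
            i386)   echo "grubia32.efi" ;;
            x86_64) echo "grubx64.efi" ;;
            arm64)  echo "grubaa64.efi" ;;
            *)      echo "grub.efi" ;;
        esac
    fi
}
```

For example, `grub_efi_name x86_64 removable` prints the name found in the image, while `grub_efi_name x86_64` prints the name the vendor tool expects.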

Can you please let us know why this is behaving like a removable media?

schaefi commented 4 years ago

> Can you please let us know why this is behaving like a removable media?

I don't see where this should come from

Please let us clarify some facts first:

In reference to the above I have tested booting SLES12-SP5-SAP-Azure-VLI-BYOS in a virtual EFI firmware as follows:

qemu-kvm -bios /usr/share/qemu/ovmf-x86_64-ms.bin -hda SLES12-SP5-SAP-Azure-VLI-BYOS.raw -hdb my-azure-lun.raw -serial stdio

The system booted up as expected; we had also tested this before we submitted the image to you.

So can you please explain step by step at which point in your process you see a grub error ?

Thanks

jaawasth commented 4 years ago

@schaefi, the image boots up fine. After we receive the image from you, we need to install the hardware vendor's recommended packages for performance tuning. This is where we see a difference. The vendor software relies on a file named grubx64.efi to tune kernel parameters and settings correctly, whereas the image contains bootx64.efi [which, as they correctly pointed out, is the name used for removable media]. So the problem is not with booting the image but with the correct configuration of the system.

This problem was exposed in SLES12 SP5 because previous versions of the image [as with all other VLI images] had "rsyslog" missing; with SLES12 SP5, the system crashed when we installed the h/w recommended packages. After installing rsyslog, the system comes up fine, but we dug deeper to check whether the system is correctly configured at that point. That is when we found that several other settings are missing [due to the missing grubx64.efi file, which the h/w vendor relies on]. We can correct this manually, but that would defeat the purpose of an automated image deployment solution and increase the manual workload.

If you wish, we can get on a call and resolve this; if there are still issues, I'll involve people from the h/w vendor side as well.

schaefi commented 4 years ago

ok now I understand, thanks for the details. So looking at:

makes clear that for removable media the efi image should be named bootx64.efi. Sorry for not seeing this in the C code you pointed out. In the build process the removable flag comes into play at shim-install time. The following image build log excerpt shows this:

[  408s] [ DEBUG   ]: 19:23:19 | EXEC: [chroot /tmp/kiwi_mount_manager.2qi238mn shim-install --removable /dev/loop0]

The reason why it's done that way is that the image is a removable medium. It's built on a system that is not the later target, and it can be deployed on any supported set of hardware.

The part I don't understand is, why this causes trouble in your process ? Do you have extra code that explicitly looks up an efi binary ? If so I think that should not be done and instead the OS tools (grub2-mkconfig, grub2-install, shim-install) should be used. Also are you changing the efi binary in some way ? and why ?

jaawasth commented 4 years ago

@schaefi the HW vendors use this in their scripts to determine whether grub2 is installed [using the efi binary name]. In our case it definitely doesn't work. I'm not sure that checking for the OS tools would be a good indicator either, since anyone can install these tools after their OS installation?

> The reason why it's done that way is because the image is a removable media. It's build on a system that is not the later target and can be deployed on any supported set of hardware.

I'm not sure how to interpret this. Can this still be called removable media, given that we cannot remove it from the system while it's installed on the host?

schaefi commented 4 years ago

> the HW vendors are using this in their scripts to determine whether grub2 is installed [using the efi binary name]

This check can fail, and the presence of the efi binary does not tell you whether grub2 or any other tool manages it. I would say a test for whether a component is installed is only safe if it looks up the package database. In the case of SLES this would be something like:

if rpm -q --quiet grub2-x86_64-efi; then
    # grub2 is present (a branch needs at least one command to be valid shell)
    echo "grub2 is present"
fi

The component checkup is distribution specific. Thus a generic solution needs to take this into account as well.
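
One way a vendor script might take the distribution into account is sketched below. This is an illustration only: `grub2_pkg_installed` is a hypothetical helper, `grub2-x86_64-efi` is the SLES package name from this thread, and `grub-efi-amd64` is assumed here as the Debian/Ubuntu counterpart. The first package manager found decides the result, which is a simplification.

```shell
#!/bin/sh
# Hypothetical helper: ask the package database whether the grub2 EFI
# package is installed, instead of looking for an EFI binary by name.
# Returns 0 if installed, 1 if not installed, 2 if no known package
# manager is available. Package names are assumptions (see lead-in).
grub2_pkg_installed() {
    if command -v rpm >/dev/null 2>&1; then
        # RPM-based systems such as SLES
        rpm -q --quiet grub2-x86_64-efi
    elif command -v dpkg-query >/dev/null 2>&1; then
        # dpkg-based systems; check the package status field
        dpkg-query -W -f='${Status}' grub-efi-amd64 2>/dev/null \
            | grep -q 'install ok installed'
    else
        return 2
    fi
}
```

A caller could then write `if grub2_pkg_installed; then echo "grub2 is present"; fi` and treat a return value of 2 as "unknown" rather than "not installed".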

> I'm not sure how to interpret this, can this still be called a removable media,

I'd say this highly depends on how you define the term "removable". The original intent was that the target hardware is removable media, for example a USB stick or a CD drive. But in a SAN system a hard disk can also be considered removable media. There is also nobody preventing me from moving my external SSD to another slot. If you look at how images are built, the medium is a loop device which is created, used and deleted, and therefore also removable, because I can take the image and loop mount it on any other machine.

This whole concept of differentiating between removable and non-removable is questionable in my eyes. The only EFI binary name mentioned in the EFI standard paper is bootx64.efi for x86_64, and that's what I think hardware vendors should use too. The OS specific "non-removable" efi binary name has no standard, and I'm not sure whether any EFI firmware will load "grubx64.efi".

I'm sorry there is no way for me to change this as it will cause conflicts with other tooling in the distribution which checks for bootx64.efi.
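
If a script really has to locate the loader on disk, it could accept either file name instead of insisting on one. The sketch below is an assumption about how such a lookup might be written, not the vendor's code: `find_x64_loader` is a hypothetical helper, and `/boot/efi` is simply the mount point seen in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the first x86_64 loader found
# below an EFI system partition, accepting both naming schemes.
#   $1: ESP mount point (defaults to /boot/efi, as seen in this thread)
find_x64_loader() {
    esp=${1:-/boot/efi}
    [ -d "$esp/EFI" ] || return 1
    for name in grubx64.efi bootx64.efi; do
        # FAT file systems are case-insensitive, so ignore case when matching
        found=$(find "$esp/EFI" -type f -iname "$name" | sort | head -n 1)
        if [ -n "$found" ]; then
            echo "$found"
            return 0
        fi
    done
    return 1
}
```

On the image discussed here, `find_x64_loader /boot/efi` would be expected to fall through to the bootx64.efi entry under EFI/BOOT, since no grubx64.efi exists.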

Let me know our alternatives

Thanks

jaawasth commented 4 years ago

@schaefi thanks, that explains it, I have talked to the hw people as well and they will be adding a fix for that.

schaefi commented 4 years ago

Thanks for your effort and patience :+1: