CCI-MOC / esi

Elastic Secure Infrastructure project

Figure out how to enable uefi #554

Closed tzumainn closed 3 weeks ago

tzumainn commented 1 month ago

It looks like the GPU nodes may require UEFI boot.

I've tried the instructions at https://docs.openstack.org/ironic/wallaby/install/configure-pxe.html#uefi-pxe-grub-setup with no success so far.

tzumainn commented 1 month ago

@larsks @naved001 @hakasapl I'm listing out the solution I found for feedback; based on that I'll create new issues for more permanent solutions.

So it turns out there were three separate issues that needed to be solved. I'm not 100% sure on the solution to any of these, so I'm listing them out here:

a) Neutron DHCP

The installed version of dnsmasq.conf only handled legacy boot:

enable-tftp
tftp-root=/var/lib/neutron/dhcp
dhcp-boot=pxelinux.0
dhcp-userclass=set:ENH,iPXE

I updated it to the following to support both legacy and UEFI boot; testing shows that both indeed work:

enable-tftp
tftp-root=/var/lib/neutron/dhcp
dhcp-match=ipxe,175
dhcp-match=set:efi,option:client-arch,7
dhcp-match=set:efi,option:client-arch,9
dhcp-match=set:efi,option:client-arch,11
dhcp-boot=tag:ipxe,pxelinux.0
dhcp-boot=tag:efi,bootx64.efi
dhcp-userclass=set:ENH,iPXE

b) UEFI Boot Image

I ran into a variety of issues when trying to find an .efi boot image that would do what we needed. Legacy boot will load pxelinux.0 and then find pxelinux.cfg/default and run our custom ipxe boot script there. The UEFI images did not do that. After reading some documentation, I finally compiled a custom .efi image that included a custom ipxe boot script that does what our legacy ipxe boot script does. Cleaning worked after that.
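For anyone following along, here is a rough sketch of what embedding a script into an iPXE UEFI build can look like. It is not the actual script used here: the file name embed.ipxe and the chain URL are placeholders, and the build step assumes an iPXE source checkout.

```shell
# Write a hypothetical embedded script. The chain URL is a placeholder,
# not this project's real boot script location.
cat > embed.ipxe <<'EOF'
#!ipxe
dhcp
chain http://boot.example.com/boot.ipxe
EOF

# Then, from the src/ directory of an iPXE source checkout, build a
# 64-bit UEFI binary with the script embedded (left commented out
# because it needs the iPXE sources and a build toolchain):
# make bin-x86_64-efi/ipxe.efi EMBED=/path/to/embed.ipxe
```

The resulting .efi file would then presumably need to be served under whatever name dhcp-boot points at for the efi tag (bootx64.efi in the configuration above).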

c) Image UEFI Compatibility

It turns out that some images aren't compatible with both legacy and UEFI boot modes; for example, our centos9-stream image only works with legacy mode, but our ubuntu image works with both.

As of now, followup tasks include:

Let me know what you think!

tzumainn commented 1 month ago

@hpdempsey here's an update on the issue since I know you were concerned about it.

larsks commented 1 month ago

a) Neutron DHCP

@tzumainn the dnsmasq configuration looks correct.

b) UEFI Boot Image

I finally compiled a custom .efi image that included a custom ipxe boot script that does what our legacy ipxe boot script does.

Do you have a link to that process?

c) Image UEFI Compatibility

Confirm which images work and which do not; figure out if we want to generate new images for those that do not.

I think we probably want to ensure that our images support UEFI boot going forward; all new hardware is going to use UEFI by default.

tzumainn commented 1 month ago

b) UEFI Boot Image

I finally compiled a custom .efi image that included a custom ipxe boot script that does what our legacy ipxe boot script does.

Do you have a link to that process?

Ah, yep - https://ipxe.org/howto/chainloading#breaking_the_loop_with_an_embedded_script

tzumainn commented 1 month ago

@larsks to politely ask an Ironic node to boot in UEFI mode, run the following:

openstack baremetal node show <node> | grep properties
# copy the capabilities string
openstack baremetal node set --property capabilities="boot_mode:uefi,<capabilities string>" <node>

If you run openstack baremetal node show MOC-R4PAC04U33 | grep capabilities you'll see what the string ends up looking like.
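The copy-and-paste step above can be sketched as a small shell helper. This is only an illustration: with_uefi_boot_mode is a hypothetical name, and it assumes the capabilities string is a comma-separated list of key:value pairs. It drops any existing boot_mode entry so the result does not list it twice.

```shell
# Hypothetical helper: prepend boot_mode:uefi to an existing
# capabilities string of the form "key1:value1,key2:value2".
with_uefi_boot_mode() {
    existing=$1
    result="boot_mode:uefi"
    old_ifs=$IFS
    IFS=,
    # Split on commas; $existing is deliberately unquoted here.
    for cap in $existing; do
        case $cap in
            '') ;;                      # skip empty fields
            boot_mode:*) ;;             # drop any earlier boot_mode entry
            *) result="$result,$cap" ;;
        esac
    done
    IFS=$old_ifs
    printf '%s\n' "$result"
}
```

With the copied string in hand, the set command would then look like openstack baremetal node set --property capabilities="$(with_uefi_boot_mode '<capabilities string>')" <node>.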

tzumainn commented 3 weeks ago

This solution seems to be fine; created follow up issues: