danderson / netboot

Packages and utilities for network booting
Apache License 2.0

HPDL380 Gen 9 can't boot in UEFI mode #43

Status: Open. wrouesnel opened this issue 7 years ago.

wrouesnel commented 7 years ago

I've been trying to boot some HP DL380 Gen9s, which default to UEFI PXE boot mode. Pixiecore sees the request just fine and sends a ProxyDHCP packet, but the machines don't seem to act on it: they fail to boot and instead send another DHCP request.

Has anyone had any experience with these units? It does occur to me that I'm running pixiecore in dhcp-no-bind mode due to resource constraints, and they might be picky about where the responses come from, but if I shunt them into Legacy PXE mode they boot immediately and properly.

As an attempted workaround, I hacked pixiecore to send Option 43, but it made no difference. Wireshark captures of the traffic look correct, though I need to go back and check for anything on other ports (I should confirm that the port 4011 request is absent; I certainly haven't seen any traffic there).

This is all taking place on a single layer 2 network.

danderson commented 7 years ago

Does the network have another DHCP server offering network configuration? In a packet capture, you should see both a normal DHCPOFFER with network configuration, and a ProxyDHCP response from Pixiecore with boot information. Is that what you see?
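In a capture, the two offers can be told apart by their fields: per the PXE spec, a ProxyDHCP offer assigns no address (yiaddr is 0.0.0.0) and carries the vendor class "PXEClient" in option 60, while the normal DHCPOFFER carries the leased address. A minimal Go sketch of that check on a raw packet; the function names are hypothetical, and it ignores option overloading (option 52).

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// parseOptions walks the DHCP options area (after the 4-byte magic cookie
// at offset 236) and returns a map from option code to raw value.
func parseOptions(pkt []byte) (map[byte][]byte, error) {
	if len(pkt) < 240 || pkt[236] != 99 || pkt[237] != 130 || pkt[238] != 83 || pkt[239] != 99 {
		return nil, errors.New("not a DHCP packet")
	}
	opts := map[byte][]byte{}
	for i := 240; i < len(pkt); {
		code := pkt[i]
		if code == 0 { // pad byte
			i++
			continue
		}
		if code == 255 { // end of options
			break
		}
		if i+1 >= len(pkt) || i+2+int(pkt[i+1]) > len(pkt) {
			return nil, errors.New("truncated option")
		}
		n := int(pkt[i+1])
		opts[code] = pkt[i+2 : i+2+n]
		i += 2 + n
	}
	return opts, nil
}

// classifyOffer reports "proxydhcp" for an offer with a zero yiaddr and
// option 60 == "PXEClient", and "dhcp" for anything else.
func classifyOffer(pkt []byte) (string, error) {
	opts, err := parseOptions(pkt)
	if err != nil {
		return "", err
	}
	yiaddr := net.IP(pkt[16:20])
	if yiaddr.Equal(net.IPv4zero) && string(opts[60]) == "PXEClient" {
		return "proxydhcp", nil
	}
	return "dhcp", nil
}

func main() {
	// A bare-bones ProxyDHCP-style offer: zero yiaddr, option 60 = "PXEClient".
	pkt := make([]byte, 240)
	copy(pkt[236:], []byte{99, 130, 83, 99}) // DHCP magic cookie
	pkt = append(pkt, 60, 9)
	pkt = append(pkt, "PXEClient"...)
	pkt = append(pkt, 255)
	kind, err := classifyOffer(pkt)
	fmt.Println(kind, err) // proxydhcp <nil>
}
```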

PXE and ProxyDHCP over UEFI are not specified anywhere that I can find, so I'm going on the assumption that it behaves similarly to PXE in BIOS environments. So far, this is proving to be untrue for some server firmwares :(.

Unfortunately, at this point, the only way for me to debug this is if I can get access to an HP DL380 G9 with ILO, on a lab network, so I can experimentally probe the firmware to see what it likes/doesn't like. Would you happen to have a lab environment that you could let me borrow? Sadly this project is not an official Google project, so I don't have $$ to get my own G9, and it's quite expensive for a single throwaway use :).

wrouesnel commented 7 years ago

Heh, no specific expectations! I was just hoping someone might've seen this before, and if not, that I could figure it out and contribute back.

wrouesnel commented 7 years ago

So the short answer seems to be that these machines ignore ProxyDHCP offers entirely. I tried manually adding a next-server config pointing to pixiecore's iPXE endpoints, and suddenly they started pulling down their boot images.
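For anyone comparing captures: next-server and the boot filename are fields in the fixed BOOTP header of the ordinary DHCP offer (siaddr at byte offset 20, and file at offset 108, 128 bytes, NUL-padded), so they arrive in the same packet as the address lease. A minimal Go sketch that reads them out of a raw packet; the function name and the "ipxe.efi" filename are hypothetical, and it does not handle option overload (option 52) or options 66/67.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"net"
)

// bootInfo extracts the fixed-header boot fields of a BOOTP/DHCP packet:
// siaddr ("next-server", bytes 20-23) and file (boot file name, bytes
// 108-235, NUL-padded).
func bootInfo(pkt []byte) (net.IP, string, error) {
	if len(pkt) < 236 {
		return nil, "", errors.New("packet shorter than BOOTP header")
	}
	siaddr := net.IPv4(pkt[20], pkt[21], pkt[22], pkt[23])
	file := pkt[108:236]
	if i := bytes.IndexByte(file, 0); i >= 0 {
		file = file[:i] // strip NUL padding
	}
	return siaddr, string(file), nil
}

func main() {
	pkt := make([]byte, 240)
	copy(pkt[20:24], []byte{22, 0, 0, 1}) // siaddr / next-server
	copy(pkt[108:], "ipxe.efi")           // file
	ip, file, err := bootInfo(pkt)
	fmt.Println(ip, file, err) // 22.0.0.1 ipxe.efi <nil>
}
```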

So it looks like a fix here basically depends on pixiecore gaining DHCP server capability (or someone finding some magic DHCP option that makes them accept a ProxyDHCP config).

EDIT: I suppose another option would be to serve a very generic bootloader on some filename, which gets iPXE up far enough to load the specific script.
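That generic-bootloader idea is the usual two-stage iPXE chainload: firmware that isn't running iPXE yet gets a stock iPXE binary, and a client that is already running iPXE gets the specific script. A minimal Go sketch of the decision, assuming iPXE is detected by the presence of DHCP option 175 (as the dnsmasq config in this thread does) and treating client-arch 7 and 9 as UEFI; the script URL is a hypothetical placeholder.

```go
package main

import "fmt"

// Client system architecture values (DHCP option 93) treated as UEFI here.
// 7 is what the dnsmasq config in this thread matches; 9 is also used by
// some x86-64 UEFI firmware.
const (
	archEFIBC  = 7
	archEFIX64 = 9
)

// bootFileFor picks what to hand a PXE client in a two-stage chainload.
// runningIPXE would be set when the request carries option 175.
func bootFileFor(arch uint16, runningIPXE bool) string {
	if runningIPXE {
		// Second stage: iPXE is already up, so point it at the script.
		return "http://boot.example.com/boot.ipxe"
	}
	switch arch {
	case archEFIBC, archEFIX64:
		return "ipxe.efi" // generic UEFI iPXE build
	default:
		return "undionly.kpxe" // generic BIOS iPXE build
	}
}

func main() {
	fmt.Println(bootFileFor(7, false)) // ipxe.efi
	fmt.Println(bootFileFor(0, false)) // undionly.kpxe
	fmt.Println(bootFileFor(7, true))  // http://boot.example.com/boot.ipxe
}
```

The stock binary never needs to know anything about the machine; only the second-stage answer does, which is what makes the first stage safe to serve to every client.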

wrouesnel commented 7 years ago

For completeness, here's a workaround if you control your DHCP server: I just had success booting via pixiecore with this dnsmasq.conf, which sends an otherwise fake MAC address in the boot filename:

bind-interfaces
interface=tap0
dhcp-range=tap0,22.0.0.10,22.0.1.254,255.255.254.0,infinite

# iPXE sends a 175 option.
dhcp-match=set:ipxe,175
dhcp-match=set:efi-x86_64,option:client-arch,7

dhcp-boot=tag:efi-x86_64,tag:!ipxe,aa:aa:aa:aa:aa:aa/7,22.0.0.1,22.0.0.1

log-dhcp
log-queries

I do not know what to do about this insanity, but that finally resolves the mystery.
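The fake MAC makes sense given the filename's layout. Assuming pixiecore's TFTP server names boot files as `<MAC>/<firmware type>` (which is what `aa:aa:aa:aa:aa:aa/7` suggests), a syntactically valid but arbitrary MAC would still parse and select the arch-7 (UEFI) image, at least when every machine gets the same boot spec. A minimal sketch of parsing that format; `parseBootFilename` is a hypothetical re-implementation for illustration, not pixiecore's code.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// parseBootFilename splits a "<MAC>/<arch>" TFTP filename such as
// "aa:aa:aa:aa:aa:aa/7" into its hardware address and firmware type.
func parseBootFilename(name string) (net.HardwareAddr, int, error) {
	parts := strings.Split(name, "/")
	if len(parts) != 2 {
		return nil, 0, fmt.Errorf("want <mac>/<arch>, got %q", name)
	}
	mac, err := net.ParseMAC(parts[0])
	if err != nil {
		return nil, 0, err
	}
	arch, err := strconv.Atoi(parts[1])
	if err != nil {
		return nil, 0, err
	}
	return mac, arch, nil
}

func main() {
	mac, arch, err := parseBootFilename("aa:aa:aa:aa:aa:aa/7")
	fmt.Println(mac, arch, err) // aa:aa:aa:aa:aa:aa 7 <nil>
}
```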

dholt commented 6 years ago

I'm running into this same issue with UEFI clients. I've tried setting options 66/67 in my isc-dhcp-server config (which I think is what's happening in @wrouesnel's dnsmasq example) but still no luck. Not sure if any other workarounds or updates have come up since September?

danderson commented 6 years ago

No changes on my end, the status is still: to debug this further, I would need access to one of these machines so that I can quickly iterate on a bunch of hypotheses. Failing that, all I can say is:

I know it's an unsatisfying answer, and I wish I had something better to offer... But I can't justify spending $2k of my own money, on a server I don't need, to diagnose someone else's buggy firmware. So the only option I have left is to hope that someone will eventually be willing to loan me access to one of these boxes, and that I'm able to find a useful workaround.

dholt commented 6 years ago

Thanks, Dave. Unfortunately I'm stuck with UEFI due to install media. I'm running into this on brand-new Supermicro and Quanta systems, so it's not isolated to HP. I can give you access to a few different machines like this; hit me up if you're interested: dholt at nvidia.com

danderson commented 6 years ago

Thanks, I'll get in touch. I'm particularly intrigued by the Supermicro systems, because I know for a fact that a friend is booting brand new Supermicros with Pixiecore just fine... So I'm wondering what's going on.

Shados commented 1 year ago

I picked up a refurbished HP T630 on the cheap and ran into the same behaviour there: I can PXE boot with pixiecore just fine via legacy boot, but UEFI PXE boot appears to complete the initial request/response and then doesn't continue; it sends another request, then gives up.

This is rather unfortunate, as I'd intended to use this as a testbed for some secure boot + PXE stuff. I am also using dnsmasq (on a separate server from the one running pixiecore), but had no luck with @wrouesnel's workaround.

I might resort to signing an iPXE EFI binary, putting it on a USB stick, plugging that into the T630's internal USB port, and booting from that. That would at least let me avoid figuring out whether I can burn iPXE onto the NIC...